Research

My research spans a wide range of topics in generative AI (particularly large language models and diffusion models), multimodality, and applied research.

I am interested in (1) leveraging large language models (LLMs) and other foundation models to tackle both fundamental challenges and practical, real-world problems, (2) exploring modality alignment and the seamless integration of diverse data modalities, (3) developing more robust training methodologies, (4) improving model inference efficiency, and (5) advancing model alignment. My work aims to create solutions that not only enhance the performance of AI systems but also contribute meaningfully to scientific discovery.

Publications

Below is a list of my publications in reverse chronological order.


2024

  1. ICLR
    Unsupervised Pretraining for Fact Verification by Language Model Distillation
    Adrián Bazaga, Pietro Liò, and Gos Micklem
    In ICLR 2024 (International Conference on Learning Representations) 2024
  2. EMNLP
    HyperBERT: Mixing Hypergraph-Aware Layers with Language Models for Node Classification on Text-Attributed Hypergraphs
    Adrián Bazaga, Pietro Liò, and Gos Micklem
    In EMNLP 2024 (Empirical Methods in Natural Language Processing) 2024
  3. ICML
    TabMDA: Tabular Manifold Data Augmentation for Any Classifier using Transformers with In-context Subsetting
    Andrei Margeloiu, Adrián Bazaga, Nikola Simidjievski, Pietro Liò, and Mateja Jamnik
    In ICML 2024 (International Conference on Machine Learning) 2024
  4. arXiv
    FLUID-LLM: Learning Computational Fluid Dynamics with Spatiotemporal-aware Large Language Models
    Max Zhu, Adrián Bazaga, and Pietro Liò
    In arXiv:2406.04501 2024

2023

  1. arXiv
    SQLformer: Deep Auto-Regressive Query Graph Generation for Text-to-SQL Translation
    Adrián Bazaga, Pietro Liò, and Gos Micklem
    In arXiv:2310.18376 2023
  2. ICLR
    Language Model Knowledge Distillation for Efficient Question Answering in Spanish
    Adrián Bazaga, Pietro Liò, and Gos Micklem
    In ICLR 2024 (International Conference on Learning Representations) 2023
  3. Annals of Oncology
    Multi-site validation of a deep learning solution for HER2 profiling of breast cancer from H&E-stained pathology slides
    S Arslan, P Pandya, F Ntelemis, S Wolf, J Schmidt, A Geraldes, A Bazaga, D Mehrotra, J Nyonyintono, S Singhal, and others
    Annals of Oncology 2023

2022

  1. Database
    HumanMine: advanced data searching, analysis and cross-species comparison
    Rachel Lyne, Adrián Bazaga, Daniela Butano, Sergio Contrino, Joshua Heimbach, Fengyuan Hu, Alexis Kalderimis, Mike Lyne, Kevin Reierskog, Radek Stepan, and others
    Database 2022

2021

  1. Nature Scientific Reports
    Translating synthetic natural language to database queries with a polyglot deep learning framework
    Adrián Bazaga, Nupur Gunwant, and Gos Micklem
    Nature Scientific Reports 2021

2020

  1. Nature Scientific Reports
    Genome-wide investigation of gene-cancer associations for the prediction of novel therapeutic targets in oncology
    Adrián Bazaga, Dan Leggate, and Hendrik Weisser
    Nature Scientific Reports 2020

2019

  1. Applied Soft Computing
    A Convolutional Neural Network for the Automatic Diagnosis of Collagen VI related Muscular Dystrophies
    Adrián Bazaga, Mònica Roldán, Carmen Badosa, Cecilia Jiménez-Mallebrera, and Josep M. Porta
    Applied Soft Computing 2019
  2. WSOM
    Network Community Cluster-Based Analysis for the Identification of Potential Leukemia Drug Targets
    Adrián Bazaga and Alfredo Vellido
    In International Workshop on Self-Organizing Maps 2019
  3. Neuromuscular Disorders
    Automated diagnosis of collagen VI related muscular dystrophies using advanced image analysis and machine learning
    Mònica Roldán, Adrián Bazaga, Carmen Badosa, Josep M. Porta, and Cecilia Jiménez-Mallebrera
    Neuromuscular Disorders 2019

2018

  1. arXiv
    Performance Evaluation of an Algorithm-based Asynchronous Checkpoint-Restart Fault Tolerant Application Using Mixed MPI/GPI-2
    Adrián Bazaga and Michal Pitonak
    arXiv preprint 2018
  2. Bioinformatics
    BIOLITMAP: a web-based geolocated, temporal and thematic visualization of the evolution of bioinformatics publications
    Adrián Bazaga, Alfonso Valencia, and MJ Rementeria-Núñez
    Bioinformatics 2018