Research

My research spans a wide range of topics in generative AI (particularly Large Language Models and Diffusion Models), multimodality, and applied research.

I am interested in (1) leveraging large language models (LLMs) and other foundation models to tackle both fundamental challenges and practical, real-world problems, (2) exploring modality alignment and the seamless integration of diverse data modalities, (3) developing more robust training methodologies, (4) improving inference efficiency, and (5) strengthening model robustness and alignment. My work is dedicated to creating innovative solutions that not only enhance the performance of AI systems but also contribute meaningfully to impactful projects at a global scale.

Publications

Below is a list of my publications in reverse chronological order.


    2024

    1. ICLR
      Unsupervised Pretraining for Fact Verification by Language Model Distillation
      Adrián Bazaga, Pietro Liò, and Gos Micklem
      In ICLR 2024 (International Conference on Learning Representations) 2024
    2. EMNLP
      HyperBERT: Mixing Hypergraph-Aware Layers with Language Models for Node Classification on Text-Attributed Hypergraphs
      Adrián Bazaga, Pietro Liò, and Gos Micklem
      In EMNLP 2024 (Empirical Methods in Natural Language Processing) 2024
    3. ICML
      TabMDA: Tabular Manifold Data Augmentation for Any Classifier using Transformers with In-context Subsetting
      Andrei Margeloiu, Adrián Bazaga, Nikola Simidjievski, Pietro Liò, and Mateja Jamnik
      In ICML 2024 (International Conference on Machine Learning) 2024
    4. arXiv
      FLUID-LLM: Learning Computational Fluid Dynamics with Spatiotemporal-aware Large Language Models
      Max Zhu, Adrián Bazaga, and Pietro Liò
      In arXiv:2406.04501 2024

    2023

    1. arXiv
      SQLformer: Deep Auto-Regressive Query Graph Generation for Text-to-SQL Translation
      Adrián Bazaga, Pietro Liò, and Gos Micklem
      In arXiv:2310.18376 2023
    2. ICLR
      Language Model Knowledge Distillation for Efficient Question Answering in Spanish
      Adrián Bazaga, Pietro Liò, and Gos Micklem
      In ICLR 2024 (International Conference on Learning Representations) 2023
    3. Annals of Oncology
      Multi-site validation of a deep learning solution for HER2 profiling of breast cancer from H&E-stained pathology slides
      S Arslan, P Pandya, F Ntelemis, S Wolf, J Schmidt, A Geraldes, A Bazaga, D Mehrotra, J Nyonyintono, S Singhal, and others
      Annals of Oncology 2023

    2022

    1. Database
      HumanMine: advanced data searching, analysis and cross-species comparison
      Rachel Lyne, Adrián Bazaga, Daniela Butano, Sergio Contrino, Joshua Heimbach, Fengyuan Hu, Alexis Kalderimis, Mike Lyne, Kevin Reierskog, Radek Stepan, and others
      Database 2022

    2021

    1. Nature Scientific Reports
      Translating synthetic natural language to database queries with a polyglot deep learning framework
      Adrián Bazaga, Nupur Gunwant, and Gos Micklem
      Nature Scientific Reports 2021

    2020

    1. Nature Scientific Reports
      Genome-wide investigation of gene-cancer associations for the prediction of novel therapeutic targets in oncology
      Adrián Bazaga, Dan Leggate, and Hendrik Weisser
      Nature Scientific Reports 2020

    2019

    1. Applied Soft Computing
      A Convolutional Neural Network for the Automatic Diagnosis of Collagen VI related Muscular Dystrophies
      Adrián Bazaga, Mònica Roldán, Carmen Badosa, Cecilia Jiménez-Mallebrera, and Josep M. Porta
      Applied Soft Computing 2019
    2. WSOM
      Network Community Cluster-Based Analysis for the Identification of Potential Leukemia Drug Targets
      Adrián Bazaga and Alfredo Vellido
      In International Workshop on Self-Organizing Maps 2019
    3. Neuromuscular Disorders
      Automated diagnosis of collagen VI related muscular dystrophies using advanced image analysis and machine learning
      Mónica Roldán, Adrián Bazaga, Carmen Badosa, Josep M. Porta, and Cecilia Jimenez-Mallebrera
      In Neuromuscular Disorders 2019

    2018

    1. arXiv
      Performance Evaluation of an Algorithm-based Asynchronous Checkpoint-Restart Fault Tolerant Application Using Mixed MPI/GPI-2
      Adrián Bazaga and Michal Pitonak
      arXiv preprint 2018
    2. Bioinformatics
      BIOLITMAP: a web-based geolocated, temporal and thematic visualization of the evolution of bioinformatics publications
      Adrián Bazaga, Alfonso Valencia, and MJ Rementeria-Núñez
      Bioinformatics 2018