Adrián Bazaga's Homepage

My research spans generative AI (particularly large language models and diffusion models), multimodality, and applied research.

I am interested in (1) leveraging large language models (LLMs) and other foundation models to tackle both fundamental challenges and practical, real-world problems, (2) exploring modality alignment and the seamless integration of diverse data modalities, (3) developing more robust training methodologies, (4) improving inference efficiency, and (5) improving model robustness and alignment. My work aims to create solutions that not only enhance the performance of AI systems but also contribute meaningfully to impactful projects at a global scale.

Publications

Below is a list of my publications in reverse chronological order.

2025

ACL

Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models

Adrián Bazaga, Rexhina Blloshmi, Bill Byrne, and Adrià Gispert

In ACL 2025 (Association for Computational Linguistics) 2025

arXiv Code

2024

ICLR

Unsupervised Pretraining for Fact Verification by Language Model Distillation

Adrián Bazaga, Pietro Liò, and Gos Micklem

In ICLR 2024 (International Conference on Learning Representations) 2024

arXiv Code

EMNLP

HyperBERT: Mixing Hypergraph-Aware Layers with Language Models for Node Classification on Text-Attributed Hypergraphs

Adrián Bazaga, Pietro Liò, and Gos Micklem

In EMNLP 2024 (Empirical Methods in Natural Language Processing) 2024

arXiv Code

ICML

TabMDA: Tabular Manifold Data Augmentation for Any Classifier using Transformers with In-context Subsetting

Andrei Margeloiu, Adrián Bazaga, Nikola Simidjievski, Pietro Liò, and Mateja Jamnik

In ICML 2024 (International Conference on Machine Learning) 2024

arXiv Code

arXiv

FLUID-LLM: Learning Computational Fluid Dynamics with Spatiotemporal-aware Large Language Models

Max Zhu, Adrián Bazaga, and Pietro Liò

In arXiv:2406.04501 2024

arXiv Code

2023

arXiv

SQLformer: Deep Auto-Regressive Query Graph Generation for Text-to-SQL Translation

Adrián Bazaga, Pietro Liò, and Gos Micklem

In arXiv:2310.18376 2023

arXiv Code

ICLR

Language Model Knowledge Distillation for Efficient Question Answering in Spanish

Adrián Bazaga, Pietro Liò, and Gos Micklem

In ICLR 2024 (International Conference on Learning Representations) 2023

arXiv Code

Annals of Oncology

Multi-site validation of a deep learning solution for HER2 profiling of breast cancer from H&E-stained pathology slides

S Arslan, P Pandya, F Ntelemis, S Wolf, J Schmidt, A Geraldes, A Bazaga, D Mehrotra, J Nyonyintono, S Singhal, and others

Annals of Oncology 2023

2022

Database

HumanMine: advanced data searching, analysis and cross-species comparison

Rachel Lyne, Adrián Bazaga, Daniela Butano, Sergio Contrino, Joshua Heimbach, Fengyuan Hu, Alexis Kalderimis, Mike Lyne, Kevin Reierskog, Radek Stepan, and others

Database 2022

2021

Nature Scientific Reports

Translating synthetic natural language to database queries with a polyglot deep learning framework

Adrián Bazaga, Nupur Gunwant, and Gos Micklem

Nature Scientific Reports 2021

arXiv Code

2020

Nature Scientific Reports

Genome-wide investigation of gene-cancer associations for the prediction of novel therapeutic targets in oncology

Adrián Bazaga, Dan Leggate, and Hendrik Weisser

Nature Scientific Reports 2020

2019

Applied Soft Computing

A Convolutional Neural Network for the Automatic Diagnosis of Collagen VI related Muscular Dystrophies

Adrián Bazaga, Mònica Roldán, Carmen Badosa, Cecilia Jiménez-Mallebrera, and Josep M. Porta

Applied Soft Computing 2019

arXiv Code

WSOM

Network Community Cluster-Based Analysis for the Identification of Potential Leukemia Drug Targets

Adrián Bazaga and Alfredo Vellido

In International Workshop on Self-Organizing Maps 2019

Neuromuscular Disorders

Automated diagnosis of collagen VI related muscular dystrophies using advanced image analysis and machine learning

Mònica Roldán, Adrián Bazaga, Carmen Badosa, Josep M. Porta, and Cecilia Jiménez-Mallebrera

In Neuromuscular Disorders 2019

2018

arXiv

Performance Evaluation of an Algorithm-based Asynchronous Checkpoint-Restart Fault Tolerant Application Using Mixed MPI/GPI-2

Adrián Bazaga and Michal Pitonak

arXiv preprint 2018

Bioinformatics

BIOLITMAP: a web-based geolocated, temporal and thematic visualization of the evolution of bioinformatics publications

Adrián Bazaga, Alfonso Valencia, and MJ Rementeria-Núñez

Bioinformatics 2018