Adrián Bazaga's Homepage

Senior Research Scientist @ Microsoft

Welcome to my website!

Background:
I’m a Senior Research Scientist at Microsoft. My current focus is on end-to-end frontier Small Language Model (SLM) research and development, covering the entire stack of data preparation → pretraining → mid-training → post-training → evaluation, as well as shipping agentic experiences directly on user devices to millions of users worldwide.

I’m a core contributor to Mu, Microsoft’s blazing-fast on-device SLM, where I played a central role in developing the pretraining and post-training pipelines. I also co-led the development of the Windows Settings AI agent, which is already live on Windows Copilot+ devices. These efforts are part of my broader vision to enable seamless, deeply integrated AI experiences for everyone.

I hold a Ph.D. in Machine Learning from the University of Cambridge, where I conducted research under the supervision of Prof. Pietro Liò and Prof. Gos Micklem. My work has been published at leading Machine Learning conferences such as ICLR, ICML, ACL, and EMNLP, as well as in Nature journals. Previously, I gained research experience through internships at Microsoft Research and Amazon AGI, where I explored novel training schemes to enhance few-step generation in diffusion models and test-time scaling for temporal reasoning with Large Language Models (LLMs), respectively. Prior to that, I spent ~5 years in various startups, working at the intersection of AI and biology.

Research Interests:
My research focuses on advancing the capabilities of AI in the areas of foundational LLMs, reasoning, tool usage, and multimodality. Currently, I’m broadly interested in devising efficient SLM architectures, inventing data-efficient optimization techniques, expanding tool-usage capabilities with refined post-training methodologies, and ensuring model robustness and alignment.

Beyond Research:
In addition to my core research, I’d like to explore how generative models can improve education and governance. If you’re working on high-impact, real-world deployments in these areas, I’m always open to collaborating 👐.

News

Sep 1, 2025 I have been promoted to Senior Research Scientist at Microsoft, now co-leading a group developing state-of-the-art Small Language Models (SLMs), from pretraining to post-training and all the way to on-device deployment. ⭐
Jun 23, 2025 We have launched Mu, our 0.3B ‘micro-size’ language model, built for blazing-fast on-device inference and already powering native agentic experiences on Windows devices. 🚀
Jun 1, 2025 [Paper] Our paper “Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models” has been accepted at ACL 2025 (Main) 🎉
Jan 3, 2025 I joined Microsoft as an AI Research Scientist in London (UK). Excited to work on delivering on-device LLM-based AI experiences for millions of users worldwide. ⭐
Sep 20, 2024 [Paper] Our paper “HyperBERT: Mixing Hypergraph-Aware Layers with Language Models for Node Classification on Text-Attributed Hypergraphs” has been accepted at EMNLP 2024 🎉
Aug 15, 2024 I joined the Amazon AGI team as a Research Scientist Intern to work on test-time scaling for temporal reasoning with LLMs alongside Bill Byrne, Rexhina Blloshmi, and Adrià de Gispert, in Berlin (Germany). ⭐
Aug 13, 2024 I’m now part of the reviewer committees for the International Conference on Learning Representations (ICLR) and ACL. 👍
Jun 16, 2024 Our paper “FLUID-LLM: Learning Computational Fluid Dynamics with Spatiotemporal-aware Large Language Models” is now on arXiv. 📋
Jun 5, 2024 [Paper] Our paper “TabMDA: Tabular Manifold Data Augmentation for Any Classifier using Transformers with In-context Subsetting” has been accepted at ICML 2024 🎉
Jun 1, 2024 I received a PhD Student Award from the Cambridge Society for the Application of Research, in recognition of outstanding research with real-world application, for my work on language-graph weakly supervised distillation for dense retrieval. 🏅

Selected Publications

  1. ICLR
    Unsupervised Pretraining for Fact Verification by Language Model Distillation
    Adrián Bazaga, Pietro Liò, and Gos Micklem
    In ICLR 2024 (International Conference on Learning Representations)
  2. arXiv
    SQLformer: Deep Auto-Regressive Query Graph Generation for Text-to-SQL Translation
    Adrián Bazaga, Pietro Liò, and Gos Micklem
    In arXiv:2310.18376, 2023
  3. ICLR
    Language Model Knowledge Distillation for Efficient Question Answering in Spanish
    Adrián Bazaga, Pietro Liò, and Gos Micklem
    In ICLR 2024 (International Conference on Learning Representations)
  4. EMNLP
    HyperBERT: Mixing Hypergraph-Aware Layers with Language Models for Node Classification on Text-Attributed Hypergraphs
    Adrián Bazaga, Pietro Liò, and Gos Micklem
    In EMNLP 2024 (Empirical Methods in Natural Language Processing)
  5. ICML
    TabMDA: Tabular Manifold Data Augmentation for Any Classifier using Transformers with In-context Subsetting
    Andrei Margeloiu, Adrián Bazaga, Nikola Simidjievski, Pietro Liò, and Mateja Jamnik
    In ICML 2024 (International Conference on Machine Learning)
  6. arXiv
    FLUID-LLM: Learning Computational Fluid Dynamics with Spatiotemporal-aware Large Language Models
    Max Zhu, Adrián Bazaga, and Pietro Liò
    In arXiv:2406.04501, 2024
  7. ACL
    Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models
    Adrián Bazaga, Rexhina Blloshmi, Bill Byrne, and Adrià de Gispert
    In ACL 2025 (Association for Computational Linguistics)