Adrián Bazaga's Homepage

AI Research Scientist @ Microsoft

Welcome to my website!

Background:
I’m a Research Scientist at Microsoft, where I develop cutting-edge foundational Small Language Models (SLMs) that power intelligent, agentic AI experiences directly on users’ devices for millions of people worldwide. My work focuses on advancing the frontier of on-device intelligence, bringing fast, capable AI to life at global scale.

I’m a core contributor to Mu, Microsoft’s blazing-fast on-device SLM, where I played a central role in developing the pretraining, mid-training, and post-training pipelines. I also co-led the development of the Windows Settings AI agent, which is already live on Windows Copilot+ devices. These efforts are part of my broader vision of enabling seamless, deeply integrated AI experiences for everyone.

I hold a Ph.D. in Machine Learning from the University of Cambridge, where I conducted research under the supervision of Prof. Pietro Liò and Prof. Gos Micklem. My work has been published at leading Machine Learning conferences such as ICLR, ICML, ACL, and EMNLP, as well as in Nature journals. Previously, I gained research experience through internships at Microsoft Research and Amazon AGI, where I explored novel training schemes to enhance few-step generation in diffusion models, and test-time scaling for temporal reasoning with Large Language Models (LLMs). Before that, I spent ~5 years at various startups, working at the intersection of AI and biology.

Research Interests:
My research centers on advancing AI in the areas of foundational LLMs, reasoning, and multimodality. I’m particularly interested in developing architectures that broaden the applicability of generative models and integrate diverse data modalities to tackle both core challenges and real-world problems 🧪. My current focus is on building small-scale foundational language models with agentic capabilities: extending them to new modalities, refining training methodologies, improving inference efficiency, and ensuring robustness and alignment. I’m passionate about impactful, global-scale AI and open to collaborations that push its boundaries 👐.

Beyond Research:
In addition to my core research, I’m interested in how advances in generative models can revolutionize education and governance. I’m deeply committed to conducting research with significant real-world impact and am eager to engage in discussions and collaborations that align with these goals.

News

Jun 23, 2025 We have launched Mu, our 0.3B ‘micro-size’ language model, built for blazing-fast on-device inference and already powering native agentic experiences on Windows devices. 🚀
Jun 1, 2025 [Paper] Our paper “Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models” has been accepted at ACL 2025 (Main) 🎉
Jan 3, 2025 I joined Microsoft as an AI Research Scientist in London (UK). Excited to work on delivering on-device LLM-based AI experiences for millions of users worldwide. ⭐
Sep 20, 2024 [Paper] Our paper “HyperBERT: Mixing Hypergraph-Aware Layers with Language Models for Node Classification on Text-Attributed Hypergraphs” has been accepted at EMNLP 2024 🎉
Aug 15, 2024 I joined the Amazon AGI team as a Research Scientist Intern to work on test-time scaling for temporal reasoning with LLMs alongside Bill Byrne, Rexhina Blloshmi and Adrià de Gispert, in Berlin (Germany). ⭐
Aug 13, 2024 I’m now serving as a reviewer for the International Conference on Learning Representations (ICLR) and the ACL conferences. 👍
Jun 16, 2024 Our paper “FLUID-LLM: Learning Computational Fluid Dynamics with Spatiotemporal-aware Large Language Models” is now on arXiv. 📋
Jun 5, 2024 [Paper] Our paper “TabMDA: Tabular Manifold Data Augmentation for Any Classifier using Transformers with In-context Subsetting” has been accepted at ICML 2024 🎉
Jun 1, 2024 I received a PhD Student Award from the Cambridge Society for the Application of Research in recognition of outstanding research with real-world application, for my work on language-graph weakly supervised distillation for dense retrieval. 🏅
May 1, 2024 I joined Microsoft Research as a Research Scientist Intern to work on improved few-step generation for diffusion models with Javier Zazo, Richard Turner and Ted Meeds, in Cambridge (UK). ⭐

Selected Publications

  1. ICLR
    Unsupervised Pretraining for Fact Verification by Language Model Distillation
    Adrián Bazaga, Pietro Liò, and Gos Micklem
    In ICLR (International Conference on Learning Representations), 2024
  2. arXiv
    SQLformer: Deep Auto-Regressive Query Graph Generation for Text-to-SQL Translation
    Adrián Bazaga, Pietro Liò, and Gos Micklem
    arXiv:2310.18376, 2023
  3. ICLR
    Language Model Knowledge Distillation for Efficient Question Answering in Spanish
    Adrián Bazaga, Pietro Liò, and Gos Micklem
    In ICLR (International Conference on Learning Representations), 2024
  4. EMNLP
    HyperBERT: Mixing Hypergraph-Aware Layers with Language Models for Node Classification on Text-Attributed Hypergraphs
    Adrián Bazaga, Pietro Liò, and Gos Micklem
    In EMNLP (Empirical Methods in Natural Language Processing), 2024
  5. ICML
    TabMDA: Tabular Manifold Data Augmentation for Any Classifier using Transformers with In-context Subsetting
    Andrei Margeloiu, Adrián Bazaga, Nikola Simidjievski, Pietro Liò, and Mateja Jamnik
    In ICML (International Conference on Machine Learning), 2024
  6. arXiv
    FLUID-LLM: Learning Computational Fluid Dynamics with Spatiotemporal-aware Large Language Models
    Max Zhu, Adrián Bazaga, and Pietro Liò
    arXiv:2406.04501, 2024
  7. ACL
    Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models
    Adrián Bazaga, Rexhina Blloshmi, Bill Byrne, and Adrià de Gispert
    In ACL (Association for Computational Linguistics), 2025