Adrián Bazaga's Homepage

Senior Research Scientist @ Microsoft

Welcome to my website!

Background:
I’m a Senior Research Scientist at Microsoft. My current focus is on end-to-end frontier Small Language Model (SLM) research and development, covering the entire stack of data preparation → pretraining → mid-training → post-training → evaluation, as well as shipping agentic experiences directly on user devices to millions of users worldwide.

I’m a core contributor to Mu, Microsoft’s blazing-fast on-device SLM, where I played a central role in developing the pretraining and post-training pipelines. I also co-led the development of the Windows Settings AI agent, which is already live on Windows Copilot+ devices. These efforts are part of my broader vision to enable seamless, deeply integrated AI experiences for everyone.

I hold a Ph.D. in Machine Learning from the University of Cambridge, where I conducted research under the supervision of Prof. Pietro Liò and Prof. Gos Micklem. My work has been published at leading Machine Learning conferences such as ICLR, ICML, ACL, and EMNLP, as well as in Nature journals. Previously, I gained research experience through internships at Microsoft Research and Amazon AGI, where I explored novel training schemes to enhance few-step generation in diffusion models and test-time scaling for temporal reasoning with Large Language Models (LLMs), respectively. Prior to that, I spent ~5 years in various startups, working at the intersection of AI and biology.

Research Interests:
My research focuses on advancing the capabilities of AI in the areas of foundational LLMs, reasoning, tool usage, and multimodality. Currently, I’m broadly interested in devising efficient SLM architectures, inventing data-efficient optimization techniques, expanding tool-usage capabilities with refined post-training methodologies, and ensuring model robustness and alignment.

Beyond Research:
In addition to my core research, I’d like to explore how generative models can improve education and governance. If you’re working on high-impact, real-world deployments in these areas, I’m always open to collaborating 👐.

News

Sep 1, 2025 I have been promoted to Senior Research Scientist at Microsoft, now co-leading a group developing state-of-the-art Small Language Models (SLMs), from pretraining to post-training and all the way to on-device deployment. ⭐
Jun 23, 2025 We have launched Mu, our 0.3B ‘micro-size’ language model, built for blazing-fast on-device inference and already powering native agentic experiences on Windows devices. 🚀
Jun 1, 2025 [Paper] Our paper “Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models” has been accepted at ACL 2025 (Main) 🎉
Jan 3, 2025 I joined Microsoft as an AI Research Scientist in London (UK). Excited to work on delivering on-device LLM-based AI experiences for millions of users worldwide. ⭐
Sep 20, 2024 [Paper] Our paper “HyperBERT: Mixing Hypergraph-Aware Layers with Language Models for Node Classification on Text-Attributed Hypergraphs” has been accepted at EMNLP 2024 🎉
Aug 15, 2024 I joined the Amazon AGI team as a Research Scientist Intern to work on test-time scaling for temporal reasoning with LLMs alongside Bill Byrne, Rexhina Blloshmi, and Adrià de Gispert, in Berlin (Germany). ⭐
Aug 13, 2024 I’m now part of the reviewer committees for the International Conference on Learning Representations (ICLR) and ACL. 👍
Jun 16, 2024 Our paper “FLUID-LLM: Learning Computational Fluid Dynamics with Spatiotemporal-aware Large Language Models” is now on arXiv. 📋
Jun 5, 2024 [Paper] Our paper “TabMDA: Tabular Manifold Data Augmentation for Any Classifier using Transformers with In-context Subsetting” has been accepted at ICML 2024 🎉
Jun 1, 2024 I received a PhD Student Award from the Cambridge Society for the Application of Research, in recognition of outstanding research with real-world application, for my work on language-graph weakly supervised distillation for dense retrieval. 🏅

Selected Publications

  1. ICLR
    Unsupervised Pretraining for Fact Verification by Language Model Distillation
    Adrián Bazaga, Pietro Liò, and Gos Micklem
    In ICLR 2024 (International Conference on Learning Representations)
  2. arXiv
    SQLformer: Deep Auto-Regressive Query Graph Generation for Text-to-SQL Translation
    Adrián Bazaga, Pietro Liò, and Gos Micklem
    In arXiv:2310.18376, 2023
  3. ICLR
    Language Model Knowledge Distillation for Efficient Question Answering in Spanish
    Adrián Bazaga, Pietro Liò, and Gos Micklem
    In ICLR 2024 (International Conference on Learning Representations)
  4. EMNLP
    HyperBERT: Mixing Hypergraph-Aware Layers with Language Models for Node Classification on Text-Attributed Hypergraphs
    Adrián Bazaga, Pietro Liò, and Gos Micklem
    In EMNLP 2024 (Empirical Methods in Natural Language Processing)
  5. ICML
    TabMDA: Tabular Manifold Data Augmentation for Any Classifier using Transformers with In-context Subsetting
    Andrei Margeloiu, Adrián Bazaga, Nikola Simidjievski, Pietro Liò, and Mateja Jamnik
    In ICML 2024 (International Conference on Machine Learning)
  6. arXiv
    FLUID-LLM: Learning Computational Fluid Dynamics with Spatiotemporal-aware Large Language Models
    Max Zhu, Adrián Bazaga, and Pietro Liò
    In arXiv:2406.04501, 2024
  7. ACL
    Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models
    Adrián Bazaga, Rexhina Blloshmi, Bill Byrne, and Adrià de Gispert
    In ACL 2025 (Association for Computational Linguistics)