Antoine Moulin
Hi! I am a PhD student in reinforcement learning (RL). In my thesis, I aim to develop a deeper understanding of the challenges posed by large-scale RL by identifying and exploiting structural properties of Markov decision processes (MDPs) that make learning statistically and computationally feasible.
My research interests also include online learning, imitation learning, and language modeling.
I am co-advised by Gergely Neu and Arthur Gretton. You can reach out to me at: firstname [dot] lastname [at] upf [dot] edu.
CV / Google Scholar / X / GitHub
[★] indicates my favorite papers.
A benchmark of expert-level academic questions to assess AI capabilities (Humanity's Last Exam)
Center for AI Safety, Scale AI, HLE Contributors Consortium
Nature
arxiv
tl;dr: multi-modal benchmark with multiple-choice and short-answer questions on various topics. little contribution from me.
Outcome-Aware Spectral Feature Learning for Instrumental Variable Regression
Dimitri Meunier, Jakub Wornbard, Vladimir R. Kostic, Antoine Moulin, Alek Frölich, Karim Lounici, Massimiliano Pontil, Arthur Gretton
preprint
arxiv
tl;dr: since standard spectral features are agnostic to the outcome, they fail under misalignment; we remedy this by regularizing them towards the outcome via an augmented operator and a contrastive loss.
★ Inverse Q-Learning Done Right: Offline Imitation Learning in Qπ-Realizable MDPs
Antoine Moulin, Gergely Neu, Luca Viano
(NeurIPS 2025) 39th Annual Conference on Neural Information Processing Systems
arxiv
tl;dr: we propose a primal-dual method that provably matches the expert's return in linear and general Qπ-realizable MDPs, providing an alternative to maximum likelihood estimation (also known as behavior cloning, or next-token prediction), which can fail dramatically when expert realizability does not hold.
Demystifying Spectral Feature Learning for Instrumental Variable Regression
Dimitri Meunier, Antoine Moulin, Jakub Wornbard, Vladimir R. Kostic, Arthur Gretton
(NeurIPS 2025) 39th Annual Conference on Neural Information Processing Systems
arxiv
tl;dr: we study why and when spectral methods work for IV regression, and show that their performance depends on two factors: the alignment of the target function with the top singular directions of a conditional expectation operator (dubbed "spectral alignment") and the decay of its singular values (which captures instrument strength).
When Lower-Order Terms Dominate: Improved Loss-Range Adaptivity for Experts Algorithms
Antoine Moulin, Emmanuel Esposito, Dirk van der Hoeven
(NeurIPS 2025) 39th Annual Conference on Neural Information Processing Systems
arxiv
tl;dr: we develop algorithms for the experts problem with possibly heavy-tailed losses that are adaptive to the second moment, and show they achieve best-of-both-worlds guarantees.
Optimistically Optimistic Exploration for Provably Efficient Infinite-Horizon Reinforcement and Imitation Learning
Antoine Moulin, Gergely Neu, Luca Viano
(COLT 2025) 38th Annual Conference on Learning Theory
Contributed talk at EWRL 2025
arxiv
tl;dr: we provide the first computationally efficient algorithm achieving rate-optimal regret in discounted linear MDPs; it combines optimistic exploration with artificial transitions to an absorbing state with maximal return. we also apply it to interactive imitation learning, where we obtain state-of-the-art guarantees.
Spectral Representation for Causal Estimation with Hidden Confounders
Haotian Sun, Antoine Moulin, Tongzheng Ren, Arthur Gretton, Bo Dai
(AISTATS 2025) 28th International Conference on Artificial Intelligence and Statistics
arxiv
tl;dr: under a low-rank assumption on suitable conditional distributions (analogous to the low-rank MDP assumption made in RL), we propose a primal-dual method that performs well on instrumental variable (IV) and proxy causal learning (PCL) problems.
Optimistic Planning by Regularized Dynamic Programming
Antoine Moulin, Gergely Neu
(ICML 2023) 40th International Conference on Machine Learning
arxiv
tl;dr: we analyze optimistic value iteration in discounted MDPs and show that regularization can be used to avoid the contraction and monotonicity arguments that typically do not hold under function approximation.
Tutorial on Imitation Learning
Efficient Exploration in Linear Markov Decision Processes
Inverse Q-Learning for Offline Imitation Learning
Learning in Adversarial Linear MDPs
Optimistic Planning by Regularized Dynamic Programming
Infinite Horizon MDPs under Function Approximation
Primal-Dual Methods for Reinforcement Learning
Introduction to JAX
Virtual Sculpture