Dear all,

Join us for our upcoming workshop SURE-AI@STAR Midsummer on Reinforcement Learning.

The event will take place on three different days in a hybrid format: participants in Oslo can attend the talks in Room 723 in Niels Henrik Abels hus, whereas the international audience can follow them via Zoom (see link next to each talk).

The lecturer is Xin Guo (UC Berkeley).

Title: A control-theoretical perspective of continuous-time reinforcement learning

Abstract: Reinforcement Learning (RL) is a cornerstone of modern machine learning, enabling agents to learn optimal decision-making through interaction with complex environments and other agents. While RL was traditionally developed for discrete-time settings, many real-world physical and financial systems are intrinsically continuous. This series of lectures provides a comprehensive overview of recent advancements in continuous-time RL, analyzed through the rigorous lens of stochastic control theory. It consists of three parts.

Registration is free, but mandatory for both in-person and online attendees to ensure the best logistics. The seminar link will be shared with all registered participants before the event starts.

Part I. Foundations: Q-Functions and Dynamic Programming (23/06/2026, 13:00-16:00)

We begin by establishing a bridge between RL and classical control theory. By leveraging the Dynamic Programming Principle (DPP) and the Feynman-Kac formula, we explore the similarities between these fields, specifically through the lens of continuous-time Q-functions and martingale characterization.

References

[1] Gu, Guo, Wei, Xu (2020). Dynamic programming principle for mean field controls with learning. Operations Research, 2023. arXiv:1911.07314.

[2] Gu, Guo, Wei, Xu (2021). Mean-field controls with Q-learning for cooperative MARL: convergence and complexity analysis. SIAM Journal on Mathematics of Data Science. arXiv:2002.04131.

[3] Cheng, Guo, Zhang (2025). Bridging discrete and continuous RL: stable deterministic policy gradient with martingale characterization. arXiv:2509.23711.

Part II. The Theoretical Divide: Regret Analysis and BSDEs (24/06/2026, 13:00-16:00)

To distinguish RL from classical control, we analyze "the cost of learning" through regret analysis. This section delves into the mathematical tools required to quantify performance gaps when the system model is unknown. We show how the regularity of Hamilton-Jacobi-Bellman (HJB) equations and Backward Stochastic Differential Equations (BSDEs) makes it possible to establish regret bounds. In the particular case of Linear-Quadratic (LQ) models, we present proofs of logarithmic regret in episodic finite-horizon RL via the regularity of the Riccati equation.

References

[4] Basei, Guo, Hu, Zhang (2020). Logarithmic regret for episodic continuous-time linear-quadratic reinforcement learning over a finite-time horizon. Journal of Machine Learning Research. arXiv:2006.15316.

[5] Guo, Hu, Zhang (2021). Reinforcement learning for linear-convex models with jumps via stability analysis of feedback controls. SIAM Journal on Control and Optimization (SICON). arXiv:2104.09311.

Part III. Modern Frontiers: Transfer Learning and Rough Paths (25/06/2026, 13:00-16:00)

Finally, we explore how cutting-edge developments from Large Language Models (LLMs), such as transfer learning, can be integrated into RL to significantly boost algorithmic performance. We discuss fast policy learning for LQ control using entropy techniques to balance exploration and exploitation. We will also uncover how the regularity of rough paths provides a framework for analyzing policy transfer in continuous-time RL, offering a robust mathematical foundation for cross-domain adaptation.

References

[6] Guo, Li, Xu (2025). Fast policy learning for linear quadratic control with entropy regularization. SICON. arXiv:2311.14168.

[7] Cao, Gu, Guo, Rosenbaum (2024). Transfer learning for portfolio optimization. arXiv:2307.13546.

[8] Guo, Lyu (2025). Policy transfer for continuous-time reinforcement learning: a (rough) differential equation approach. arXiv:2510.15165.

The seminar is in the interest of SURE-AI - Sustainable, Risk-Averse and Ethical AI.

We are looking forward to seeing you!

Best regards,

Giulia, Leonardo, Pere, and David