Dear all,
On Tuesday, December 3rd, at 14h00 in Aula Dal Passo of Tor Vergata Math Department, RoMaDS (https://www.mat.uniroma2.it/~rds/events.php) will host
Carlo Ciliberto (University College London) with the seminar
“Operator World Models for Reinforcement Learning”
Abstract: Policy Mirror Descent (PMD) is a powerful and theoretically sound methodology for sequential decision making. However, it is not directly applicable to Reinforcement Learning (RL) due to the inaccessibility of explicit action-value functions. We address this challenge by introducing a novel approach based on learning a world model of the environment using conditional mean embeddings. Leveraging tools from operator theory, we derive a closed-form expression of the action-value function in terms of the world model via simple matrix operations. Combining these estimators with PMD leads to POWR, a new RL algorithm for which we prove convergence rates to the global optimum. Preliminary experiments in finite and infinite state settings support the effectiveness of our method.
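To give a flavour of the two ingredients the abstract combines, here is a minimal tabular sketch (not the POWR algorithm itself, whose world model is learned via conditional mean embeddings): given an estimated transition model and reward, the action-value function is obtained in closed form by simple matrix operations, and a mirror-descent (exponentiated-gradient) step updates the policy. All array names and the step size `eta` are illustrative assumptions.

```python
import numpy as np

def action_values(P, r, pi, gamma=0.9):
    """Closed-form Q^pi from a tabular model via matrix operations.

    P:  transitions, shape (S, A, S'); r: rewards, shape (S, A);
    pi: policy, shape (S, A). Illustrative names, not from the paper.
    """
    S = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)        # state-to-state kernel under pi
    r_pi = (pi * r).sum(axis=1)                  # expected reward under pi
    # Bellman equation V = r_pi + gamma * P_pi V, solved as a linear system
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return r + gamma * np.einsum("sat,t->sa", P, V)

def pmd_step(pi, Q, eta=1.0):
    """One Policy Mirror Descent update with KL regularization:
    pi_new proportional to pi * exp(eta * Q), renormalized per state."""
    new = pi * np.exp(eta * Q)
    return new / new.sum(axis=1, keepdims=True)

# Toy example: random 3-state, 2-action model, uniform initial policy
rng = np.random.default_rng(0)
S, A = 3, 2
P = rng.random((S, A, S))
P /= P.sum(axis=-1, keepdims=True)
r = rng.random((S, A))
pi = np.full((S, A), 1.0 / A)

Q = action_values(P, r, pi)
pi_next = pmd_step(pi, Q)
```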
We encourage in-person participation. Should you be unable to attend, here is the link to the Teams streaming:
https://teams.microsoft.com/l/meetup-join/19%3arfsL73KX-fw86y1YnXq2nk5VnZFwPU-iIPEmqet8NCg1%40thread.tacv2/1732627863559?context={"Tid"%3a"24c5be2a-d764-40c5-9975-82d08ae47d0e"%2c"Oid"%3a"650fc4a8-4cec-4bd2-87bc-90d134074fe6"}
The seminar is part of the Excellence Project MatMod@TOV.