Dear all,
On Thursday august 31 we have Chloé Rouyer in the Statistics and Machine Learning Thematic Seminar.
Chloé Rouyer (University of Kopenhagen, https://sites.google.com/view/chloerouyer/home)
Thursday august 31, 16:00-17:00 In person, at the University of Amsterdam Location: TBA
Title: No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions Abstract: We consider a simple reinforcement learning framework known as episodic Markov Decision Processes, which is a repeated game where a learner has to navigate an environment and learn the underlying transitions and reward functions with the goal of maximizing its cumulative reward. We investigate the effect of corruption on both the transition and the reward functions. Specifically, it has been shown that corrupting the transitions is more detrimental for the learner than corrupting the rewards as a linear amount of corruption on the transitions can lead to linear regret, whereas it is still possible to achieve sublinear regret even with a linear amount of corruption on the rewards (also known as the adversarial regime). In this work, our goal is to adapt to detangle both notions of corruptions and to propose methods that optimally adapt to each corruption measure.
Seminar organizers: Tim van Erven Botond Szabo
machine-learning-nederland@list.uva.nl