Dear colleagues,
Our next BeNeRL Reinforcement Learning Seminar (April 11) is coming up:
Title: Learning Curricula in Open-Ended Worlds
Date: April 11, 16.00-17.00 (CET)
The goal of the online BeNeRL seminar series is to invite RL researchers (mostly advanced PhD candidates or early-career postdocs) to share their work. In addition, we invite the speakers to briefly share their experience with large-scale deep RL experiments and their style/approach to getting these to work.
We would be very glad if you forwarded this invitation within your group and to other colleagues who may be interested (also outside the BeNeRL region). We hope to see you on April 11!
Kind regards,
Zhao Yang & Thomas Moerland
Leiden University
——————————————————————
Upcoming talk:
Date: April 11, 16.00-17.00 (CET)
Title: Learning Curricula in Open-Ended Worlds
Abstract: Deep reinforcement learning (RL) agents commonly overfit to their training environments, performing poorly when the environment is even mildly perturbed. Such overfitting can be mitigated by applying domain randomization (DR) to various aspects of the training environment in simulation. However, depending on its implementation, DR makes potentially arbitrary assumptions about the distribution over environment instances. In larger environment design spaces, DR can become combinatorially less likely to sample the specific environment instances that may be especially useful for learning. Unsupervised Environment Design (UED) addresses these shortcomings by directly considering the problem of automatically generating a sequence, or curriculum, of environment instances presented to the agent for training, in order to maximize the agent's final robustness and generality. UED methods have been shown, in both theory and practice, to produce emergent training curricula that result in deep RL agents with improved transfer performance on out-of-distribution environment instances. Such autocurricula are a promising path toward open-ended learning systems that become increasingly capable by continually generating and mastering additional challenges of their own design. This talk provides a tour of recent algorithmic developments leading to successively more powerful UED methods, followed by a discussion of key challenges and potential paths to unlocking their full potential in practice.
Bio: Minqi Jiang is a research scientist in the Autonomous Assistants group at Google DeepMind. His PhD research developed several scalable approaches for generating autocurricula that produce
more robust deep RL agents in potentially open-ended environments. He is especially interested in problems at the intersection of generalization, human-AI coordination, and open-ended systems.