Dear colleagues,
Our next BeNeRL Reinforcement Learning Seminar (Sep 12) is coming up:
Title: Sample Efficiency in Deep RL: Quo Vadis?
Date: September 12, 16.00-17.00 (CET)
The goal of the online BeNeRL seminar series is to invite RL researchers (mostly advanced PhD students or early postgraduate researchers) to share their work. In addition, we invite the speakers to briefly share their experience with large-scale deep RL experiments and their approach to getting these to work.
We would be very glad if you could forward this invitation within your group and to other colleagues who might be interested (also outside the BeNeRL region). Hope to see you on September 12!
Kind regards,
Zhao Yang & Thomas Moerland
Leiden University
------------------------------
Upcoming talk:
Date: September 12, 16.00-17.00 (CET)
Title: Sample Efficiency in Deep RL: Quo Vadis?
Abstract: Deep reinforcement learning (RL) has shown remarkable successes but is often hindered by low sample efficiency and high computational costs. This talk presents two complementary studies that challenge conventional wisdom in deep RL. Both studies
offer a fresh perspective on accelerating RL algorithms and highlight some fundamental limitations. First, we explore the limits of value expansion methods in model-based RL, revealing surprising insights about the diminishing returns of longer rollout horizons
and increased model accuracy. Our findings suggest that pursuing perfect models may not be as crucial as previously thought. Second, we introduce CrossQ, a novel approach that dramatically improves sample efficiency in off-policy RL by leveraging batch normalization
and eliminating target networks. Unlike other approaches, CrossQ does not increase the update-to-data ratio and thus achieves state-of-the-art performance at just 5% of the computational cost of other current methods. We conclude by discussing implications
for future research directions, including applications in robotics and large-scale RL systems.
Bio: Daniel Palenicek is a PhD student at the Intelligent Autonomous Systems Group, TU Darmstadt, where he is advised by Prof. Jan Peters. He is also part of the 3AI project with hessian.AI. Daniel's research lies at the intersection of reinforcement learning
and robotics. He is interested in increasing sample efficiency and scaling model-free and model-based reinforcement learning algorithms.
Before starting his PhD, Daniel completed his B.Sc. and M.Sc. in Business Informatics (Wirtschaftsinformatik) at TU Darmstadt. He wrote his Master's thesis, entitled "Dyna-Style Model-Based Reinforcement Learning with Value Expansion", under the supervision of Dr. Michael Lutter and Prof. Jan Peters. Prior to that, Daniel completed two research internships: at the Bosch Center for AI he focused on model-free RL, and at Huawei Noah's Ark Lab in London he worked on safe model-based RL and active exploration.