Dear colleagues,
 
Our next BeNeRL Reinforcement Learning Seminar (Sep 12) is coming up:
Speaker: Daniel Palenicek (https://www.ias.informatik.tu-darmstadt.de/Team/DanielPalenicek), PhD student at TU Darmstadt.

Title: Sample Efficiency in Deep RL: Quo Vadis?
Date: September 12, 16.00-17.00 (CET)
Please find full details about the talk below this email and on the website of the seminar series: https://www.benerl.org/seminar-series
 
The goal of the online BeNeRL seminar series is to invite RL researchers (mostly advanced PhD students or early postdocs) to share their work. In addition, we invite the speakers to briefly share their experience with large-scale deep RL experiments and their style/approach for getting these to work.
 
We would be very glad if you forwarded this invitation within your group and to other colleagues who might be interested (also outside the BeNeRL region). Hope to see you on September 12!
 
Kind regards,
Zhao Yang & Thomas Moerland
Leiden University
 
------------------------------
 
 
Upcoming talk: 
 
Date: September 12, 16.00-17.00 (CET)
Speaker: Daniel Palenicek (https://www.ias.informatik.tu-darmstadt.de/Team/DanielPalenicek)
Title: Sample Efficiency in Deep RL: Quo Vadis?
Zoom: https://universiteitleiden.zoom.us/j/65411016557?pwd=MzlqcVhzVzUyZlJKTEE0Nk5uQkpEUT09
Abstract: Deep reinforcement learning (RL) has shown remarkable successes but is often hindered by low sample efficiency and high computational costs. This talk presents two complementary studies that challenge conventional wisdom in deep RL. Both studies offer a fresh perspective on accelerating RL algorithms and highlight some fundamental limitations. First, we explore the limits of value expansion methods in model-based RL, revealing surprising insights about the diminishing returns of longer rollout horizons and increased model accuracy. Our findings suggest that pursuing perfect models may not be as crucial as previously thought. Second, we introduce CrossQ, a novel approach that dramatically improves sample efficiency in off-policy RL by leveraging batch normalization and eliminating target networks. Contrary to other approaches, CrossQ does not increase the update-to-data ratio and, thus, achieves its state-of-the-art performance at just 5% of the computational cost of other current methods. We conclude by discussing implications for future research directions, including applications in robotics and large-scale RL systems.
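For readers curious about the core idea behind CrossQ before the talk, here is a minimal sketch (our own illustration based on the abstract, not the speaker's code), assuming a PyTorch-style continuous-control setup: the critic uses batch normalization and no target network, and the current and next (state, action) pairs share one forward pass so that the normalization statistics cover both. All names and hyperparameters below are placeholders.

import torch
import torch.nn as nn

class Critic(nn.Module):
    # Q-network with batch normalization; note there is no separate target copy.
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.BatchNorm1d(hidden),  # normalization layer (the exact variant used in CrossQ may differ)
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.BatchNorm1d(hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def critic_loss(critic, policy, batch, gamma=0.99):
    # batch holds tensors s, a, r, s_next, done with a shared batch dimension.
    s, a, r, s_next, done = batch
    with torch.no_grad():
        a_next = policy(s_next)
    # Single joint forward pass: batch-norm statistics are computed over both the
    # current and the next (state, action) samples, which is the ingredient that
    # lets the method drop the target network while keeping bootstrapping stable.
    q_all = critic(torch.cat([s, s_next]), torch.cat([a, a_next]))
    q, q_next = q_all.chunk(2)
    target = r + gamma * (1.0 - done) * q_next.detach()
    return ((q - target) ** 2).mean()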
Bio: Daniel Palenicek is a PhD student at the Intelligent Autonomous Systems Group, TU Darmstadt, where he is advised by Prof. Jan Peters. He is also part of the 3AI project with hessian.AI. Daniel's research lies at the intersection of reinforcement learning and robotics. He is interested in increasing sample efficiency and scaling model-free and model-based reinforcement learning algorithms.
Before starting his PhD, Daniel completed his B.Sc. and M.Sc. in Wirtschaftsinformatik (Business Informatics) at TU Darmstadt. He wrote his Master's thesis, entitled "Dyna-Style Model-Based Reinforcement Learning with Value Expansion", under the supervision of Dr. Michael Lutter and Prof. Jan Peters. Before that, Daniel did two research internships: at the Bosch Center for AI, he focused on model-free RL, and at Huawei Noah's Ark Lab in London, he worked on safe model-based RL and active exploration.