Dear colleagues,
 
Our next BeNeRL Reinforcement Learning Seminar (Sep 12) is coming up:
Speaker: Daniel Palenicek (https://www.ias.informatik.tu-darmstadt.de/Team/DanielPalenicek), PhD student at TU Darmstadt.

Title: Sample Efficiency in Deep RL: Quo Vadis?
Date: September 12, 16.00-17.00 (CET)
Please find full details about the talk below this email and on the website of the seminar series: https://www.benerl.org/seminar-series
 
The goal of the online BeNeRL seminar series is to invite RL researchers (mostly advanced PhD students or early postdocs) to share their work. In addition, we invite the speakers to briefly share their experience with large-scale deep RL experiments and their style/approach for getting these to work.
 
We would be very glad if you forwarded this invitation within your group and to other colleagues who might be interested (also outside the BeNeRL region). Hope to see you on September 12!
 
Kind regards,
Zhao Yang & Thomas Moerland
Leiden University
 
------------------------------
 
 
Upcoming talk: 
 
Date: September 12, 16.00-17.00 (CET)
Speaker: Daniel Palenicek (https://www.ias.informatik.tu-darmstadt.de/Team/DanielPalenicek)
Title: Sample Efficiency in Deep RL: Quo Vadis?
Zoom: https://universiteitleiden.zoom.us/j/65411016557?pwd=MzlqcVhzVzUyZlJKTEE0Nk5uQkpEUT09
Abstract: Deep reinforcement learning (RL) has shown remarkable successes but is often hindered by low sample efficiency and high computational costs. This talk presents two complementary studies that challenge conventional wisdom in deep RL. Both studies offer a fresh perspective on accelerating RL algorithms and highlight some fundamental limitations. First, we explore the limits of value expansion methods in model-based RL, revealing surprising insights about the diminishing returns of longer rollout horizons and increased model accuracy. Our findings suggest that pursuing perfect models may not be as crucial as previously thought. Second, we introduce CrossQ, a novel approach that dramatically improves sample efficiency in off-policy RL by leveraging batch normalization and eliminating target networks. Contrary to other approaches, CrossQ does not increase the update-to-data ratio and, thus, achieves its state-of-the-art performance at just 5% of the computational cost of other current methods. We conclude by discussing implications for future research directions, including applications in robotics and large-scale RL systems.
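For readers curious about the core idea behind CrossQ before the talk, here is a minimal sketch (our own illustration based on the abstract, not the speaker's code), assuming a PyTorch-style continuous-control setup: the critic uses batch normalization and no target network, and the current and next (state, action) pairs share one forward pass so that the normalization statistics cover both. All names and hyperparameters below are placeholders.

import torch
import torch.nn as nn

class Critic(nn.Module):
    # Q-network with batch normalization; note there is no separate target copy.
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.BatchNorm1d(hidden),  # normalization layer (the exact variant used in CrossQ may differ)
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.BatchNorm1d(hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def critic_loss(critic, policy, batch, gamma=0.99):
    # batch holds tensors s, a, r, s_next, done with a shared batch dimension.
    s, a, r, s_next, done = batch
    with torch.no_grad():
        a_next = policy(s_next)
    # Single joint forward pass: batch-norm statistics are computed over both the
    # current and the next (state, action) samples, which is the ingredient that
    # lets the method drop the target network while keeping bootstrapping stable.
    q_all = critic(torch.cat([s, s_next]), torch.cat([a, a_next]))
    q, q_next = q_all.chunk(2)
    target = r + gamma * (1.0 - done) * q_next.detach()
    return ((q - target) ** 2).mean()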
Bio: Daniel Palenicek is a PhD student at the Intelligent Autonomous Systems Group, TU Darmstadt, where he is advised by Prof. Jan Peters. He is also part of the 3AI project with hessian.AI. Daniel's research lies at the intersection of reinforcement learning and robotics. He is interested in increasing sample efficiency and scaling model-free and model-based reinforcement learning algorithms.
Before starting his PhD, Daniel completed his B.Sc. and M.Sc. in Wirtschaftsinformatik (Business Informatics) at TU Darmstadt. He wrote his Master's thesis, entitled "Dyna-Style Model-Based Reinforcement Learning with Value Expansion", under the supervision of Dr. Michael Lutter and Prof. Jan Peters. Before that, Daniel did two research internships: at the Bosch Center for AI, he focused on model-free RL, and at Huawei Noah's Ark Lab in London, he worked on safe model-based RL and active exploration.