Dear all,
This Friday, June 10, we have Julia Olkhovskaya from the VU speaking in the thematic seminar.
*Julia Olkhovskaya* (Vrije Universiteit, https://sites.google.com/view/julia-olkhovskaya/home)
*Friday June 10*, 16h00-17h00
Online on Zoom: https://uva-live.zoom.us/j/89796690874
Meeting ID: 897 9669 0874
*Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits*
We study the Bayesian regret of the renowned Thompson Sampling algorithm in contextual bandits with binary losses and adversarially-selected contexts. We adapt the information-theoretic perspective of Russo and Van Roy [2016] to the contextual setting by introducing a new concept of information ratio based on the mutual information between the unknown model parameter and the observed loss. This allows us to bound the regret in terms of the entropy of the prior distribution through a remarkably simple proof, and with no structural assumptions on the likelihood or the prior. We also extend our results to priors with infinite entropy under a Lipschitz assumption on the log-likelihood. An interesting special case is that of logistic bandits with d-dimensional parameters, K actions, and Lipschitz logits.
This is joint work with Gergely Neu, Matteo Papini and Ludovic Schwartz.
Seminar organizers: Tim van Erven, Botond Szabo
https://mschauer.github.io/StructuresSeminar/
machine-learning-nederland@list.uva.nl