Dear all,
This Friday, June 10, Julia Olkhovskaya from the VU will speak
in the thematic seminar.
Julia Olkhovskaya (Vrije Universiteit, https://sites.google.com/view/julia-olkhovskaya/home)
Friday, June 10, 16h00-17h00
Online on Zoom: https://uva-live.zoom.us/j/89796690874
Meeting ID: 897 9669 0874
Lifting the Information Ratio: An Information-Theoretic
Analysis of Thompson Sampling for Contextual Bandits
We study the Bayesian regret of the renowned Thompson Sampling
algorithm in contextual bandits with binary losses and
adversarially-selected contexts. We adapt the information-theoretic
perspective of Russo and Van Roy [2016] to the contextual setting by
introducing a new concept of information ratio based on the mutual
information between the unknown model parameter and the observed
loss. This allows us to bound the regret in terms of the entropy of
the prior distribution through a remarkably simple proof, and with
no structural assumptions on the likelihood or the prior. We also
extend our results to priors with infinite entropy under a Lipschitz
assumption on the log-likelihood. An interesting special case is
that of logistic bandits with d-dimensional parameters, K actions,
and Lipschitz logits.
This is joint work with Gergely Neu, Matteo Papini and Ludovic
Schwartz.
Seminar organizers:
Tim van Erven
Botond Szabo
https://mschauer.github.io/StructuresSeminar/
--
Tim van Erven <tim@timvanerven.nl>
www.timvanerven.nl