Dear all,
It is my pleasure to announce the following CWI Machine Learning seminar.
Speaker: Christian Hennig (University of Bologna)
Title: A spotlight on statistical model assumptions
Date: Friday 27 November, 15:00
Location:
https://us02web.zoom.us/j/82596062334?pwd=OTMwU2JmYUFRK0NLYW42OTExWDRyUT09
Please find the abstract below.
Hope to see you then.
Best wishes,
Wouter
Details:
https://portals.project.cwi.nl/ml-reading-group/events/a-spotlight-on-stati…
============
A spotlight on statistical model assumptions
Christian Hennig (University of Bologna)
Many statistics teachers tell their students something like "In order to
apply the t-test, we have to assume that the data are i.i.d. normally
distributed, and therefore these model assumptions need to be checked
before applying the t-test." This statement is highly problematic in
several respects. There is no good reason to believe that any real data
truly are drawn i.i.d. normally. Furthermore, quite relevant aspects of
these model assumptions cannot be checked. For example, I will show that
data generated from a normal distribution with a correlation of
$\rho\neq 0$ between any two observations cannot be distinguished from
i.i.d. normal data. On top of this, passing a model checking test will
automatically invalidate the model: much of the literature that
investigates the performance of procedures running model-based tests
conditionally on passing a misspecification test comments very
critically on this practice.
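The indistinguishability claim can be made concrete with a small simulation. This sketch is my own illustration, not material from the talk: I assume the standard construction of equicorrelated normals through a shared latent variable z, under which each x_i is marginally N(0,1) with pairwise correlation $\rho$, yet conditionally on z the sample is exactly i.i.d. normal.

```python
# Sketch (my own illustration, not from the abstract): normal data with
# correlation rho between ANY two observations, built from a shared
# latent shift z. Conditionally on z the sample is exactly i.i.d.
# normal, so no check on a single dataset can reveal the correlation.
import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.5, 100_000

z = rng.standard_normal()                      # shared latent component
e = rng.standard_normal(n)                     # independent noise
x = np.sqrt(rho) * z + np.sqrt(1 - rho) * e    # one equicorrelated dataset

# Given z, the x_i are i.i.d. N(sqrt(rho)*z, 1 - rho): the realised
# dataset is a perfectly ordinary i.i.d. normal sample.
print(round(x.mean(), 3))   # close to sqrt(rho) * z, not to 0
print(round(x.var(), 3))    # close to 1 - rho, not to 1

# Across REPEATED datasets the pairwise correlation really is rho:
m = 200_000
zz = rng.standard_normal(m)
pairs = np.sqrt(rho) * zz[:, None] + np.sqrt(1 - rho) * rng.standard_normal((m, 2))
print(round(np.corrcoef(pairs.T)[0, 1], 2))
```

The correlation lives only in the joint distribution across hypothetical repetitions; within any single realisation it is absorbed into the mean, which is why it cannot be detected by checking that one dataset.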
Despite all these issues, I will defend interpreting and using
statistical models in a frequentist manner, by advocating an
understanding of models that never forgets that models are essentially
different from reality (and in this sense can never be "true"). Model
assumptions specify idealised conditions under which methods work well;
in reality they do not need to be fulfilled. However, situations in
which the data will mislead a method need to be distinguished from
situations in which a method does what it is expected to do. This
defines a more appropriate task for model checking. Doing this job
properly requires conditions that some model checking currently in use
does not fulfill. For better "model checking" it will be helpful
to understand that this is not about "finding out whether the model
assumptions hold", but about something quite different.