13th edition of the ABS (Applied Bayesian Statistics) Summer School
BAYES, BIG DATA, AND THE INTERNET
Lecturer: STEVE SCOTT, Senior Economic Analyst at Google
As in the past three years, the 2016 school will be held
in the magnificent Villa del Grumello, in Como (Italy), on
the Lake Como shore.
Guido Consonni and Fabrizio Ruggeri
ABS16 Directors
Raffaele Argiento
ABS16 Executive Director
------------------------------------------------------------------------
*******************************
* ABS16 *
*******************************
Applied Bayesian Statistics School
BAYES, BIG DATA, AND THE INTERNET
August 29 - September 2, 2016
Villa del Grumello, Como, Italy
Lecturer:
Steve Scott, Senior Economic Analyst at Google, USA
https://sites.google.com/site/stevethebayesian/
The conference webpage is
>>>> web.mi.imati.cnr.it/conferences/abs16.html <<<<
Registration is now open. Please note that the conference
room allows only for a very limited number of participants.
The ABS16 Secretariat can be contacted at
abs16@mi.imati.cnr.it
COURSE OUTLINE
Day 1: One of the main applications of statistics at internet
companies is A/B testing (e.g. to determine which version of a web
site is most effective). Standard statistical methods for A/B testing
based on static experimental designs are typically dominated (in terms
of cost) by sequential methods based on Thompson sampling. The first
day of the course uses Thompson sampling to motivate standard
low-dimensional statistical models such as the beta-binomial,
Poisson-gamma, and normal-normal models. We will review Monte Carlo
methods for computing with Bayesian models, including Gibbs sampling,
Metropolis-Hastings, and slice sampling.
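As a rough illustration of the Day 1 idea (not taken from the course
material), the following minimal sketch runs Thompson sampling with a
beta-binomial model over three page versions. The conversion rates,
the uniform Beta(1, 1) prior, and the function name thompson_ab_test
are assumptions made only for this example.

```python
import random


def thompson_ab_test(true_rates, n_visitors=5000, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling over several page versions.

    true_rates are hypothetical conversion rates, used only to simulate
    visitors. Returns per-version success and failure counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(n_visitors):
        # Draw one plausible conversion rate per version from its
        # Beta(1 + successes, 1 + failures) posterior (uniform prior).
        draws = [rng.betavariate(1 + successes[a], 1 + failures[a])
                 for a in range(n_arms)]
        # Show the version whose sampled rate is highest.
        arm = max(range(n_arms), key=lambda a: draws[a])
        # Simulate whether this visitor converts.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


# Assumed (made-up) conversion rates for three versions of a page.
successes, failures = thompson_ab_test([0.03, 0.05, 0.12])
visits = [s + f for s, f in zip(successes, failures)]
print("visitors per version:", visits)
```

In a run like this, most visitors should end up on the best version,
which is the sense in which sequential allocation can cost less than
a fixed, equal split.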
Day 2: Very quickly after implementing an experiment, one discovers
that more than one factor needs to be tested (e.g. you need to test
whether the button should be red or blue, AND whether it should be
located on the left or the right of the page). You also need to
determine whether the optimal design varies according to various
contextual factors (e.g. whether the page is shown on the weekend or
on a weekday). These factors can be handled within the Thompson
sampling framework by extending the reward distribution to various
generalized linear models. The second day of the course will discuss
Bayesian regression in linear and generalized linear models using data
augmentation. Probit, logit, Poisson, and Student-t models will be
covered.
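To make "data augmentation" concrete, here is a minimal sketch of a
Gibbs sampler for a probit model with a single covariate and no
intercept, in the style usually attributed to Albert and Chib. It is
an illustration under stated assumptions, not the course's own code:
the simulated data, the N(0, 100) prior on the coefficient, and all
function names are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def draw_truncated_normal(rng, mean, positive):
    """Draw z ~ N(mean, 1) restricted to z > 0 (positive) or z <= 0."""
    p_neg = STD_NORMAL.cdf(-mean)          # P(z <= 0)
    lo, hi = (p_neg, 1.0) if positive else (0.0, p_neg)
    u = lo + (hi - lo) * rng.random()
    # Crude clamp so inv_cdf always receives a value strictly in (0, 1).
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return mean + STD_NORMAL.inv_cdf(u)


def probit_gibbs(xs, ys, n_iter=600, burn_in=100, prior_var=100.0, seed=1):
    """Gibbs sampler for y_i = 1{z_i > 0}, z_i ~ N(x_i * beta, 1)."""
    rng = random.Random(seed)
    sxx = sum(x * x for x in xs)
    post_var = 1.0 / (sxx + 1.0 / prior_var)   # does not depend on z
    beta, kept = 0.0, []
    for it in range(n_iter):
        # Step 1: impute the latent z_i given beta and the observed y_i.
        zs = [draw_truncated_normal(rng, x * beta, y == 1)
              for x, y in zip(xs, ys)]
        # Step 2: conjugate normal update for beta given the latent z.
        post_mean = post_var * sum(x * z for x, z in zip(xs, zs))
        beta = rng.gauss(post_mean, post_var ** 0.5)
        if it >= burn_in:
            kept.append(beta)
    return kept


# Simulated data with a hypothetical true coefficient of 1.0.
data_rng = random.Random(0)
xs = [data_rng.gauss(0.0, 1.0) for _ in range(300)]
ys = [1 if x * 1.0 + data_rng.gauss(0.0, 1.0) > 0 else 0 for x in xs]
draws = probit_gibbs(xs, ys)
print("posterior mean of beta:", sum(draws) / len(draws))
```

The point of the augmentation is that, once the latent z values are
imputed, the update for beta is an ordinary normal linear regression
step.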
Day 3: If the number of factors to be tested (or number of contexts to
be controlled for) is very large, then sparse models must be
considered. Sparsity can be introduced into a Bayesian model using a
"spike-and-slab" prior that places some prior mass at zero for each
coefficient in a linear model. The data can then move posterior mass
between the "spike" (at zero) and "slab" (nonzero) portions of the
posterior distribution.
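A one-coefficient sketch may help show how the data move mass between
the spike and the slab. It assumes a single estimate z ~ N(beta,
noise_sd^2) and a prior that sets beta = 0 with probability
1 - prior_incl and draws beta from N(0, slab_sd^2) otherwise; the
parameter values and the function name are illustrative assumptions.

```python
from statistics import NormalDist


def inclusion_probability(z, noise_sd=1.0, slab_sd=2.0, prior_incl=0.5):
    """Posterior probability that beta != 0 given z ~ N(beta, noise_sd^2).

    Prior: beta = 0 with probability 1 - prior_incl (spike),
           beta ~ N(0, slab_sd^2) with probability prior_incl (slab).
    """
    # Marginal density of z under each component, with beta integrated out.
    spike_density = NormalDist(0.0, noise_sd).pdf(z)
    slab_density = NormalDist(0.0, (noise_sd ** 2 + slab_sd ** 2) ** 0.5).pdf(z)
    slab_weight = prior_incl * slab_density
    spike_weight = (1.0 - prior_incl) * spike_density
    return slab_weight / (slab_weight + spike_weight)


for z in (0.0, 1.0, 2.0, 5.0):
    print(z, round(inclusion_probability(z), 3))
```

An estimate near zero pushes the inclusion probability below its
prior value, while a large estimate pushes it toward one.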
Day 4: One valuable aspect of internet data is that it occurs in
nearly real time, while many official statistics are released as
monthly or quarterly time series. One way to model these data is with
dynamic linear models. These models can combine a time series model
(capturing the target series' past behavior) with a sparse regression
component (capturing the impact of contemporaneous internet data). We
will work on examples using economic data from FRED (Federal Reserve
Economic Data, maintained by the Federal Reserve Bank of St. Louis)
and data from Google Trends.
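The simplest dynamic linear model is the local level model, in which
the observed series is a slowly drifting level plus noise. The sketch
below is a plain Kalman filter for that model only; it omits the
sparse regression component described above, and the variances, the
data, and the function name are made up for illustration.

```python
def local_level_filter(ys, obs_var, level_var, m0=0.0, c0=1e6):
    """Kalman filter for y_t = mu_t + noise, mu_t = mu_{t-1} + drift noise.

    Returns the filtered mean of the level mu_t after each observation.
    """
    m, c = m0, c0                 # current mean and variance of the level
    filtered = []
    for y in ys:
        r = c + level_var         # predicted variance of mu_t before seeing y
        gain = r / (r + obs_var)  # Kalman gain: weight given to the new point
        m = m + gain * (y - m)    # move the estimate toward the observation
        c = (1.0 - gain) * r      # updated variance of the level
        filtered.append(m)
    return filtered


series = [10.2, 9.8, 10.1, 10.4, 12.9, 13.1, 12.8, 13.0]  # made-up data
filtered_series = local_level_filter(series, obs_var=0.25, level_var=0.5)
print([round(m, 2) for m in filtered_series])
```

A larger level_var relative to obs_var makes the filter follow level
shifts more quickly, at the price of a noisier estimate.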
Day 5: We will discuss methods that can be applied when facing "big"
data that must be distributed across several machines.
PRACTICAL INFORMATION
The school will follow the usual format: lectures, practical sessions
(run by a junior researcher), and participants' talks, with Wednesday
afternoon free. It will start on Monday after lunch and end on Friday
before lunch. Accommodation is available either at the Villa
guesthouse or in downtown hotels (information will appear soon on the
website). Como can be easily reached by train from Milano and its
airports. More details are available on the website.