13th edition of the ABS (Applied Bayesian Statistics) Summer School
BAYES, BIG DATA, AND THE INTERNET
Lecturer: STEVE SCOTT, Senior Economic Analyst at Google
As in the past three years, the 2016 school will be held in the magnificent Villa del Grumello, in Como (Italy), on the shore of Lake Como.
Guido Consonni and Fabrizio Ruggeri, ABS16 Directors
Raffaele Argiento, ABS16 Executive Director
------------------------------------------------------------------------
*******************************
*            ABS16            *
*******************************
Applied Bayesian Statistics School
BAYES, BIG DATA, AND THE INTERNET
August 29 - September 2, 2016
Villa del Grumello, Como, Italy
Lecturer:
Steve Scott, Senior Economic Analyst at Google, USA https://sites.google.com/site/stevethebayesian/
The conference webpage is
web.mi.imati.cnr.it/conferences/abs16.html
Registration is now open. Please note that the conference room can accommodate only a very limited number of participants.
The ABS16 Secretariat can be contacted at
abs16@mi.imati.cnr.it
COURSE OUTLINE
Day 1: One of the main applications of statistics at internet companies is A/B testing (e.g. to determine which version of a web site is most effective). Standard statistical methods for A/B testing that use a static experimental design are typically dominated (in terms of cost) by sequential methods based on Thompson sampling. The first day of the course uses Thompson sampling to motivate standard low-dimensional statistical models such as the beta-binomial, Poisson-gamma and normal-normal models. We will review Monte Carlo methods for computing with Bayesian models, including Gibbs sampling, Metropolis-Hastings, and slice sampling.
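As an illustration of the Day 1 material, here is a minimal sketch of Thompson sampling for an A/B/C test with a beta-binomial model. It is not taken from the course materials. The conversion rates, the uniform Beta(1, 1) priors, the number of rounds, and the function name are all assumptions chosen for the example.

```python
import random

def thompson_sampling(true_rates, n_rounds=5000, seed=0):
    """Bernoulli bandit with an independent Beta(1, 1) prior on each arm.

    true_rates are hypothetical conversion rates used only to simulate
    visitor responses; in a real experiment they are unknown.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_rounds):
        # Draw one plausible conversion rate per arm from its Beta posterior.
        draws = [rng.betavariate(1 + successes[i], 1 + failures[i])
                 for i in range(k)]
        # Show the version whose sampled rate is largest.
        arm = max(range(k), key=lambda i: draws[i])
        # Simulate the visitor's response and update that arm's counts.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures

successes, failures = thompson_sampling([0.05, 0.10, 0.30])
pulls = [s + f for s, f in zip(successes, failures)]
print("visitors per version:", pulls)
```

Because each version is shown with probability equal to its posterior probability of being best, traffic drifts toward the best version as evidence accumulates, which is the source of the cost advantage over a fixed allocation.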
Day 2: Very quickly after implementing an experiment, one discovers that more than one factor needs to be tested (e.g. you need to test whether the button should be red or blue, AND whether it should be located on the left or the right of the page). You also need to determine whether the optimal design varies with contextual factors (e.g. whether the page is shown on the weekend or on a weekday). These factors can be handled within the Thompson sampling framework by extending the reward distribution to various generalized linear models. The second day of the course will discuss Bayesian regression in linear and generalized linear models using data augmentation. Probit, logit, Poisson, and Student-t models will be covered.
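To give a flavor of data augmentation, the sketch below runs a Gibbs sampler for a probit regression with a single coefficient and no intercept, in the style of Albert and Chib. It is an illustrative simplification, not the course's code. The simulated data, the N(0, 100) prior, and all function names are assumptions.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def truncated_normal(mean, positive, rng):
    """Draw from N(mean, 1) restricted to z > 0 (positive) or z <= 0."""
    p0 = STD_NORMAL.cdf(-mean)  # probability that z <= 0
    if positive:
        u = p0 + rng.random() * (1.0 - p0)
    else:
        u = rng.random() * p0
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf inside (0, 1)
    return mean + STD_NORMAL.inv_cdf(u)

def probit_gibbs(x, y, n_iter=1000, burn_in=200, prior_var=100.0, seed=0):
    """Gibbs sampler for P(y = 1) = Phi(beta * x) with beta ~ N(0, prior_var)."""
    rng = random.Random(seed)
    precision = sum(xi * xi for xi in x) + 1.0 / prior_var
    beta, draws = 0.0, []
    for it in range(n_iter):
        # Step 1: impute latent z_i ~ N(beta * x_i, 1), truncated by y_i.
        z = [truncated_normal(beta * xi, yi == 1, rng) for xi, yi in zip(x, y)]
        # Step 2: given z, beta has a conjugate normal full conditional.
        mean = sum(xi * zi for xi, zi in zip(x, z)) / precision
        beta = rng.gauss(mean, precision ** -0.5)
        if it >= burn_in:
            draws.append(beta)
    return draws

# Simulated data with a hypothetical true coefficient of 1.0.
data_rng = random.Random(1)
x = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
y = [1 if xi + data_rng.gauss(0.0, 1.0) > 0 else 0 for xi in x]
draws = probit_gibbs(x, y)
posterior_mean = sum(draws) / len(draws)
print("posterior mean of beta:", round(posterior_mean, 3))
```

The latent variables turn the probit likelihood into an ordinary normal regression, so both steps are draws from standard distributions.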
Day 3: If the number of factors to be tested (or number of contexts to be controlled for) is very large, then sparse models must be considered. Sparsity can be introduced into a Bayesian model using a "spike-and-slab" prior that places some prior mass at zero for each coefficient in a linear model. The data can then move posterior mass between the "spike" (at zero) and "slab" (nonzero) portions of the posterior distribution.
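The way data move mass between the spike and the slab can be seen in a deliberately small case: a single coefficient estimated from a sample mean with known noise variance. The sketch below is an assumption-laden toy, not the regression setting of the course. The function names, the N(0, tau2) slab, and the default prior inclusion probability of 0.5 are all choices made for the example.

```python
import math

def normal_pdf(x, var):
    """Density of N(0, var) evaluated at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)

def inclusion_probability(ybar, n, sigma2=1.0, tau2=1.0, prior_inclusion=0.5):
    """Posterior probability that the coefficient is nonzero.

    Model: y_i ~ N(beta, sigma2), i = 1..n, with sample mean ybar.
    Prior: beta = 0 with probability 1 - prior_inclusion (the spike),
           beta ~ N(0, tau2) with probability prior_inclusion (the slab).
    """
    se2 = sigma2 / n
    # Marginal density of ybar under each component of the prior.
    spike = (1.0 - prior_inclusion) * normal_pdf(ybar, se2)
    slab = prior_inclusion * normal_pdf(ybar, tau2 + se2)
    return slab / (spike + slab)

for ybar in (0.0, 0.2, 0.5):
    print(ybar, round(inclusion_probability(ybar, n=100), 4))
```

A sample mean near zero pushes mass toward the spike, while a sample mean several standard errors from zero pushes nearly all of it to the slab.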
Day 4: One valuable aspect of internet data is that it occurs in nearly real time, while many official statistics are released as monthly or quarterly time series. One way to model these data is with dynamic linear models. These models can combine a time series model (capturing the target series' past behavior) with a sparse regression component (capturing the impact of contemporaneous internet data). We will work on examples using economic data from FRED (Federal Reserve Economic Data, maintained by the Federal Reserve Bank of St. Louis) and data from Google Trends.
Day 5: We will discuss methods that can be applied when facing "big" data that must be distributed across several machines.
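The simplest dynamic linear model is the local level model, in which a latent level follows a random walk and each observation is the level plus noise. The sketch below is a plain Kalman filter for that model. It omits the sparse regression component, and the variances, the diffuse starting values, the toy series, and the function name are all assumptions made for illustration.

```python
def local_level_filter(y, obs_var=1.0, state_var=0.1, m0=0.0, c0=1e6):
    """Kalman filter for y_t = level_t + noise, level_t = level_{t-1} + drift.

    Entries of y equal to None are treated as not yet released, so the
    filter carries the previous estimate forward with increased variance.
    Returns the filtered mean of the level at each time point.
    """
    m, c = m0, c0
    means = []
    for obs in y:
        r = c + state_var            # predictive variance of the level
        if obs is None:
            c = r                    # no data: keep the mean, widen the variance
        else:
            k = r / (r + obs_var)    # Kalman gain
            m = m + k * (obs - m)    # shift the mean toward the observation
            c = (1.0 - k) * r        # posterior variance of the level
        means.append(m)
    return means

series = [10.0, 10.4, None, 10.9, 11.1]  # toy series with one missing value
filtered = local_level_filter(series)
print([round(v, 3) for v in filtered])
```

Handling missing entries this way is what makes the state space form convenient when the target series is released less often than the data used to track it.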
PRACTICAL INFORMATION
The school will follow the usual format: lectures and practical sessions (run by a junior researcher), participants' talks, and a free Wednesday afternoon. It starts on Monday after lunch and ends on Friday before lunch. Accommodation is available either at the Villa guesthouse or in downtown hotels (information will appear soon on the website). Como can be easily reached by train from Milano and its airports. More details are available on the website.