************************************************************************ STATISTICS SEMINARS @ COLLEGIO CARLO ALBERTO ************************************************************************
On Friday, 27 January 2017, at 12:00, the following seminar will be held in the Aula Rossa of the Collegio Carlo Alberto, Moncalieri (TO):
Stéphane BOUCHERON (Université Paris-Diderot)
CONCENTRATION INEQUALITIES IN THE INFINITE URN SCHEME FOR OCCUPANCY COUNTS AND THE MISSING MASS, WITH APPLICATIONS TO GOOD-TURING ESTIMATORS AND ADAPTIVE STATISTICAL TEXT COMPRESSION
An infinite urn scheme is defined by a probability mass function over the positive integers. A random allocation consists of a sample of N independent draws from this probability distribution, where N may be deterministic or Poisson-distributed. We are concerned with occupancy counts, that is, the number of symbols with exactly r or at least r occurrences in the sample, and with the missing mass, that is, the total probability of all symbols that do not occur in the sample. Without any further assumption on the sampling distribution, these random quantities are shown to satisfy Bernstein-type concentration inequalities. The variance factors in these concentration inequalities are shown to be tight if the sampling distribution satisfies a regular variation property. Among other applications, these concentration inequalities allow us to derive tight confidence intervals for the Good-Turing estimator of the missing mass.

In statistical text compression, adaptive (or twice-universal) coding faces the following problem: given a collection of source classes such that each class in the collection has a non-trivial minimax redundancy rate, can we design a single code that is asymptotically minimax over each class in the collection? We deal with classes of sources over an infinite alphabet (that is, with infinite urn schemes) that are characterized by a dominating envelope. We provide asymptotic equivalents for the redundancy of envelope classes enjoying a regular variation property. We construct a computationally efficient online code, which is shown to be adaptive, within a log log n factor, over the collection of regularly varying envelope classes.
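As a rough illustration of the quantities named in the abstract (occupancy counts, the missing mass, and the Good-Turing estimator), here is a minimal Python sketch. It is not taken from the talk. The sampling distribution p(j) = 1/(j(j+1)) is an assumption chosen only because it is easy to sample exactly and has regularly varying tails, and the function names (pmf, sample_urn, occupancy_counts, missing_mass, good_turing) are hypothetical. The Good-Turing estimate used is the standard one: the number of symbols seen exactly once, divided by the sample size.

```python
import math
import random
from collections import Counter


def pmf(j):
    """Assumed sampling distribution on positive integers: p(j) = 1/(j(j+1))."""
    return 1.0 / (j * (j + 1))


def sample_urn(n, rng):
    """Draw n independent symbols with P(X >= j) = 1/j (inverse-transform sampling)."""
    # 1 - rng.random() lies in (0, 1], so the floor below is always >= 1.
    return [math.floor(1.0 / (1.0 - rng.random())) for _ in range(n)]


def occupancy_counts(sample):
    """Map r -> number of distinct symbols occurring exactly r times in the sample."""
    symbol_counts = Counter(sample)
    return Counter(symbol_counts.values())


def missing_mass(sample):
    """Total probability of the symbols that do not occur in the sample."""
    return 1.0 - sum(pmf(j) for j in set(sample))


def good_turing(sample):
    """Good-Turing estimate of the missing mass: (number of singletons) / n."""
    return occupancy_counts(sample)[1] / len(sample)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, for reproducibility only
    sample = sample_urn(10_000, rng)
    print("true missing mass    :", missing_mass(sample))
    print("Good-Turing estimate :", good_turing(sample))
```

The true missing mass can be computed here only because the sketch assumes the sampling distribution is known; in practice only the Good-Turing estimate is observable, which is why confidence intervals for it are of interest.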
All those interested are invited to attend.
The seminar is organized by the "de Castro" Statistics Initiative (http://www.carloalberto.org/stats) in collaboration with the Collegio Carlo Alberto.
Kind regards, Matteo Ruggiero
--- Matteo Ruggiero University of Torino and Collegio Carlo Alberto www.matteoruggiero.it