Online seminar announcement
Stefano Favaro, Università di Torino and CNR-IMATI
Learning-augmented count-min sketches via Bayesian nonparametrics
Fri 28 Jan 2022, 11:30 - 12:30 (CET)
The count-min sketch (CMS) is a time- and memory-efficient randomized data structure that estimates the frequencies of tokens in a data stream (i.e., answers point queries) from randomly hashed data. Learning-augmented CMSs aim to improve the CMS by using learned models that better exploit properties of the data. We focus on the learning-augmented CMS of Cai, Mitzenmacher and Adams (NeurIPS, 2018), which relies on Bayesian nonparametric (BNP) modeling of a data stream via Dirichlet process (DP) priors; this is referred to as the CMS-DP. We present a novel and rigorous approach to the CMS-DP, and we show that it accommodates more general classes of nonparametric priors than the DP prior. We apply our approach to develop a novel learning-augmented CMS for power-law data streams, which relies on BNP modeling of the stream via Pitman-Yor process (PYP) priors; this is referred to as the CMS-PYP. Applications to synthetic and real data show that the CMS-PYP outperforms both the CMS and the CMS-DP in estimating the frequencies of low-frequency tokens; this is a critical feature in natural language processing, where power-law data are common.
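For readers unfamiliar with the data structure, here is a minimal sketch of a plain count-min sketch in Python. It illustrates only the classical CMS with its point queries, not the CMS-DP or CMS-PYP estimators discussed in the talk. The class name, the default `width` and `depth`, and the use of salted BLAKE2b digests as the hash family are assumptions made for this illustration.

```python
import hashlib


class CountMinSketch:
    """Illustrative count-min sketch: a depth x width table of counters."""

    def __init__(self, width=64, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, token, row):
        # Assumption: one salted BLAKE2b digest per row stands in for a
        # family of independent random hash functions.
        digest = hashlib.blake2b(
            token.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, token, count=1):
        # Each row increments exactly one counter for the token.
        for row in range(self.depth):
            self.table[row][self._bucket(token, row)] += count

    def query(self, token):
        # Point query: the minimum over rows. Collisions can only add to a
        # counter, so the estimate never falls below the true frequency.
        return min(
            self.table[row][self._bucket(token, row)]
            for row in range(self.depth)
        )


stream = ["a"] * 5 + ["b"] * 2 + ["c"]
cms = CountMinSketch()
for token in stream:
    cms.update(token)
estimates = {token: cms.query(token) for token in ("a", "b", "c")}
```

Because hash collisions only inflate counters, the plain CMS tends to overestimate, and the relative error is largest for rare tokens. This is the low-frequency regime in which, according to the abstract, the CMS-PYP improves on the CMS and the CMS-DP.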
Join my meeting from your computer, tablet or smartphone.
https://global.gotomeeting.com/join/148172789
You can also dial in using your phone. (For supported devices, tap a one-touch number below to join instantly.)
Italy: +39 0 230 57 81 80 - One-touch: tel:+390230578180,,148172789#
Access code: 148-172-789
New to GoToMeeting? Get the app now and be ready when your first meeting starts: https://global.gotomeeting.com/install/148172789