Prof. Stefano Favaro will give a seminar at CNR-IMATI, Milan office, on Wednesday 13 June 2018 at 14:30, via Alfonso Corti 12, room A. Anyone who intends to attend is kindly asked to notify simona.milani@mi.imati.cnr.it for organizational reasons.
Kind regards
A. Pievatolo
%%%%%%%%% title and abstract %%%%%%%%%%%
A Bayesian approach to disclosure risk assessment
Protection against disclosure is a legal and ethical obligation for statistical agencies releasing microdata files for public use. Given a cross-classification of sample records by categorical key variables, any decision about release is supported by measures of disclosure risk, the most common being the number $\tau_{1}$ of sample unique cells that are also population uniques. In this paper we depart from the dominant literature, which infers $\tau_{1}$ by modeling association among key variables, and instead model the sample records directly. We develop a novel nonparametric Bayesian approach under the minimal assumption of a generalized Dirichlet prior for the random partition induced by the cross-classified sample records. This allows us to derive an explicit and simple expression for the posterior distribution of $\tau_{1}$, as well as a large-sample Binomial approximation of it. These closed-form results, combined with an estimator for the prior parameters designed to recognize the primary role of small cells, make inference on $\tau_{1}$ exact, easy to implement, computationally efficient and scalable to massive datasets. The proposed approach is tested on benchmark data from the U.S. 2000 census for the state of California, where it performs as well as recent semiparametric Bayesian models for key variables.
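
To make the definition of $\tau_{1}$ concrete, here is a minimal Python sketch that counts the cells that are unique in the sample and also unique in the population. It assumes the full population is available, which is not the case in practice (hence the need for the inference described in the abstract). The function name `tau1` and the toy records are hypothetical and serve only as illustration; this is not the method presented in the talk.

```python
from collections import Counter

def tau1(sample_keys, population_keys):
    """Count cells that are unique in the sample and also unique in the population."""
    sample_counts = Counter(sample_keys)
    population_counts = Counter(population_keys)
    return sum(
        1
        for cell, n in sample_counts.items()
        if n == 1 and population_counts[cell] == 1
    )

# Toy data (hypothetical): each record is a tuple of categorical key variables.
population = [
    ("F", "30-39", "Milano"),
    ("F", "30-39", "Milano"),
    ("M", "40-49", "Pavia"),
    ("M", "20-29", "Como"),
    ("F", "50-59", "Lodi"),
]
# The sample is a subset of the population records.
sample = [population[0], population[2], population[3]]

print(tau1(sample, population))
```

In this toy example the first sampled record is unique in the sample but shares its cell with another population record, so it does not contribute to the count.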