A Bayesian Hierarchical Model Approach to Risk Estimation in Statistical Disclosure Limitation

Polettini, Silvia; Stander, J.
2004

Abstract

When microdata files for research are released, it is possible that external users may attempt to breach confidentiality. For this reason most National Statistical Institutes apply some form of disclosure risk assessment and data protection. Risk assessment first requires a measure of disclosure risk to be defined. In this paper we build on previous work by Benedetti and Franconi (1998) to define a Bayesian hierarchical model for risk estimation. We follow a superpopulation approach similar to Bethlehem et al. (1990) and Rinott (2003). For each combination of values of the key variables we derive the posterior distribution of the population frequency given the observed sample frequency. Knowledge of this posterior distribution enables us to obtain suitable summaries that can be used to estimate the risk of disclosure. One such summary is the mean of the reciprocal of the population frequency, or Benedetti-Franconi risk, but we also investigate others such as the mode. We apply our approach to an artificial sample of the Italian 1991 Census data, drawn by means of a widely used sampling scheme. We report on the results of this application and document the computational difficulties that we encountered. The risk estimates that we obtain are sensible, but suggest possible improvements and modifications to our methodology. We discuss these together with potential alternative strategies.
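To make the two summaries named in the abstract concrete, the following is a minimal sketch of how the mean of the reciprocal of the population frequency and the mode could be computed once a posterior for the population frequency F given the sample frequency f is available. It is not the hierarchical model of the paper. As a simplified stand-in it assumes that the posterior is negative binomial with f successes and a known success probability p. The function names, the parameter p and the truncation tolerance are all hypothetical choices made for illustration.

```python
import math


def neg_binomial_pmf(h, f, p):
    """P(F = h | f) for h >= f under the ASSUMED negative binomial posterior.

    F is treated as the number of trials needed to observe f successes,
    each with probability p. This is an illustrative assumption, not the
    posterior derived in the paper.
    """
    log_pmf = (
        math.lgamma(h) - math.lgamma(f) - math.lgamma(h - f + 1)  # log C(h-1, f-1)
        + f * math.log(p)
        + (h - f) * math.log1p(-p)
    )
    return math.exp(log_pmf)


def risk_summaries(f, p, tol=1e-12, max_terms=1_000_000):
    """Return (E[1/F | f], mode of F | f) by direct summation.

    The sum over h = f, f+1, ... stops once the accumulated probability
    mass is within tol of 1, or after max_terms terms.
    """
    mean_reciprocal = 0.0
    cumulative = 0.0
    mode, mode_prob = f, 0.0
    h = f
    while cumulative < 1.0 - tol and h < f + max_terms:
        prob = neg_binomial_pmf(h, f, p)
        cumulative += prob
        mean_reciprocal += prob / h
        if prob > mode_prob:
            mode, mode_prob = h, prob
        h += 1
    return mean_reciprocal, mode


if __name__ == "__main__":
    # Hypothetical inputs: a sample unique (f = 1) with p = 0.1.
    risk, mode = risk_summaries(f=1, p=0.1)
    print(f"mean of 1/F: {risk:.4f}, mode of F: {mode}")
```

Under this assumed posterior, a sample unique (f = 1) has the closed form E[1/F | f = 1] = p/(1-p) · log(1/p), which gives a quick check on the summation. The mean of the reciprocal and the mode can differ markedly for small f, which is one reason to compare several summaries of the same posterior.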
ISBN: 9783540221180
A Bayesian Hierarchical Model Approach to Risk Estimation in Statistical Disclosure Limitation / Polettini, Silvia; Stander, J. - Print. - Lecture Notes in Computer Science 3050 (2004), pp. 247-261. [10.1007/978-3-540-25955-8_19]

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/114842