Fisher's noncentral hypergeometric distribution and population size estimation problems

BALLERINI, VERONICA
2021

Abstract

Fisher’s noncentral hypergeometric distribution (FNCH) describes a biased urn experiment in which balls of different colours are drawn independently and each colour carries its own weight (Fisher, 1935; Fog, 2008a). FNCH is potentially well suited to many problems in official statistics. However, this distribution has been little used in the statistical literature, mainly because of the computational burden of its probability mass function. Indeed, as the number of draws and the number of categories in the population increase, any method that requires evaluating the likelihood becomes infeasible in practice. In the first part of this work, we present a methodology to estimate the posterior distribution of the population size, exploiting both the possibility of including extra-experimental information and the computational efficiency of MCMC and ABC methods. The second part devotes particular attention to overcoverage, i.e., the possibility that one or more data sources erroneously include some out-of-scope units. After a critical review of the most recent literature, we present an alternative model for the latent erroneous counts in a capture-recapture framework, which addresses overcoverage and undercoverage simultaneously. We show the utility of FNCH in this context, both in the posterior sampling process and in the elicitation of prior distributions. We rely on the PCI assumption of Zhang (2019) to include non-negligible prior information. Finally, we address model selection, which is not trivial for log-linear models when there are few (or even zero) degrees of freedom.
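
To make the computational burden mentioned in the abstract concrete, the following is a minimal sketch of the standard multivariate FNCH probability mass function, evaluated by brute-force enumeration of its support. It is not the method of the thesis, which relies on MCMC and ABC precisely to avoid this kind of evaluation. The function names (`compositions`, `fnch_weight`, `fnch_pmf`) and the argument names `m` (category sizes) and `w` (category weights) are hypothetical and chosen only for illustration.

```python
from math import comb, prod


def compositions(n, caps):
    """Yield every tuple x with sum(x) == n and 0 <= x[i] <= caps[i]."""
    if len(caps) == 1:
        if 0 <= n <= caps[0]:
            yield (n,)
        return
    for first in range(min(n, caps[0]) + 1):
        for rest in compositions(n - first, caps[1:]):
            yield (first,) + rest


def fnch_weight(x, m, w):
    """Unnormalised FNCH mass: product over i of C(m_i, x_i) * w_i ** x_i."""
    return prod(comb(mi, xi) * wi ** xi for xi, mi, wi in zip(x, m, w))


def fnch_pmf(x, m, w):
    """Exact pmf of x by enumerating the whole support (small cases only).

    Returns the probability and the number of support points visited,
    which is the quantity that grows quickly with draws and categories.
    """
    n = sum(x)
    support = list(compositions(n, m))
    norm = sum(fnch_weight(y, m, w) for y in support)
    return fnch_weight(x, m, w) / norm, len(support)


if __name__ == "__main__":
    # Illustrative inputs only (assumed, not taken from the thesis).
    p, size = fnch_pmf((1, 1), m=(3, 2), w=(1.0, 1.0))
    print(p, size)
```

With all weights equal, the distribution reduces to the ordinary multivariate hypergeometric. The normalising constant is a sum over every admissible allocation of the draws to the categories, so the number of terms rises rapidly as either the number of draws or the number of categories grows.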
23 July 2021
English
Fisher's noncentral hypergeometric; population size estimation; capture-recapture; heterogeneity; log-linear models; overcoverage; erroneous enumeration
LISEO, Brunero
LISEO, Brunero
Università degli Studi di Roma "La Sapienza"
Files in this item:
File: Tesi_dottorato_Ballerini.pdf
Access: open access
Size: 1.07 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/177981
The NBN code of this thesis is URN:NBN:IT:UNIROMA1-177981