Fisher's noncentral hypergeometric distribution and population size estimation problems

Ballerini, Veronica

Fisher’s noncentral hypergeometric distribution (FNCH) describes a biased urn experiment with independent draws of differently coloured balls where each colour is associated with a different weight (Fisher (1935), Fog (2008a)). FNCH potentially suits many official statistics problems. However, such distribution has been underemployed in the statistical literature mainly because of the computational burden given by its probability mass function. Indeed, as the number of draws and the number of different categories in the population increases, any method involving evaluating the likelihood is practically unfeasible. In the first part of this work, we present a methodology to estimate the posterior distribution of the population size, exploiting both the possibility of including extra-experimental information and the computational efficiency of MCMC and ABC methods. The second part devotes particular attention to overcoverage, i.e., the possibility that one or more data sources erroneously include some out-of-scope units. After a critical review of the most recent literature, we present an alternative modelisation of the latent erroneous counts in a capture-recapture framework, simultaneously addressing overcoverage and undercoverage problems. We show the utility of FNCH in this context, both in the posterior sampling process and in the elicitation of prior distributions. We rely on the PCI assumption of Zhang (2019) to include non-negligible prior information. Finally, we address model selection, which is not trivial in the framework of log-linear models when there are a few (or even zero) degrees of freedom.

Fisher's noncentral hypergeometric distribution and population size estimation problems

BALLERINI, VERONICA

2021

Abstract

Fisher’s noncentral hypergeometric distribution (FNCH) describes a biased urn experiment with independent draws of differently coloured balls where each colour is associated with a different weight (Fisher (1935), Fog (2008a)). FNCH potentially suits many official statistics problems. However, such distribution has been underemployed in the statistical literature mainly because of the computational burden given by its probability mass function. Indeed, as the number of draws and the number of different categories in the population increases, any method involving evaluating the likelihood is practically unfeasible. In the first part of this work, we present a methodology to estimate the posterior distribution of the population size, exploiting both the possibility of including extra-experimental information and the computational efficiency of MCMC and ABC methods. The second part devotes particular attention to overcoverage, i.e., the possibility that one or more data sources erroneously include some out-of-scope units. After a critical review of the most recent literature, we present an alternative modelisation of the latent erroneous counts in a capture-recapture framework, simultaneously addressing overcoverage and undercoverage problems. We show the utility of FNCH in this context, both in the posterior sampling process and in the elicitation of prior distributions. We rely on the PCI assumption of Zhang (2019) to include non-negligible prior information. Finally, we address model selection, which is not trivial in the framework of log-linear models when there are a few (or even zero) degrees of freedom.

Scheda breve

Scheda completa

Scheda completa (DC)

	Facoltà/Dipartimento
	
				DIPARTIMENTO DI METODI E MODELLI PER L'ECONOMIA, IL TERRITORIO E LA FINANZA
			
	Corso di studio
	
				Metodi e modelli per l'economia e la finanza
			
	Data di pubblicazione
	
				23-lug-2021
			
	Lingua
	
				Inglese
			
	Parola chiave
	
				Fisher's noncentral hypergeometric; population size estimation; capture-recapture; heterogeneity; log-linear models; overcoverage; erroneous enumeration
			
	Relatore, Supervisor, Advisor o Tutor
	
				LISEO, Brunero
			
	Correlatore, Controrelatore, Co-Supervisor,  Co-Tutor o Coordinatori
	
				LISEO, Brunero
			
	Nome Editore
	
				Università degli Studi di Roma "La Sapienza"
			
	Collezione di appartenenza
	
				Università degli Studi di Roma La Sapienza

File in questo prodotto:

File	Dimensione	Formato
Tesi_dottorato_Ballerini.pdf accesso aperto Licenza: Tutti i diritti riservati Dimensione 1.07 MB Formato Adobe PDF Visualizza/Apri	1.07 MB	Adobe PDF	Visualizza/Apri

I documenti in UNITESI sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14242/177981

Il codice NBN di questa tesi è URN:NBN:IT:UNIROMA1-177981