The use of Big Data in Official Statistics
SIGNORELLI, Serena
2017
Abstract
The thesis concerns the use of big data in Official Statistics; the aim is to add new experimental studies to the big data literature. In particular, the purpose is to evaluate whether and how big data could be used in Official Statistics. Only a few experiments exist on the use of big data for statistical purposes; it is a challenging task, and much experimentation is needed to gather evidence and find solutions. The analysis performed in the thesis goes in two different directions: 1. combining a traditional data source with a big data source to verify the potential of the latter to replicate official results; 2. analyzing a big data source per se and then trying to combine it with an Official Statistics source to identify common patterns. The thesis first presents a literature review of definitions of big data and of experiments, in particular those combining the new sources with traditional data sources. Three original studies are then presented. The first two concern mobility in the Lombardy region using mobile phone data. Both address the same issue (mobility patterns), but they differ in the traditional data source used: an Origin/Destination (O/D) matrix in the first case, and an integrated version of the O/D matrix in the second. The objective of these two studies is to place one traditional statistical source and one typical kind of big data within a single interpretative framework, in order to evaluate the informative potential of this approach. In particular, we wanted to check whether the two sources show common patterns, to evaluate future uses of the big data source in Official Statistics. The third study presents the pilot carried out during the traineeship I had the opportunity to attend at Eurostat, in collaboration with the Task Force Big Data. It concerns the use of Wikipedia, the free online encyclopedia, for Tourism Statistics.
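The abstract does not say how "common patterns" between the two mobility sources were measured. As a purely illustrative sketch, one simple check is a Pearson correlation between the inter-zone cells of an official O/D matrix and an O/D matrix estimated from mobile phone data, assuming both are defined on the same zoning. The function names (`off_diagonal`, `pearson`, `od_agreement`) and the choice of Pearson correlation are hypothetical and are not the method used in the thesis.

```python
import math


def off_diagonal(matrix):
    """Flatten an n x n O/D matrix, skipping intra-zone (diagonal) cells."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("O/D matrix must be square")
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant flows")
    return sxy / math.sqrt(sxx * syy)


def od_agreement(od_official, od_phone):
    """Hypothetical agreement score between two O/D matrices on the same zones."""
    x = off_diagonal(od_official)
    y = off_diagonal(od_phone)
    if len(x) != len(y):
        raise ValueError("matrices must cover the same set of zones")
    return pearson(x, y)
```

For example, with invented toy flows, a phone-based matrix that is exactly proportional to the official one (say, every flow doubled) yields a correlation of 1, while unrelated flow structures yield values near 0. A correlation only captures similarity in the relative structure of flows; it ignores differences in overall scale, which a real comparison would need to assess separately.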
The aim is to evaluate Wikipedia page views as a source of information for identifying the factors that drive tourism to an area, and to assess whether tourism flows can be predicted using these data. A final chapter presents conclusions and remarks on the future use of big data in Official Statistics. Two of the studies (the first on mobility patterns and the one on Wikipedia) have been published, or are in the process of being published, in shorter, revised versions. The three experiments show some potential for the use of big data in Official Statistics. However, more in-depth analysis is needed: many more experiments and considerations will be necessary before definitive and convincing approaches can be achieved.

| File | Access | License | Size | Format |
|---|---|---|---|---|
| TDUnibg52978.pdf | Open access | All rights reserved | 6.78 MB | Adobe PDF |
| Files_SIGNORELLI.zip | Access only from BNCF and BNCR | All rights reserved | 668.06 kB | Zip File |
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14242/66136
URN:NBN:IT:UNIBG-66136