Advances in model based clustering for the social sciences

SERI, EMILIANO
2023

Abstract

This dissertation gathers the main research topics I worked on during my PhD, in collaboration with several national and international researchers. The primary focus of this work is to highlight the power of model-based clustering for identifying latent structures in complex data and its usefulness in the social sciences. These methods have become increasingly popular in social science research because they allow for a more accurate and nuanced understanding of complex data structures. The thesis presents three papers that contribute to the development and application of model-based clustering in social science research, covering a range of scenarios. It pays particular attention to the practical applications of the methods discussed, providing insights that can improve our understanding of complex social phenomena. The first chapter introduces the usefulness of clustering models for dealing with the complexity of society, and raises awareness of some of the main issues that arise when analysing socio-economic data. Following this conceptual introduction, the second chapter delves into the technical aspects of model-based clustering and estimation. These first two chapters pave the way for the three developments presented thereafter. The third chapter applies a Mixture of Matrix-Normals classification model to the Migrant Integration Policy Index (MIPEX), which measures and evaluates countries’ policies toward migrants’ integration over time. The model is suitable for longitudinal data and allows for the identification of clusters of countries with similar patterns of migrant integration policies over time. The work is published in Alaimo et al. [2021a]. The fourth chapter also uses MIPEX data, but for a single year, and applies a finite mixture of multivariate Gaussians to identify groups of countries with a similar level of integration. The relative proportion of immigrants held in prison across clusters is then estimated using Fisher’s noncentral hypergeometric model. The aim of this work is to test for an association between countries’ level of integration of immigrants and the proportion of immigrants in prison. The work is currently under review. The fifth chapter introduces the work developed during my visiting research period at the University of Lyon, Lyon 2. It specifies the Bayesian partial membership model for soft clustering of multivariate data, that is, for cases in which units have fractional membership in multiple groups. The model is specified for count data and is applied to data from the bike sharing company of Washington DC and to data on Serie A football players. The last chapter summarizes the main points of the dissertation, underlining the most relevant findings and contributions, and stressing how clustering models together yield a cohesive treatment of socio-economic data.
30 May 2023
English
Model-based clustering; social statistics; mixture of matrix normals; partial membership model
ROCCI, Roberto
JONA LASINIO, Giovanna
Università degli Studi di Roma "La Sapienza"
Files in this item:
Tesi_dottorato_Seri.pdf
Open access
Size: 12.37 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/99834
The NBN code of this thesis is URN:NBN:IT:UNIROMA1-99834