Advances in model based clustering for the social sciences
SERI, EMILIANO
2023
Abstract
This dissertation gathers the main research topics I worked on during my PhD, in collaboration with several national and international researchers. The primary focus of this work is to highlight the power of model-based clustering for identifying latent structures in complex data and its usefulness in the social sciences. These methods have become increasingly popular in social science research because they allow for a more accurate and nuanced understanding of complex data structures. The thesis presents three papers that contribute to the development and application of model-based clustering in social science research, covering a range of scenarios. It pays particular attention to the practical applications of the methods discussed, providing insights that can improve our understanding of complex social phenomena. The first chapter introduces the usefulness of clustering models for dealing with the complexity of society and raises awareness of some of the main issues in analysing socio-economic data. Following this conceptual introduction, the second chapter delves into the technical aspects of model-based clustering and estimation. These first two chapters pave the way for the three developments presented thereafter. The third chapter applies a Mixture of Matrix-Normals classification model to the Migrant Integration Policy Index (MIPEX), which measures and evaluates countries’ policies toward migrants’ integration over time. The model is suitable for longitudinal data and allows for the identification of clusters of countries with similar patterns of migrant integration policies over time. The work is published in Alaimo et al. [2021a]. The fourth chapter also uses MIPEX data, but for a single year, and applies a finite mixture of multivariate Gaussians to identify groups of countries with a similar level of integration. Then, the relative proportion of immigrants held in prison across clusters is estimated using Fisher’s noncentral hypergeometric model. The aim of this work is to test for an association between countries’ level of integration of immigrants and the proportion of immigrants in prison. The work is currently under review. The fifth chapter introduces the work developed during my visiting research period at the University of Lyon, Lyon 2. It specifies a Bayesian partial membership model for soft clustering of multivariate data, that is, when units have fractional membership in multiple groups. The model is specified for count data and is applied to data from the bike-sharing company of Washington, DC and to data on Serie A football players. The last chapter summarizes the main points of the dissertation, underlining the most relevant findings and contributions and stressing how clustering models together yield a cohesive treatment of socio-economic data.
File | Size | Format | |
---|---|---|---|
Tesi_dottorato_Seri.pdf (open access) | 12.37 MB | Adobe PDF | View/Open |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/99834
URN:NBN:IT:UNIROMA1-99834