Ultrametric models for hierarchical dimensionality reduction

ZACCARIA, GIORGIA
2022

Abstract

Many relevant multidimensional phenomena, such as well-being, climate change, sustainable development and poverty, are defined by nested latent concepts, which can be represented by a tree-shaped structure that assumes hierarchical relationships among observed variables. In the literature, several methodologies have been proposed both to model the relationships among observed variables that reflect unobserved ones and to assess the existence of "higher-order" unobserved variables. Nonetheless, these methodologies usually rely on sequential procedures that do not optimize a single objective function, and/or on a confirmatory approach, i.e., one that fixes the relationships between observed and unobserved variables a priori. This dissertation discusses new simultaneous, exploratory and parsimonious models for hierarchical dimensionality reduction that overcome these limitations. The proposals introduced herein are based, directly or indirectly, on the definition of an ultrametric matrix, which differs from the well-known definition of an ultrametric distance matrix and is in one-to-one correspondence with a hierarchy of latent concepts. The first proposal models a nonnegative correlation matrix via an ultrametric correlation matrix by detecting reliable concepts, each associated with a disjoint group of variables, and the hierarchical relationships among them. The second work compares the first proposal with traditional agglomerative hierarchical clustering algorithms applied to variables, after a transformation of correlations into distances, and highlights the need for specific models to inspect the hierarchical relationships among variables. The third proposal extends the definition of an ultrametric matrix to a generic one by relaxing the non-negativity assumption, and applies it to a covariance matrix.
The extended ultrametric covariance matrix is then used to model the covariance structures of a Gaussian mixture model, both defining a new parsimonious parameterization of the covariance matrix and inspecting the hierarchical structure underlying multidimensional phenomena in heterogeneous populations. The fourth proposal introduces a quantification of latent concepts via a hierarchical extension of Disjoint Principal Component Analysis. Although not directly based on the definition of an ultrametric matrix, this proposal in turn aims at pinpointing nested partitions of variables into groups, each associated with a component. The proposed models are illustrated via both simulation studies and real data applications in order to study their performance and capabilities.
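
To make the distinction between an ultrametric matrix and an ultrametric distance matrix concrete, the following is a minimal sketch, not taken from the dissertation. It assumes the standard linear-algebra definition of an ultrametric matrix (nonnegative, symmetric, a_ij >= min(a_ik, a_kj) for all i, j, k, and each diagonal entry at least as large as the other entries in its row) and, for the distance side, the illustrative transformation d = 1 - r. The function names and the toy correlation matrix are hypothetical.

```python
import numpy as np


def is_ultrametric_matrix(a, tol=1e-12):
    """Check the assumed definition of an ultrametric (similarity-type) matrix."""
    a = np.asarray(a, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1]:
        return False
    # Nonnegativity and symmetry.
    if np.any(a < -tol) or not np.allclose(a, a.T, atol=tol):
        return False
    # Diagonal dominance: a[i, i] >= a[i, j] for every j.
    if np.any(np.diag(a)[:, None] < a - tol):
        return False
    # Ultrametric inequality: a[i, j] >= min(a[i, k], a[k, j]) for all k.
    mins = np.minimum(a[:, :, None], a[None, :, :])  # mins[i, k, j]
    return bool(np.all(a >= mins.max(axis=1) - tol))


def is_ultrametric_distance(d, tol=1e-12):
    """Check the ultrametric inequality d[i, j] <= max(d[i, k], d[k, j])."""
    d = np.asarray(d, dtype=float)
    maxs = np.maximum(d[:, :, None], d[None, :, :])  # maxs[i, k, j]
    return bool(np.all(d <= maxs.min(axis=1) + tol))


# Toy correlation matrix (hypothetical): two groups of variables, {0, 1} and
# {2, 3}, with within-group correlations 0.8 and 0.6 and a between-group
# correlation of 0.3, i.e. a two-level hierarchy.
R = np.array([
    [1.0, 0.8, 0.3, 0.3],
    [0.8, 1.0, 0.3, 0.3],
    [0.3, 0.3, 1.0, 0.6],
    [0.3, 0.3, 0.6, 1.0],
])

# A matrix that violates the inequality: 0.1 < min(0.8, 0.7).
R_bad = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.7],
    [0.1, 0.7, 1.0],
])

print(is_ultrametric_matrix(R))
print(is_ultrametric_matrix(R_bad))
print(is_ultrametric_distance(1.0 - R))
```

In the toy matrix, larger entries indicate closer variables and the inequality is stated with a minimum, whereas for distances smaller entries indicate closer variables and the inequality is stated with a maximum.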
22-feb-2022
English
Ultrametricity; dimensionality reduction; latent concepts; model-based clustering; Gaussian mixture models
VICHI, Maurizio
ALFO', Marco
Università degli Studi di Roma "La Sapienza"
Files in this item:
File: Tesi_dottorato_Zaccaria.pdf
Access: open access
License: All rights reserved
Size: 6.41 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/96756
The NBN code of this thesis is URN:NBN:IT:UNIROMA1-96756