Segregation aware data mining

2017

Abstract

This thesis tackles the social segregation problem from a data science perspective by proposing a segregation-aware data mining framework for the discovery of segregation from relational data and from attributed graphs. The approach is implemented in an efficient system and evaluated on two challenging case studies in the domain of occupational segregation in company boards. The framework builds on quantitative measures of segregation, called segregation indexes, proposed in the social science literature. The segregation discovery problem is first introduced for relational data. It consists of searching for sub-groups of the population and minorities for which a segregation index is above a minimum threshold. A search algorithm is devised that solves the segregation discovery problem by computing a multi-dimensional data cube that can be explored by the analyst. The machinery underlying the search algorithm relies on frequent itemset mining tools. The approach is then extended to graph data consisting of bipartite attributed graphs, which model real networks by enriching their nodes with attribute values. Segregation indexes assume a partition of the population into organizational units (e.g., schools or neighborhoods), which is not obvious for graphs. We propose a fast and scalable algorithm for partitioning large attributed graphs. The approach does not require the user to guess the number of clusters in advance. Experimental results demonstrate its ability to efficiently compute high-quality partitions. Our implementation of the framework, called SCube, supports an analyst in discovering contexts of social segregation. Users of the system include social scientists, policy decision makers in socially sensitive fields (urban development, public transportation and services, medical and health management, etc.), and control authorities. The system is developed in Java 8, hence portable, and thanks to state-of-the-art libraries achieves good performance on large datasets.
We demonstrate the applicability of the proposed methodology and tools in a complex scenario, reflecting the risks of modern segregation in occupational social networks. The scenario considers glass-ceiling barriers for women in accessing boards of company directors. Two case studies are presented, one considering Italian companies and the other Estonian companies. The latter case includes temporal information, thus allowing for temporal analysis of segregation.
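
To make the relational formulation concrete, the following minimal sketch computes one segregation index per context and keeps the contexts above a minimum threshold. It is an illustration under stated assumptions, not the SCube implementation (which is in Java 8 and relies on frequent itemset mining). The abstract does not name a specific index, so the dissimilarity index, a standard measure from the social science literature, is used as an assumption. The function names, the record layout, and the toy data are all hypothetical.

```python
from collections import defaultdict


def dissimilarity_index(units):
    """Dissimilarity index D = 1/2 * sum_i |m_i/M - (t_i - m_i)/(T - M)|.

    `units` is a list of (minority_count, total_count) pairs, one per
    organizational unit. Returns None when the minority or the majority
    is empty, since the index is undefined in that case.
    """
    minority_total = sum(m for m, _ in units)
    population_total = sum(t for _, t in units)
    majority_total = population_total - minority_total
    if minority_total == 0 or majority_total == 0:
        return None
    return 0.5 * sum(
        abs(m / minority_total - (t - m) / majority_total) for m, t in units
    )


def segregated_contexts(records, context_attr, is_minority, min_index):
    """Return {context value: index} for contexts with index >= min_index.

    Brute-force scan over the values of a single attribute (hypothetical
    simplification; it does not build a multi-dimensional cube).
    """
    # counts[context][unit] = [minority_count, total_count]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for record in records:
        cell = counts[record[context_attr]][record["unit"]]
        if is_minority(record):
            cell[0] += 1
        cell[1] += 1

    result = {}
    for context, per_unit in counts.items():
        index = dissimilarity_index([tuple(c) for c in per_unit.values()])
        if index is not None and index >= min_index:
            result[context] = index
    return result


def make_records(sector, unit, women, men):
    """Build toy director records for one board (hypothetical data)."""
    return (
        [{"sector": sector, "unit": unit, "sex": "F"}] * women
        + [{"sector": sector, "unit": unit, "sex": "M"}] * men
    )


# Toy data: two finance boards with uneven composition, two even retail boards.
records = (
    make_records("finance", "board_A", women=3, men=1)
    + make_records("finance", "board_B", women=0, men=4)
    + make_records("retail", "board_C", women=1, men=1)
    + make_records("retail", "board_D", women=1, men=1)
)

found = segregated_contexts(
    records,
    context_attr="sector",
    is_minority=lambda r: r["sex"] == "F",
    min_index=0.5,
)
print(found)
```

In this toy data only the finance context passes the threshold, because women are concentrated in one of its two boards, while the retail boards have identical compositions and an index of zero.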
2 November 2017
Italian
Ruggieri, Salvatore
Università degli Studi di Pisa
Files in this item:
File: thesis.pdf
Access: open access
Type: Other attached material
Size: 12.51 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/143966
The NBN code of this thesis is URN:NBN:IT:UNIPI-143966