The study of complex systems is related to the analysis of emerging properties and collective behaviors of systems whose components are usually well-known and can be defined in terms of variables. Most natural or artificial dynamical systems are characterized by groups of variables (Relevant Sets or, shortly, RSs in the following) which appear to be well coordinated among themselves while having weaker interactions with the remainder of the system. The capacity of detecting RSs can often lead to a high-level description of the dynamical organization of a complex system, and thus to its understanding. In several cases, the detailed interactions among the system components are not known and, in case of systems where the components are interacting non-linearly, the dynamic relationships among variables might not be entirely described by a merely topological view. Therefore, it is often necessary to derive some hints about the system organization by observing the behavior of its dynamically relevant parts. For these reasons, the work of this thesis is based on the Relevance Index (RI) metrics, a set of information-theoretical metrics that can be used to analyze complex systems by detecting the main interacting structures (i.e., the RSs) within them, starting from the observation of the status of the system variables over time. The search of relevant subsets within a dynamical system corresponds to an optimization problem (maximization of an RI metric). To fully describe a dynamical system, it would be necessary to compute such a metric for each of the possible subsets of the system variables. Unfortunately, their number increases exponentially with the number of variables, soon reaching unrealistic computation requirements. As a consequence, it is necessary to design efficient strategies which can limit the extension of the search by quickly identifying the most promising subsets. Thus, we propose to use population-based niching metaheuristics, derived by hybridizing genetic algorithms (GAs) and particle swarm optimization (PSO) with local searches. From a methodological point of view, the work proposed in this thesis concerns also the design of some improvements to the RI metrics, in terms of both computational efficiency and effectiveness for complex systems analysis. Moreover, we propose an iterative algorithm aimed at revealing hierarchical dynamical structures. The analysis based on the RI metrics can be applied to a broad range of dynamical systems, both natural and artificial. In particular, in this work we present some applications of the proposed methodology within different contexts, such as chemical systems, biological networks and social networks. From an application-oriented viewpoint, this work has also aimed at evaluating the applicability of the RI methodology to machine learning and pattern recognition problems. In particular, the properties that the RI metrics can highlight in time series, when analyzing the dynamics of complex systems, can be assimilated to the properties of the multivariate distribution of the examples of a machine learning training set, e.g., the distribution of static patterns subject to noise, translations, etc. Therefore, the last part of this thesis concerns the design of RI-based algorithms that implement machine learning and pattern recognition tasks (clustering, classification, feature extraction and preprocessing). The experimental results have been evaluated on an image set of digits extracted from license plates and on some other publicly available datasets, as well as on data collected from Twitter within an automatic troll detection application.

Detection of relevant structures in complex systems: an information-theoretic approach with applications to machine learning and pattern recognition

2020

Abstract

The study of complex systems is related to the analysis of emerging properties and collective behaviors of systems whose components are usually well-known and can be defined in terms of variables. Most natural or artificial dynamical systems are characterized by groups of variables (Relevant Sets or, shortly, RSs in the following) which appear to be well coordinated among themselves while having weaker interactions with the remainder of the system. The capacity of detecting RSs can often lead to a high-level description of the dynamical organization of a complex system, and thus to its understanding. In several cases, the detailed interactions among the system components are not known and, in case of systems where the components are interacting non-linearly, the dynamic relationships among variables might not be entirely described by a merely topological view. Therefore, it is often necessary to derive some hints about the system organization by observing the behavior of its dynamically relevant parts. For these reasons, the work of this thesis is based on the Relevance Index (RI) metrics, a set of information-theoretical metrics that can be used to analyze complex systems by detecting the main interacting structures (i.e., the RSs) within them, starting from the observation of the status of the system variables over time. The search of relevant subsets within a dynamical system corresponds to an optimization problem (maximization of an RI metric). To fully describe a dynamical system, it would be necessary to compute such a metric for each of the possible subsets of the system variables. Unfortunately, their number increases exponentially with the number of variables, soon reaching unrealistic computation requirements. As a consequence, it is necessary to design efficient strategies which can limit the extension of the search by quickly identifying the most promising subsets. Thus, we propose to use population-based niching metaheuristics, derived by hybridizing genetic algorithms (GAs) and particle swarm optimization (PSO) with local searches. From a methodological point of view, the work proposed in this thesis concerns also the design of some improvements to the RI metrics, in terms of both computational efficiency and effectiveness for complex systems analysis. Moreover, we propose an iterative algorithm aimed at revealing hierarchical dynamical structures. The analysis based on the RI metrics can be applied to a broad range of dynamical systems, both natural and artificial. In particular, in this work we present some applications of the proposed methodology within different contexts, such as chemical systems, biological networks and social networks. From an application-oriented viewpoint, this work has also aimed at evaluating the applicability of the RI methodology to machine learning and pattern recognition problems. In particular, the properties that the RI metrics can highlight in time series, when analyzing the dynamics of complex systems, can be assimilated to the properties of the multivariate distribution of the examples of a machine learning training set, e.g., the distribution of static patterns subject to noise, translations, etc. Therefore, the last part of this thesis concerns the design of RI-based algorithms that implement machine learning and pattern recognition tasks (clustering, classification, feature extraction and preprocessing). The experimental results have been evaluated on an image set of digits extracted from license plates and on some other publicly available datasets, as well as on data collected from Twitter within an automatic troll detection application.
Identificazione di strutture rilevanti nei sistemi complessi: un approccio basato sulla teoria dell'informazione con applicazioni a machine learning e pattern recognition
mar-2020
Inglese
complex systems analysis
RI metrics
machine learning
pattern recognition
ING-INF/05
Università degli Studi di Parma
File in questo prodotto:
File Dimensione Formato  
LauraSaniRelazioneFinale.pdf

accesso solo da BNCF e BNCR

Tipologia: Altro materiale allegato
Dimensione 82.31 kB
Formato Adobe PDF
82.31 kB Adobe PDF
LauraSaniTesiDottorato.pdf

accesso solo da BNCF e BNCR

Tipologia: Altro materiale allegato
Dimensione 10.12 MB
Formato Adobe PDF
10.12 MB Adobe PDF

I documenti in UNITESI sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14242/135456
Il codice NBN di questa tesi è URN:NBN:IT:UNIPR-135456