The mapping problem in coarse-grained modelling of biomolecules

Giulini, Marco
2022

Abstract

Low-resolution, coarse-grained models are powerful computational tools for investigating the behaviour of biological systems over time and length scales that are not accessible to all-atom molecular dynamics simulations. While several algorithms aim at constructing accurate coarse-grained potentials, few works focus on the choice of the reduced representation, or mapping, used to describe the high-resolution system with a smaller number of degrees of freedom. This thesis proposes a series of approaches to investigate and characterise the representation problem in coarse-grained modelling of proteins, drawing on statistical mechanics, machine learning algorithms and information-theoretical tools. The central mathematical object of this work is the mapping entropy, a Kullback-Leibler divergence that measures the intrinsic quality of a given reduced representation. Minimising this quantity yields the maximally informative coarse-grained mappings of a biomolecule, which cover the structure with an uneven level of detail. Tests conducted on a set of well-known proteins show that the regions preserved with high probability are often related to important functional mechanisms of the molecule. Applications of the mapping entropy outside the field of structural biology show promising results, leading to the identification of those combinations of features that retain the maximum amount of information about the high-resolution system. Additionally, a purely structural notion of scalar product and distance between coarse-grained mappings is introduced, which allows the metric and topological properties of the mapping space to be analysed. A thorough exploration of this space leads to the discovery of qualitatively different reduced representations of the biomolecule of interest.
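As an illustration of the mapping entropy described in the abstract, the following is a minimal sketch for a toy discrete system, not the implementation used in the thesis. It assumes the commonly used form of the quantity: the Kullback-Leibler divergence (in units of k_B) between the high-resolution distribution p and the distribution obtained by spreading the probability of each coarse-grained state uniformly over the microstates that map onto it. The function name `mapping_entropy`, the toy probabilities and the two example mappings are hypothetical.

```python
import math
from collections import defaultdict


def mapping_entropy(p, mapping):
    """Toy mapping entropy in units of k_B (assumed discrete form).

    p       : list of microstate probabilities (should sum to 1)
    mapping : function sending a microstate index to a coarse-grained label
    """
    macro_prob = defaultdict(float)  # total probability of each coarse state
    macro_count = defaultdict(int)   # number of microstates in each coarse state
    for state, prob in enumerate(p):
        label = mapping(state)
        macro_prob[label] += prob
        macro_count[label] += 1

    s_map = 0.0
    for state, prob in enumerate(p):
        if prob > 0.0:
            label = mapping(state)
            # Probability "smeared" uniformly over the microstates of the coarse state
            p_bar = macro_prob[label] / macro_count[label]
            s_map += prob * math.log(prob / p_bar)
    return s_map


# Hypothetical four-state system
p = [0.4, 0.4, 0.1, 0.1]


def group_similar(state):
    # Groups microstates that have equal probability
    return state // 2


def group_dissimilar(state):
    # Groups a likely microstate with an unlikely one
    return state % 2


s_similar = mapping_entropy(p, group_similar)
s_dissimilar = mapping_entropy(p, group_dissimilar)
```

In this toy case the first mapping loses no information, so its mapping entropy vanishes, while the second merges microstates of different probability and gives a strictly positive value. Minimising the quantity over mappings therefore favours the first representation.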
14 February 2022
English
Potestio, Raffaello
Menichetti, Roberto
Università degli studi di Trento
TRENTO
174
Files in this item:
File: phd_unitn_marco_giulini.pdf
Access: open access
License: All rights reserved
Size: 35.49 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/89785
The NBN code of this thesis is URN:NBN:IT:UNITN-89785