STRUCTURE-BASED APPROACHES TO DATA-DRIVEN PROTEIN FOLDING, AGGREGATION, AND SELF-ASSEMBLY

BACIC TOPLEK, FRAN
2025

Abstract

Predicting protein dynamics at the molecular level is central to understanding, and ultimately controlling, the biomolecular machines that govern life. Despite advances in molecular dynamics and AI-based structure prediction, the accurate and efficient simulation of complex self-assembly processes (such as protein aggregation, protein folding, and the dynamics of intrinsically disordered proteins) remains out of reach for most approaches because of system-size and sampling limitations. This work presents the development and application of the multi-eGO model, a data-driven, hybrid structure-based approach designed to overcome these barriers. By combining an informative prior with high-resolution structural information or lower-resolution experimental data, the multi-eGO force field learns conformational ensembles across multiple energy minima while maintaining atomistic resolution. Applications include the folding dynamics of protein G and X11-PDZ1-PDZ2, the structural ensemble of amyloid-β42, the aggregation of transthyretin peptides, and the self-assembly of ferritin complexes. Complementing these studies, conventional molecular dynamics simulations are used to investigate the effect of electric fields on the dynamics of amyloid-β42 fibrils, revealing the potential of electric fields to disrupt assembly and secondary nucleation. The results demonstrate that multi-eGO can reproduce equilibrium, out-of-equilibrium, and kinetic features, and can integrate heterogeneous data sources, such as SAXS and PRE NMR data, to refine the model without the need for explicit training. At the same time, limitations such as finite-size effects and kinetic trapping in intermolecular processes highlight the need for further refinement.
24-Nov-2025
English
CAMILLONI, CARLO
RICAGNO, STEFANO
Università degli Studi di Milano
172
Files in this item:
File: phd_unimi_R13841.pdf
Access: open access
License: All rights reserved
Size: 2.73 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/310618
The NBN code of this thesis is URN:NBN:IT:UNIMI-310618