Machine learning analysis of enzyme neofunctionalization following gene duplication in vertebrate evolution

Carlo De Rito
2025

Abstract

We investigated the evolutionary and functional divergence of duplicated genes in vertebrates, focusing on enzyme neofunctionalization. Starting with approximately 2,400 enzymatic proteins sharing identical PFAM domain architectures, we identified orthologous gene pairs across major vertebrate clades using a specialized pipeline. High-quality multiple sequence alignments were analyzed to compute per-residue evolutionary metrics, such as In-group and Differential Conservation Scores, utilizing the BLOSUM62 substitution matrix to pinpoint residues contributing to functional divergence. Context-based metrics from ProtTrans embeddings and functional hotspot predictions via BindEmbed21, refined with AlphaFold models and P2Rank pocket predictions, further elucidated potential functional changes. By aggregating these per-residue scores into per-protein metrics, we systematically assessed functional divergence across gene pairs. Thresholds established using a truth set from the Rhea database revealed that 35% of the analyzed gene pairs exhibited strong evidence of neofunctionalization. Enrichment analyses incorporating tissue-specific expression data and functional annotations provided biological context for the observed divergence patterns. A case study on the skin-expressed AADACL2 gene illustrated our approach. Compared to its paralog AADAC, AADACL2 possesses additional functional pocket residues, suggesting a unique lipase function potentially involved in ceramide processing for cornified envelope formation. Experimental validation through heterologous expression faced challenges in protein solubility, leading us to consider ancestral sequence reconstruction for enhanced protein stability. Our findings advance the understanding of enzyme neofunctionalization in vertebrates and offer a framework for detecting functional divergence in duplicated genes.
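The abstract does not give formulas for the In-group and Differential Conservation Scores, so the following is only a minimal sketch of one plausible reading: score each alignment column by the mean pairwise substitution score within each ortholog group, subtract the mean score between the two groups, and average the per-residue values into a per-protein metric. All function names here are hypothetical, the choice of the mean as the aggregate is an assumption, and `toy_score` is a simple match/mismatch stand-in rather than BLOSUM62; a real BLOSUM62 lookup could be passed in through the `score` parameter.

```python
from itertools import combinations
from statistics import mean


def toy_score(a, b):
    """Stand-in for a substitution-matrix lookup (NOT BLOSUM62)."""
    return 1.0 if a == b else -1.0


def in_group_conservation(column, score=toy_score):
    """Mean pairwise score among the non-gap residues of one alignment column."""
    residues = [r for r in column if r != "-"]
    pairs = list(combinations(residues, 2))
    if not pairs:
        return 0.0
    return mean(score(a, b) for a, b in pairs)


def between_group_conservation(col_a, col_b, score=toy_score):
    """Mean score over all non-gap residue pairs drawn one from each group."""
    pairs = [(a, b) for a in col_a for b in col_b if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return mean(score(a, b) for a, b in pairs)


def differential_conservation(group_a, group_b, score=toy_score):
    """Per-column: average in-group conservation minus between-group conservation.

    group_a and group_b are lists of equal-length aligned sequences
    (hypothetical ortholog sets for the two copies of a duplicated gene).
    """
    scores = []
    for col_a, col_b in zip(zip(*group_a), zip(*group_b)):
        within = (in_group_conservation(col_a, score)
                  + in_group_conservation(col_b, score)) / 2
        between = between_group_conservation(col_a, col_b, score)
        scores.append(within - between)
    return scores


def per_protein_metric(per_residue_scores):
    """Aggregate per-residue scores into one per-protein value (assumed: mean)."""
    return mean(per_residue_scores)


# Toy alignments: the third column is conserved within each group
# but differs between the groups, so it receives the highest score.
group_a = ["MKD", "MKD", "MKD"]
group_b = ["MKE", "MRE", "MKE"]
residue_scores = differential_conservation(group_a, group_b)
protein_score = per_protein_metric(residue_scores)
```

In this toy case only the last column separates the two groups, so it dominates the per-protein value. The Rhea-derived thresholds mentioned in the abstract would then be applied to such per-protein values; how those thresholds were chosen is not specified here.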
8 May 2025
ENG
Evolution
Protein function
Gene duplication
Enzyme
Vertebrates
Neofunctionalization
Machine learning
Phylogenetics
Bioinformatics
BIO/10
BIOS-07/A
Riccardo Percudani
Università degli Studi di Parma. Dipartimento di Scienze Chimiche, della vita e della sostenibilità ambientale
Files in this record:
Carlo_De_Rito_PhD_Thesis.pdf
Open Access from 2 March 2026
License: All rights reserved
Size: 43.79 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/213384
The NBN code of this thesis is URN:NBN:IT:UNIPR-213384