From Fossils to Function: Exploring the Evolutionary And Functional Diversity Of HERV Envelope Proteins Across Primate Lineages

CHABUKSWAR, SAILI SHRIWARDHAN
2025

Abstract

Endogenous retroviruses (ERVs) are remnants of ancient retroviral infections, passed down through the germline in a Mendelian manner. They are prevalent in vertebrates and make up about 8% of the human genome. The ERV genome typically includes four key protein-coding genes (gag, pro, pol, and env) flanked by long terminal repeats (LTRs). The env gene, responsible for virus entry into cells, has evolved under various selection pressures and is a focus of research on diseases such as cancer, autoimmunity, neurodegenerative disorders, and inflammation. The overall aim of the project was therefore to characterize env diversity across primate species in detail, to trace the acquisition and evolutionary dynamics of env genes within the host genome, to gain insight into their past and residual coding potential, and subsequently to investigate the tropism of ancestral and current HERV Env proteins.

The first part of the project focused on the HERV-K(HML2) group, which is significant due to its recent integration into the human genome and its functional conservation. While ERVs are found in all vertebrates, HERV-K is specific to primates and includes 10 HML subtypes, with HML2 being the youngest clade and featuring some human-specific integrations. The study examined HML2 distribution in two non-human primates, Macaca fascicularis and Macaca mulatta, which are closely related to humans. We identified 208 HML2 proviruses: 77 in M. fascicularis and 131 in M. mulatta. Of these, 46 proviruses were shared between the two species, while only 12 were shared with humans, indicating that significant HML2 spread in humans occurred after the divergence from macaques. Phylogenetic analysis revealed structural variations among species-specific and shared proviruses. The analysis of potential open reading frames (ORFs) for the gag, pol, and env genes found that gag and pol were highly conserved, whereas the env genes were more divergent. Notably, 81 proviral sequences contained a MER11A insertion in the env region, indicating a recombination event that resulted in an env variant specific to Old World monkeys.

Because this exhaustive analysis of HML2 in human and macaque genomes indicated that the env gene is divergent owing to recombination, we next aimed to reconstruct representative Env prototype sequences and screen them across primate genomes in order to study the diversity patterns of ERVs. We reconstructed 32 Env sequences from ancestral proteins of Class I, II, and III HERVs and analyzed them using similarity searches, phylogenetic analysis, and recombination studies across 43 primate species in the Catarrhini and Platyrrhini parvorders. The findings indicate that ERVs are widely distributed throughout the primate lineage and highlight the presence of HML groups in Platyrrhini, suggesting that their spread occurred before the divergence of New World and Old World monkeys over 40 million years ago. The study also identified significant inter-class and intra-class env recombination events, illustrating the phenomenon of “env snatching” among primate ERVs. Overall, our work enhances the understanding of retroviral evolution and diversification patterns in ERVs among primates.

The final part of the project studied the interaction of HERV Env with human cellular proteins. To identify these interactions, we developed an unsupervised deep learning autoencoder model, which predicted several human cellular receptors that interact with HERV Env proteins. Key interactions were predicted for Syncytin-1, HERV-W, HERV-T, HML2-Rec, and Np9, highlighting their roles in sensory perception and cellular communication. Although each HERV contributes uniquely to specific pathways, they collectively influence pathological conditions, suggesting a shared evolutionary strategy among retroviral elements that impacts host biology.
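The abstract mentions screening proviral sequences for potential open reading frames. As a rough illustration of what such a screen involves, the following minimal sketch scans the three forward reading frames of a DNA string for ATG-to-stop stretches. It is an assumption-laden toy, not the pipeline used in the thesis: the function name `find_orfs`, the `min_codons` threshold, and the example sequence are all hypothetical, and the reverse strand is ignored.

```python
# Illustrative only: a naive forward-strand ORF scan, not the thesis method.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for ATG-to-stop ORFs.

    start is the 0-based index of the ATG, end is the index just past the
    stop codon, and min_codons counts the start and stop codons too.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# Hypothetical toy sequence: ATG AAA TAG sits in frame 2.
print(find_orfs("CCATGAAATAGCC"))  # → [(2, 11, 2)]
```

In practice a screen of this kind would also cover the reverse complement and compare translated ORFs against reference Gag, Pol, and Env proteins; those steps are omitted here.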
8 Jan 2025
English
TRAMONTANO, ENZO
GRANDI, NICOLE
Università degli Studi di Cagliari
Files in this item:
File: PhD_Thesis_XXXVII_Saili_final.pdf
Access: open access
Size: 10.77 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/209410
The NBN code of this thesis is URN:NBN:IT:UNICA-209410