Metagenomics-based discovery of unknown bacteriophages in the human microbiome

Zolfo, Moreno
2020

Abstract

Viruses, and particularly bacteriophages, are key players in many microbial ecosystems and can profoundly influence the human microbiome and its impact on human health. While the bacterial and archaeal fraction of the human microbiome can now be profiled at unprecedented resolution via cultivation-free metagenomics, viral metagenomics remains extremely challenging. The lack of universal viral genetic markers limits the de novo discovery of viral entities, and the few viral reference genomes available from cultivation studies poorly cover the phage diversity in human microbiome samples. Virus-like particle (VLP) purification has been proposed as a set of experimental tools to concentrate viruses in samples prior to sequencing, but it remains unclear how efficient and reproducible such tools are in practice. In this thesis we aim to address some of these challenges and better exploit the potential of viral metagenomics in the context of the human microbiome. First, we applied VLP procedures to freshwater and sediment samples and evaluated their performance. We found that bacteria can still be abundant at the end of the filtration process, which lowers the efficiency of the enrichment. Analyzing poorly enriched samples may lead to inconsistent conclusions, as the residual bacterial contamination can mislead the computational analysis. To better quantify the extent of non-viral contamination in VLP sequencing, we designed ViromeQC, a novel open-source tool that assesses and ranks viromes by their viral purity directly from the raw reads. In ViromeQC, rRNA genes and bacterial single-copy proteins are used as a proxy to estimate non-viral contamination. With ViromeQC, we conducted the largest meta-analysis on the degree of enrichment of thousands of viral metagenomes, and concluded that the vast majority of them are enriched less than three-fold relative to a standard metagenome.
ViromeQC was then used to select the most highly enriched human gut viromes as the starting point for a novel reference-free pipeline for the discovery of previously uncharacterized viral entities. The approach combined metagenomic assembly of the enriched viromes with extensive mining of many thousands of assembled metagenomes, and led to a catalog of 162,876 sequences of high-confidence viral origin. Most of these predicted viral sequences had no match to any known virus in RefSeq, even though some of them showed a prevalence in gut metagenomes of up to 70%. Our analyses and publicly available tools and resources are helping to uncover the still-hidden virome diversity and improve the support for current and future investigations of the human virome.
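The abstract describes using rRNA genes and bacterial single-copy proteins as a proxy for non-viral contamination. The idea can be illustrated with a minimal Python sketch. It is not ViromeQC's actual code: the function names, the marker keys, the choice of taking the most pessimistic marker, and all numbers are assumptions made for illustration only.

```python
from typing import Dict


def marker_rate(marker_reads: int, total_reads: int) -> float:
    """Fraction of reads that align to a non-viral marker (hypothetical helper)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return marker_reads / total_reads


def enrichment_score(
    sample_rates: Dict[str, float],
    reference_rates: Dict[str, float],
    floor: float = 1e-9,
) -> float:
    """Estimate viral enrichment relative to a standard metagenome.

    For each marker, the ratio reference_rate / sample_rate says how many
    times rarer the non-viral marker is in the sample than in a typical
    unenriched metagenome. Assumption: the overall score is the smallest
    ratio, so one abundant marker is enough to flag contamination.
    """
    ratios = []
    for marker, ref_rate in reference_rates.items():
        # Floor the sample rate to avoid division by zero when no read aligns.
        rate = max(sample_rates[marker], floor)
        ratios.append(ref_rate / rate)
    return min(ratios)


# Illustrative values only, not taken from the thesis.
reference = {"ssu_rrna": 0.002, "lsu_rrna": 0.003, "single_copy": 0.010}
sample = {"ssu_rrna": 0.001, "lsu_rrna": 0.001, "single_copy": 0.002}
score = enrichment_score(sample, reference)
```

Under these assumptions, a score near 1 would mean the sample carries as much non-viral signal as an unenriched metagenome, and larger values would indicate a purer virome.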
13 Oct 2020
English
Segata, Nicola
Università degli studi di Trento
Trento
165
Files in this item:
File: phd_unitn_moreno_zolfo.pdf
Access: Open Access from 1 Oct 2021
Size: 10.57 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/179786
The NBN code of this thesis is URN:NBN:IT:UNITN-179786