Optimization of library preparation methods to improve RNA-Sequencing analysis

Avanzato, Carla Giuseppina

The advent of next-generation sequencing (NGS) technology has revolutionized transcriptomic studies by allowing RNA to be analyzed through cDNA sequencing at massive scale (RNA-Sequencing, RNA-Seq), supplanting earlier approaches based on microarrays and Sanger sequencing. After experimental design and RNA isolation, the first steps in the RNA-Seq workflow are RNA selection, which removes rRNAs (the most abundant and least informative component of the transcriptome), and cDNA library production. Standard libraries do not record which DNA strand a transcript is encoded on, but methods for producing directional libraries, which preserve strand information, have been developed. The RNA selection and library preparation methods depend on the final goal of the experiment and should be chosen carefully before starting. The aim of this work is to optimize protocols for RNA-Seq library preparation in order to find the most suitable strategy for each research project: one that gives the required biological answer while making good use of the available time and money. First, the Illumina TruSeq RNA kit protocol was optimized to produce libraries faster and more efficiently. The adjustments focused on the following steps: fragmentation time, number of PCR cycles, and incubation time with the AMPure XP beads used for purification; in addition, a size selection step was added for finished libraries to be sequenced in paired-end mode. Paired-end (PE) and single-end (SE) reads produced for a differential gene expression study were then compared, because a sound differential expression study requires at least 3 replicates per condition and costs can be high, especially with PE sequencing.
Although PE sequencing allows more accurate transcriptome reconstruction and should be mandatory for species that lack a reference genome or transcriptome and require de novo assembly, our comparison shows that PE and SE data from the same samples yield an equal number of mapped fragments and highly correlated expression levels. These results indicate that SE data can be sufficient, and more cost-effective, when the aim of the project is to quantify gene expression. In the second part of this work, the superiority of directional libraries was demonstrated: they make it possible to resolve overlapping genes. Directional libraries were then used for two genome annotation projects. For the eggplant (Solanum melongena) genome annotation, 20 separate directional libraries were produced and sequenced in PE mode. For the Nebbiolo (a Vitis vinifera cultivar) genome annotation, the budget was limited and an alternative strategy was found: a single directional library was produced from a pool of 28 different RNA samples; the finished library was normalized by Duplex-Specific Nuclease (DSN) treatment to reduce the signal from highly expressed transcripts and so allow the less represented and tissue-specific ones to be characterized as well; finally, the normalized library was sequenced in PE mode. In both cases a good set of transcripts for genome annotation was produced. The DSN method implemented for the Nebbiolo annotation allows the entire experiment to be performed with a single normalized library, which makes this approach faster and more economical. However, data from a normalized library cannot be used for applications other than annotation itself. By contrast, when samples are analyzed separately, as for eggplant, the identity of each sample is maintained and the data generated can be exploited for further applications, for example to determine expression levels in each tissue.
In conclusion, there is no single ideal method in the RNA-Seq field; the choice of workflow depends on the final goal of the project and the available resources. Our role is to know and understand the technology in order to use it as efficiently as possible in each situation.
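
The PE versus SE comparison summarized in the abstract can be illustrated with a minimal sketch. The abstract does not say how the correlation was computed, so everything here is an assumption made for illustration: the gene names and counts are invented, the normalization is a simple log2 counts-per-million with a pseudocount of 1, and the statistic is a Pearson correlation. The function names `log_cpm`, `pearson` and `compare_pe_se` are hypothetical and do not come from the thesis.

```python
import math


def log_cpm(counts):
    """Convert raw per-gene fragment counts to log2(CPM + 1)."""
    total = sum(counts.values())
    return {gene: math.log2(c / total * 1e6 + 1) for gene, c in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def compare_pe_se(pe_counts, se_counts):
    """Correlate expression levels for genes quantified in both data sets."""
    genes = sorted(set(pe_counts) & set(se_counts))
    pe = log_cpm({g: pe_counts[g] for g in genes})
    se = log_cpm({g: se_counts[g] for g in genes})
    return pearson([pe[g] for g in genes], [se[g] for g in genes])


# Invented counts for the same sample sequenced in PE and SE mode
# (illustrative only, not data from the thesis).
pe_counts = {"geneA": 1200, "geneB": 35, "geneC": 410, "geneD": 7, "geneE": 88}
se_counts = {"geneA": 1180, "geneB": 40, "geneC": 395, "geneD": 9, "geneE": 92}

r = compare_pe_se(pe_counts, se_counts)
print(f"Pearson r on log2 CPM: {r:.3f}")
```

With real data the counts would come from a read-counting tool, and a value of r close to 1 across replicates would support the conclusion that SE data are sufficient for quantifying gene expression.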



Degree programme	Biotecnologie applicate (Applied Biotechnology)
Publication date	2016
Language	English
Keywords	New Generation Sequencing, RNA-Sequencing, gene expression, annotation, library preparation
Supervisor, Advisor or Tutor	Delledonne, Massimo
Number of pages	78
Collection	Università degli Studi di Verona

Files in this record:

File	Access	License	Size	Format
TesiRevisionata.pdf	Access only from BNCF and BNCR	All rights reserved	2.04 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/112981

The NBN code of this thesis is URN:NBN:IT:UNIVR-112981