Optimization of library preparation methods to improve RNA-Sequencing analysis
Avanzato, Carla Giuseppina
2016
Abstract
The advent of next-generation sequencing technology has revolutionized transcriptomic studies by enabling RNA analysis through massive-scale cDNA sequencing (RNA-Sequencing, RNA-Seq), supplanting the earlier approaches based on microarrays and Sanger sequencing. After experimental design and RNA isolation, the first steps of the RNA-Seq workflow are RNA selection, which removes rRNAs (the most abundant and least informative component of the transcriptome), and cDNA library production. Standard libraries do not record which DNA strand a transcript is encoded on, but methods for producing directional libraries, which preserve strand information, have been developed. In any case, the RNA selection and library preparation methods depend on the final goal of the experiment and should be carefully chosen before it starts. The aim of this work is to optimize RNA-Seq library preparation protocols in order to find the most suitable strategy for each research project: one that gives the required biological answer while making good use of the available time and money. First, the Illumina TruSeq RNA kit protocol was optimized to produce libraries faster and more efficiently. The adjustments concerned the fragmentation time, the number of PCR cycles and the incubation time with the AMPure XP beads used for purification; in addition, a size-selection step was added for finished libraries to be sequenced in paired-end mode. Next, paired-end (PE) and single-end (SE) reads produced for a differential gene expression study were compared, because a sound differential expression study requires at least 3 replicates per condition and the costs can be high, especially with PE sequencing.
Although PE sequencing allows more accurate transcriptome reconstruction and should be mandatory for species that lack a reference genome or transcriptome and require de novo assembly, our comparison shows that PE and SE data from the same samples produce an equal number of mapped fragments and highly correlated expression levels. These results show that SE data can be sufficient and more cost-effective when the aim of the project is to quantify gene expression. In the second part of this work, the superiority of directional libraries was demonstrated: they make it possible to resolve overlapping genes. Directional libraries were then used in two genome annotation projects. For the eggplant (Solanum melongena) genome annotation, 20 separate directional libraries were produced and sequenced in PE mode. For the annotation of the Nebbiolo (a Vitis vinifera cultivar) genome, the budget was limited and an alternative strategy was found: a single directional library was produced from a pool of 28 different RNA samples; the finished library was normalized by duplex-specific nuclease (DSN) treatment, to reduce the signal from the most highly expressed transcripts and so allow the less represented and tissue-specific ones to be characterized; finally, the normalized library was sequenced in PE mode. In both cases a good set of transcripts for genome annotation was obtained. The DSN method implemented for the Nebbiolo annotation allows the entire experiment to be performed with a single normalized library, which makes this approach faster and cheaper. However, data from a normalized library cannot be used for applications other than annotation itself. By contrast, when samples are analyzed separately, as for eggplant, the identity of each sample is maintained and the data can be exploited for further applications, for example to determine expression levels in each tissue.
In conclusion, there is no single ideal method in the RNA-Seq field: the choice of workflow depends on the final goal of the project and the available resources. Our role is to know and understand the technology in order to use it as efficiently as possible in each situation.

File | Size | Format |
---|---|---|
TesiRevisionata.pdf (access only from BNCF and BNCR) | 2.04 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/112981
URN:NBN:IT:UNIVR-112981