RNA syntax and semantics: investigating the transcriptome complexity

2019

Abstract

The basic idea of this thesis is to reconstruct a heterogeneous network of lncRNA-protein interactions that summarizes what is currently known, allows the prediction of missing features, and thus gives a complete mechanistic understanding of lncRNA functions through topological analysis of the network. This approach raised several problems. First, although recent studies show that a growing number of lncRNAs play critical roles in complex cellular processes and are implicated in a wide range of human diseases, the fraction of annotated lncRNAs is still small. Second, most databases are currently highly inhomogeneous in the type of information they provide, and analytical and experimental approaches to investigating lncRNAs have been hampered by the lack of comprehensive annotation. Third, the standard bioinformatics solution for filling the gaps left by missing information relies on machine learning techniques, which usually raise many problems in data preprocessing and input dataset formatting, two steps that are often carried out by trial and error. Finally, data visualization is a challenging problem in this domain. A common strategy is to construct interaction networks, whose analytical and visual inspection can offer important biological insights; the primary difficulty with this approach is developing an efficient and scalable algorithm that produces easily interpretable layouts for sparse graphs when the number of nodes is very large. The thesis takes a multidisciplinary approach to unraveling the complexity of lncRNA regulatory networks and investigating their functions. The objective is to demonstrate the feasibility of using machine learning techniques together with network analysis to find hidden patterns in the data and to predict new features.
3 Apr 2019
Università degli Studi di Bologna
Files in this item:
File: ireneBonafede_PhDtesis_afterRev_finale.pdf
Access: only from BNCF and BNCR
Type: Other attached material
Size: 1.97 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/148199
The NBN code of this thesis is URN:NBN:IT:UNIBO-148199