Protein secondary structure prediction: novel methods and software architectures

2011

Abstract

Owing to the strict relationship between protein structure and function, the prediction of protein tertiary structure has become one of the most important tasks in recent years. Despite recent advances, building the complete protein tertiary structure is still not a tractable task in most cases; in the absence of a clear homology relationship, the problem is often decomposed into smaller subtasks, including the prediction of the secondary structure. Notwithstanding the large variety of different strategies proposed over the years, secondary structure prediction is still an open problem, and few advances in the field have been made in recent times. In this thesis, the problem of secondary structure prediction is first analyzed, identifying five different information sources related to the biological essence of the problem, to be exploited in a learning system. After describing a general software architecture and framework aimed at dealing with the issues related to the engineering and setup of prediction systems applied to real-world problems, different techniques based on the encoding and decoding of biological information, together with custom software architectures, are presented. The different proposals are assessed experimentally. The best improvements are consistent with the recent advances in the field (about 1-2% in the last ten years), confirming the validity of the assumption that the correlation sources identified can be further exploited to improve predictions.
it
Files in this item:
File: PhD_Filippo_G_Ledda.pdf
Access: only from BNCF and BNCR
Type: Other attached material
License: All rights reserved
Size: 6.93 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/314172
The NBN code of this thesis is URN:NBN:IT:BNCF-314172