Discovery of Unconventional Patterns for Sequence Analysis: Theory and Algorithms

Battaglia, Giovanni

The biology community is collecting large amounts of raw data, such as the genome sequences of organisms, microarray data, and interaction data (gene-protein interactions, protein-protein interactions, and so on). The volume of data is increasing rapidly, and the process of understanding the data is lagging behind the process of acquiring it. An inevitable first step towards making sense of the data is to study its regularities, focusing on the non-random structures that appear surprisingly often in the input sequences: patterns. In this thesis we discuss three incarnations of the pattern discovery task, exploring three types of patterns that model different regularities of the input dataset. Mask patterns are designed to model short repeated biological sequences whose content is highly conserved at some specific positions. Permutation patterns are designed to detect repeated patterns whose parts maintain their physical adjacency, but not their ordering, across all pattern occurrences. Transposons, instead, model mobile sequences in the input dataset, which can be discovered by comparing different copies of the same input string and detecting large insertions and deletions in their alignment.

2011

	Publication date
	
				19 Dec 2011
			
	Language
	
				Italian
			
	Keywords
	
				mask patterns
				pattern discovery
				permutation patterns
				transposons
			
	Supervisor
	
				Grossi, Roberto
			
	Collection
	
				Università degli Studi di Pisa

Files in this item:

File	Size	Format
giovanni_battaglia_final_phd_thesis.pdf (open access; type: other attached material; license: all rights reserved)	1.49 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/128506

The NBN code of this thesis is URN:NBN:IT:UNIPI-128506