A COMPREHENSIVE PIPELINE FOR CLASS COMPARISON AND CLASS PREDICTION IN CANCER RESEARCH

Landoni, Elena

Personalized medicine is an emerging field that promises to bring radical changes in healthcare and may be defined as “a medical model using molecular profiling technologies for tailoring the right therapeutic strategy for the right person at the right time, and determine the predisposition to disease at the population level and to deliver timely and stratified prevention”. The sequencing of the human genome together with the development and implementation of new high throughput technologies has provided access to large ‘omics’ (e.g. genomics, proteomics) data, bringing a better understanding of cancer biology and enabling new approaches to diagnosis, drug development, and individualized therapy. ‘Omics’ data have the potential as cancer biomarkers but no consolidated guidelines have been established for discovery analyses. In the context of the EDERA project, funded by the Italian Association for Cancer Research, a structured pipeline was developed with innovative applications of existing bioinformatics methods including: 1) the combination of the results of two statistical tests (t and Anderson-Darling) to detect features with significant fold change or general distributional differences in class comparison; 2) the application of a bootstrap selection procedure together with machine learning techniques to guarantee result generalizability and study the interconnections among the selected features in class prediction. Such a pipeline was successfully applied to plasmatic microRNA, identifying five hemolysis related microRNAs and to Secondary ElectroSpray Ionization-Mass Spectrometry data, in which case eight mass spectrometry signals were found able to discriminate exhaled breath from breast cancer patients from that of healthy individuals.

A COMPREHENSIVE PIPELINE FOR CLASS COMPARISON AND CLASS PREDICTION IN CANCER RESEARCH

LANDONI, ELENA

2015

Abstract

Personalized medicine is an emerging field that promises to bring radical changes in healthcare and may be defined as “a medical model using molecular profiling technologies for tailoring the right therapeutic strategy for the right person at the right time, and determine the predisposition to disease at the population level and to deliver timely and stratified prevention”. The sequencing of the human genome together with the development and implementation of new high throughput technologies has provided access to large ‘omics’ (e.g. genomics, proteomics) data, bringing a better understanding of cancer biology and enabling new approaches to diagnosis, drug development, and individualized therapy. ‘Omics’ data have the potential as cancer biomarkers but no consolidated guidelines have been established for discovery analyses. In the context of the EDERA project, funded by the Italian Association for Cancer Research, a structured pipeline was developed with innovative applications of existing bioinformatics methods including: 1) the combination of the results of two statistical tests (t and Anderson-Darling) to detect features with significant fold change or general distributional differences in class comparison; 2) the application of a bootstrap selection procedure together with machine learning techniques to guarantee result generalizability and study the interconnections among the selected features in class prediction. Such a pipeline was successfully applied to plasmatic microRNA, identifying five hemolysis related microRNAs and to Secondary ElectroSpray Ionization-Mass Spectrometry data, in which case eight mass spectrometry signals were found able to discriminate exhaled breath from breast cancer patients from that of healthy individuals.

Scheda breve

Scheda completa

Scheda completa (DC)

	Facoltà/Dipartimento
	
				DIPARTIMENTO DI SCIENZE CLINICHE E DI COMUNITA'
			
	Corso di studio
	
				STATISTICA BIOMEDICA
			
	Data di pubblicazione
	
				11-dic-2015
			
	Lingua
	
				Inglese
			
	Parola chiave
	
				cancer; machine learning; feature selection; classifier; non parametric tests
			
	Correlatore, Controrelatore, Co-Supervisor,  Co-Tutor o Coordinatori
	
				DECARLI, ADRIANO
			
	Nome Editore
	
				Università degli Studi di Milano
			
	Collezione di appartenenza
	
				Università degli Studi di Milano

File in questo prodotto:

File	Dimensione	Formato
phd_unimi_R10091.pdf Open Access dal 10/03/2016 Dimensione 4.44 MB Formato Adobe PDF Visualizza/Apri	4.44 MB	Adobe PDF	Visualizza/Apri

I documenti in UNITESI sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14242/113805

Il codice NBN di questa tesi è URN:NBN:IT:UNIMI-113805