
A COMPREHENSIVE PIPELINE FOR CLASS COMPARISON AND CLASS PREDICTION IN CANCER RESEARCH

LANDONI, ELENA
2015

Abstract

Personalized medicine is an emerging field that promises radical changes in healthcare. It may be defined as “a medical model using molecular profiling technologies for tailoring the right therapeutic strategy for the right person at the right time, and determine the predisposition to disease at the population level and to deliver timely and stratified prevention”. The sequencing of the human genome, together with the development and implementation of new high-throughput technologies, has provided access to large ‘omics’ (e.g. genomics, proteomics) datasets, improving the understanding of cancer biology and enabling new approaches to diagnosis, drug development, and individualized therapy. ‘Omics’ data have potential as a source of cancer biomarkers, but no consolidated guidelines have been established for discovery analyses. In the context of the EDERA project, funded by the Italian Association for Cancer Research, a structured pipeline was developed that applies existing bioinformatics methods in innovative ways, including: 1) combining the results of two statistical tests (t and Anderson-Darling) to detect features with a significant fold change or general distributional differences in class comparison; 2) applying a bootstrap selection procedure together with machine learning techniques to guarantee the generalizability of results and to study the interconnections among the selected features in class prediction. The pipeline was successfully applied to plasma microRNA data, identifying five hemolysis-related microRNAs, and to Secondary ElectroSpray Ionization-Mass Spectrometry data, in which eight mass spectrometry signals were able to discriminate the exhaled breath of breast cancer patients from that of healthy individuals.
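The class comparison step described in the abstract (a t test for shifts in the mean, plus an Anderson-Darling test for general distributional differences) can be illustrated with a minimal sketch for a single feature. This is not the thesis implementation. Several choices are assumptions made only for illustration: the Welch form of the t statistic, the rank form of the two-sample Anderson-Darling statistic (which assumes no tied values), permutation-based p-values, and the rule that a feature is selected when either test is significant. The function names are hypothetical.

```python
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch t statistic (unequal variances) for two samples."""
    se2 = variance(a) / len(a) + variance(b) / len(b)
    if se2 == 0:
        return 0.0  # degenerate case: no variability in either group
    return (mean(a) - mean(b)) / se2 ** 0.5


def anderson_darling_2s(a, b):
    """Two-sample Anderson-Darling statistic, rank form (assumes no ties)."""
    m, n = len(a), len(b)
    total = m + n
    pooled = sorted([(x, 0) for x in a] + [(x, 1) for x in b])
    stat, from_a = 0.0, 0
    # from_a counts how many of the i smallest pooled values come from `a`.
    for i, (_, label) in enumerate(pooled[:-1], start=1):
        from_a += label == 0
        stat += (from_a * total - m * i) ** 2 / (i * (total - i))
    return stat / (m * n)


def permutation_p(a, b, statistic, n_perm=2000, seed=0):
    """Permutation p-value: share of label shuffles at least as extreme."""
    rng = random.Random(seed)
    observed = abs(statistic(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(statistic(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def class_comparison(a, b, alpha=0.05):
    """Assumed combination rule: select if either test is significant."""
    p_t = permutation_p(a, b, welch_t)
    p_ad = permutation_p(a, b, anderson_darling_2s)
    return {"p_t": p_t, "p_ad": p_ad, "selected": min(p_t, p_ad) < alpha}
```

Running both tests is useful because a feature whose two classes share the same mean but differ in spread or shape is invisible to the t test yet can still be picked up by the Anderson-Darling test. In a real 'omics' analysis the function would be applied to every feature, and the resulting p-values would normally need a multiple-testing correction, which this sketch omits.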
11 December 2015
English
cancer; machine learning; feature selection; classifier; non parametric tests
DECARLI, ADRIANO
Università degli Studi di Milano
Files in this item:
File: phd_unimi_R10091.pdf
Access: Open Access from 10/03/2016
Size: 4.44 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/113805
The NBN code of this thesis is URN:NBN:IT:UNIMI-113805