Computer-aided analysis of complex neurological data for age-based classification of upper limbs motor performance and radiomics-based survival prediction of brain tumors

Shaheen, Asma

The ever-increasing amount of digital medical data collected through heterogeneous sources such as healthcare systems, sensors, and mobile consumer technologies makes it possible to perform computer-aided analyses aimed at improving the knowledge, diagnosis, and treatment of medical conditions. In this thesis, we worked with two medical datasets that can be used to study two different types of neurological disorders: motor control disorders (e.g., Parkinson’s disease) and brain tumors.

The first dataset comprises the results of digital motor tests of the upper limbs taken by more than 10,000 users of a free and publicly available mobile application called MotorBrain. Motor tests are used by neurologists to assess human motor performance and support the diagnosis of disorders affecting motor control. Our first goal was to analyse the MotorBrain data with statistical methods to investigate the age-related behavior patterns of healthy subjects for the different motor tests included in the application. The results show that the collected data reveal the typical degradation of motor performance that accompanies aging. This supports the appropriateness of the considered approach to motor performance data collection and could help neurologists identify neurological disorders at an early stage by comparing new data with the available normative data. At the same time, the results highlight problems that emerge when data collection is performed in an unsupervised, non-clinical setting.

Based on the results of the statistical analysis, we used machine learning to automatically classify users according to their motor performance. The idea is to use such classification to automatically flag cases whose motor performance differs significantly from the typical performance of their age group and thus require manual inspection by a neurologist. In particular, we used random forest and logistic regression classifiers with two feature selection methods: Minimum Redundancy Maximum Relevance (MRMR) and Recursive Feature Elimination with SVM (RFE-SVM). For each motor test, we achieved good average accuracy in discriminating the motor performance of young and old adults, with random forest leading to better results. Similar results were obtained for multi-class discrimination based on 5 age groups.

The second dataset consists of a standard set of MRI images of brain tumors that is often used to develop and validate radiomics-based methods for overall survival (OS) classification of brain gliomas. We focused on two important steps of the radiomics process: segmentation and feature selection.

We first used the MRI dataset to empirically evaluate the impact on OS classification of six segmentation algorithms (five Convolutional Neural Networks and the STAPLE fusion method) and four multiregional radiomic models (Whole Tumor (WT), 3-subregions, 6-subregions, and 21-subregions). The evaluation shows that the 3-subregions radiomics model has high predictive power but poor robustness, while the 6-subregions and 21-subregions radiomics models are more robust but have low predictive power. The poor robustness of the 3-subregions radiomics model was associated with highly variable and inferior segmentation of the tumor core and active tumor subregions, as quantified by the Hausdorff metric. Failure analysis revealed that the WT, 6-subregions, and 21-subregions radiomics models failed for some subjects, possibly because of inaccurate segmentation of the WT volume. Moreover, short-term survivors were largely misclassified by the radiomic models and were associated with large segmentation errors. The STAPLE fusion method was able to circumvent these segmentation errors but was not found to be the ultimate solution in terms of predictive power.
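
As an illustration of the Hausdorff metric mentioned in the abstract, the following minimal Python sketch computes the symmetric Hausdorff distance between two binary segmentation masks. It is not taken from the thesis: the function names, the toy 2D masks, and the choice of the plain (maximum) Hausdorff distance are assumptions made for illustration, and the thesis may rely on a different variant (for example a percentile-based one) computed on 3D volumes.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def mask_to_points(mask):
    """Return (row, col) coordinates of the non-zero pixels of a 2D mask."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


# Hypothetical toy masks: a reference segmentation and a prediction
# that contains one spurious pixel away from the true region.
reference = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
predicted = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]

distance = hausdorff(mask_to_points(reference), mask_to_points(predicted))
print(f"Hausdorff distance: {distance:.3f}")
```

Because the metric takes the worst-case mismatch, a single stray pixel is enough to raise the distance, which is why it is sensitive to the kind of highly variable subregion segmentation described above.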

2023



Degree program
	PhD in Computer Science and Mathematical and Physical Sciences (Dottorato di ricerca in Informatica e scienze matematiche e fisiche)

Publication date
	13 Feb 2023

Language
	English
Keywords
	Medical Data; Machine Learning; Radiomics; Motor performance; Statistical Analysis

Supervisor, advisor, or tutor
	BURIGAT, Stefano
	MARCONE, Alberto Giulio

Publisher
	Università degli Studi di Udine

Collection
	Università degli Studi di Udine

Files in this item:

File	Size	Format
PhD_Thesis_asmashaheen.pdf (open access)	5.77 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/91076

The NBN code of this thesis is URN:NBN:IT:UNIUD-91076