Machine Learning Methods for Clinical Decision Support: an analysis based on COVID-19 Data

Rispoli, Michele

Between 2020 and 2023, COVID-19 posed an unprecedented challenge to healthcare systems worldwide, rapidly evolving into a pandemic that claimed millions of lives. While the disease has now reached an endemic stage, it continues to demand clinical attention, and the prospect of future pandemics remains a concrete threat. Consequently, developing robust data-driven tools to support healthcare and emergency response remains of utmost importance.

In this context, Machine Learning (ML) and Artificial Intelligence (AI) have proven to be valuable allies for healthcare professionals, enabling the extraction of meaningful insights from clinical datasets, accelerating workflows, and supporting personalized care in both diagnostic and prognostic settings.

This thesis contributes to these ongoing efforts by developing and applying novel ML techniques for the analysis of clinical tabular data, with a particular focus on COVID-19. The research was conducted in collaboration with the pneumology unit of the University Hospital of Cattinara, Trieste, which is part of ASUGI (Azienda Sanitaria Universitaria Giuliano Isontina), the Public Health Authority for the provinces of Gorizia and Trieste.

Three main studies are presented, each addressing a dual objective:
1. to design and validate methods for analyzing tabular clinical datasets, thereby providing ML practitioners with new methodological tools; and
2. to apply these methods to a real-world COVID-19 dataset to derive actionable insights for clinical decision-making.

The first study presents a comprehensive ML pipeline to predict in-hospital mortality of patients with severe COVID-19 pneumonia treated with glucocorticoids. Six supervised algorithms, ranging from logistic regression and decision trees to ensemble methods and neural networks, were trained and evaluated. The best models achieved strong predictive performance (AUROC > 0.9), demonstrating that accurate predictions can be obtained even from moderately sized datasets through careful preprocessing, feature selection, and validation. Model explainability was ensured using SHAP values, which provided individualized explanations and confirmed the clinical relevance of factors such as age, comorbidities, C-reactive protein, and improvements in the PaO2/FiO2 ratio.

The second study focuses on fairness and bias detection in tabular datasets, presenting an improved version of FanFAIR, a hybrid statistical-ML tool to quantify bias present in the data. The updated tool supports a wider range of data types, integrates with pandas DataFrames, and enables the specification of sensitive attributes. Application of FanFAIR to our COVID-19 dataset revealed that suitable preprocessing can simultaneously enhance model accuracy and fairness, highlighting the importance of methodological and ethical rigor in healthcare AI.

Finally, the third study introduces a novel framework combining spectral clustering with landmark survival analysis to identify latent patient subgroups characterized by distinct survival behaviors. Applied to our COVID-19 dataset, the method uncovered clinically meaningful clusters corresponding to high- and low-risk patient groups, whose survival trajectories and clinical profiles remained distinct across multiple temporal landmarks. This approach extends the applicability of ML to survival data analysis in heterogeneous datasets of limited size, where conventional or deep learning models may struggle.

The results presented in this thesis demonstrate the potential of transparent, ethical, and interpretable ML to support data-driven decision-making in healthcare, particularly in the context of future epidemic and pandemic preparedness. Furthermore, the developed techniques form a comprehensive methodological toolkit adaptable to similar ML and AI problems, with potential applications extending beyond healthcare to other scientific and industrial domains.
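
As an illustration of the landmark survival analysis mentioned for the third study, the following is a minimal sketch and not the thesis implementation. It assumes that cluster labels are already available (for example from spectral clustering), and the function names `kaplan_meier` and `landmark_curves`, the toy data, and the landmark value are all hypothetical. At a chosen landmark time, only patients still under follow-up are kept, the clock is reset to the landmark, and a Kaplan-Meier curve is estimated separately for each cluster.

```python
from collections import defaultdict


def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each distinct event time.

    durations: follow-up time per patient.
    events: 1 if the patient died at that time, 0 if censored.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    # Walk through the distinct follow-up times in increasing order.
    for t in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        leaving = sum(1 for d in durations if d == t)
        if deaths:
            # Product-limit update: multiply by the fraction surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= leaving
    return curve


def landmark_curves(durations, events, clusters, landmark):
    """Kaplan-Meier curve per cluster for patients still followed at `landmark`.

    Patients who died or were censored at or before the landmark are dropped,
    and times are shifted so that time 0 corresponds to the landmark.
    """
    grouped = defaultdict(lambda: ([], []))
    for d, e, c in zip(durations, events, clusters):
        if d > landmark:
            grouped[c][0].append(d - landmark)
            grouped[c][1].append(e)
    return {c: kaplan_meier(ds, es) for c, (ds, es) in grouped.items()}


# Hypothetical toy data: days of follow-up, death indicator, cluster label.
durations = [2, 5, 6, 9, 12, 4, 7, 10, 14, 15]
events = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
clusters = ["high"] * 5 + ["low"] * 5

for cluster, curve in landmark_curves(durations, events, clusters, landmark=3).items():
    print(cluster, [(t, round(s, 3)) for t, s in curve])
```

Repeating the call with several landmark values is one way to check whether the clusters stay separated over time. With this toy data and a landmark of 3 days, the patient who died on day 2 is excluded, and the "high" cluster shows three drops in survival while the "low" cluster shows one.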

2026



	Degree programme
	
				APPLIED DATA SCIENCE AND ARTIFICIAL INTELLIGENCE
			
	Publication date
	
				28-Jan-2026
			
	Language
	
				English
			
	Keywords
	
				Machine Learning; Clinical Data; COVID-19; Decision Support; Data Fairness
			
	Supervisor, Advisor, or Tutor
	
				D'ONOFRIO, ALBERTO
Manzoni, Luca
			
	Publisher
	
				Università degli Studi di Trieste
			
	Collection
	
				Università degli Studi di Trieste

Files in this item:

File	Size	Format
PhDThesis_MicheleRispoli_v2.pdf (open access; license: all rights reserved)	2.7 MB	Adobe PDF
PhDThesis_MicheleRispoli_v2_1.pdf (open access; license: all rights reserved)	2.7 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/357313

The NBN code of this thesis is URN:NBN:IT:UNITS-357313