Machine Learning Methods for Clinical Decision Support: an analysis based on COVID-19 Data

RISPOLI, MICHELE
2026

Abstract

Between 2020 and 2023, COVID-19 posed an unprecedented challenge to healthcare systems worldwide, rapidly evolving into a pandemic that claimed millions of lives. Although the disease has now reached an endemic stage, it continues to demand clinical attention, and the prospect of future pandemics remains a concrete threat. Developing robust data-driven tools to support healthcare and emergency response therefore remains of utmost importance. In this context, Machine Learning (ML) and Artificial Intelligence (AI) have proven to be valuable allies for healthcare professionals, enabling the extraction of meaningful insights from clinical datasets, accelerating workflows, and supporting personalized care in both diagnostic and prognostic settings. This thesis contributes to these ongoing efforts by developing and applying novel ML techniques for the analysis of clinical tabular data, with a particular focus on COVID-19. The research was conducted in collaboration with the pneumology unit of the University Hospital of Cattinara, Trieste, which is part of ASUGI (Azienda Sanitaria Universitaria Giuliano Isontina), the Public Health Authority for the provinces of Gorizia and Trieste.

Three main studies are presented, each addressing a dual objective: (1) to design and validate methods for analyzing tabular clinical datasets, thereby providing ML practitioners with new methodological tools; and (2) to apply these methods to a real-world COVID-19 dataset to derive actionable insights for clinical decision-making.

The first study presents a comprehensive ML pipeline to predict in-hospital mortality of patients with severe COVID-19 pneumonia treated with glucocorticoids. Six supervised algorithms, ranging from logistic regression and decision trees to ensemble methods and neural networks, were trained and evaluated. The best models achieved strong predictive performance (AUROC > 0.9), demonstrating that accurate predictions can be obtained even from moderately sized datasets through careful preprocessing, feature selection, and validation. Model explainability was addressed using SHAP values, which provided individualized explanations and confirmed the clinical relevance of factors such as age, comorbidities, C-reactive protein, and improvements in the PaO2/FiO2 ratio.

The second study focuses on fairness and bias detection in tabular datasets, presenting an improved version of FanFAIR, a hybrid statistical-ML tool that quantifies bias present in the data. The updated tool supports a wider range of data types, integrates with pandas DataFrames, and allows sensitive attributes to be specified. Applying FanFAIR to our COVID-19 dataset revealed that suitable preprocessing can simultaneously enhance model accuracy and fairness, highlighting the importance of methodological and ethical rigor in healthcare AI.

Finally, the third study introduces a novel framework combining spectral clustering with landmark survival analysis to identify latent patient subgroups characterized by distinct survival behaviors. Applied to our COVID-19 dataset, the method uncovered clinically meaningful clusters corresponding to high- and low-risk patient groups, whose survival trajectories and clinical profiles remained distinct across multiple temporal landmarks. This approach extends the applicability of ML to survival data analysis in heterogeneous datasets of limited size, where conventional or deep learning models may struggle.

The results presented in this thesis demonstrate the potential of transparent, ethical, and interpretable ML to support data-driven decision-making in healthcare, particularly in the context of future epidemic and pandemic preparedness. Furthermore, the developed techniques form a comprehensive methodological toolkit adaptable to similar ML and AI problems, with potential applications extending beyond healthcare to other scientific and industrial domains.
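
As a rough illustration of the landmark survival analysis mentioned in the abstract, the sketch below shows only the landmarking step followed by a Kaplan-Meier estimate, in plain Python. It is not the thesis's implementation and it omits the spectral clustering step entirely. The function names, the `(follow_up_days, died)` record layout, the rule that a patient must be followed strictly beyond the landmark to be retained, and the toy cohort are all assumptions made for this example.

```python
from typing import List, Tuple

# Each record is (follow_up_days, died): died=True means death was observed,
# died=False means the patient was censored at follow_up_days.
# This layout is a hypothetical choice for the sketch.
Record = Tuple[float, bool]


def landmark_subset(records: List[Record], landmark: float) -> List[Record]:
    """Keep patients still under follow-up after the landmark and restart the clock there."""
    return [(t - landmark, died) for t, died in records if t > landmark]


def kaplan_meier(records: List[Record]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct observed death time."""
    curve = []
    survival = 1.0
    death_times = sorted({t for t, died in records if died})
    for time in death_times:
        # Patients whose follow-up reaches this time are still at risk.
        at_risk = sum(1 for t, _ in records if t >= time)
        deaths = sum(1 for t, died in records if died and t == time)
        survival *= 1.0 - deaths / at_risk
        curve.append((time, survival))
    return curve


# Hypothetical toy cohort, not data from the thesis.
cohort = [(2, True), (5, True), (5, False), (8, True), (12, False), (15, True)]

for landmark in (0, 3, 7):
    print(landmark, kaplan_meier(landmark_subset(cohort, landmark)))
```

In a setting like the one the abstract describes, one would presumably compute such curves separately for each cluster at each landmark and compare them; how the thesis does this in detail is not stated here.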
28 Jan 2026
English
Machine Learning; Clinical Data; COVID-19; Decision Support; Data Fairness
D'ONOFRIO, ALBERTO
Manzoni, Luca
Università degli Studi di Trieste
Files in this item:

PhDThesis_MicheleRispoli_v2.pdf
Access: open access
License: All rights reserved
Size: 2.7 MB
Format: Adobe PDF

PhDThesis_MicheleRispoli_v2_1.pdf
Access: open access
License: All rights reserved
Size: 2.7 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/357313
The NBN code of this thesis is URN:NBN:IT:UNITS-357313