Methods for Machine Learning and Causal Inference in Medicine

MOCCIA, Chiara
2024

Abstract

This PhD thesis, titled "Methods for Machine Learning and Causal Inference in Medicine," integrates advanced computational techniques with epidemiological methods to tackle clinical and epidemiological challenges. The research first leverages the predictive capabilities of machine learning (ML) to develop a multivariable prediction model that discriminates benign from malignant melanocytic lesions in a dermatology study. Using static and dynamic features manually extracted from dermoscopic images, two predictive models are built, one based on logistic regression and one on a random forest algorithm. The models show high discrimination ability, demonstrating that combining dynamic and static features outperforms traditional methods that consider static features only. Two user-friendly risk calculators based on these models are created to aid clinicians in decision-making. The thesis then explores the possibility of using ML to infer causality from findings, highlighting the bridge between prediction and causal understanding. Chapter 3 investigates socioeconomic position (SEP) as a driver of the early-life exposome in Turin children from the NINFEA birth cohort. Clustering and Principal Component Analysis are employed to reduce the dimensionality of the exposome (42 environmental exposures). Their outputs are then used as outcomes in traditional statistical models to assess their association with SEP, the driver. In Chapter 4, three novel methodologies integrating ML algorithms within doubly-robust estimators are explored for the estimation of a causal effect. These estimators can mitigate bias due to statistical model misspecification, thanks to the double robustness property and the flexibility of data-adaptive modelling strategies. Furthermore, these methods offer promise for addressing challenges such as time-to-event models and time-dependent variables in life-course epidemiology research. In Chapter 5, the Targeted Maximum Likelihood Estimator (TMLE) is applied to estimate the effect of prenatal paracetamol exposure on wheezing symptoms during infancy. TMLE results are compared with multivariable regression, propensity score regression adjustment and Inverse Probability Weighting. Results show a weak positive association, and the use of TMLE increases confidence that this association is unlikely to be due to statistical model misspecification. In conclusion, this thesis highlights the significant impact of ML in epidemiology, enhancing disease diagnosis, exposome exploration, causal inference and life-course epidemiology.
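The double robustness property mentioned in the abstract can be illustrated with a small sketch. This is not the thesis's method or data: it is a toy augmented inverse probability weighting (AIPW) estimator on simulated data, where the function names (`simulate`, `aipw_ate`, `naive_difference`), the single binary confounder, and the stratum-mean "models" are all assumptions made for illustration. A misspecified model is mimicked by ignoring the confounder.

```python
import random
from statistics import mean


def simulate(n, seed=0):
    """Toy data: binary confounder w, binary exposure a, continuous outcome y.

    The true average causal effect of a on y is 1.0 by construction.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < 0.3 + 0.4 * w else 0  # exposure depends on w
        y = 1.0 * a + 2.0 * w + rng.gauss(0.0, 1.0)   # outcome depends on a and w
        rows.append((w, a, y))
    return rows


def aipw_ate(rows, outcome_uses_w=True, propensity_uses_w=True):
    """AIPW estimate of the average treatment effect.

    Setting a flag to False fits that nuisance model without the
    confounder, which mimics a misspecified model.
    """
    q = {}  # outcome model: q[(a, w)] approximates E[Y | A=a, W=w]
    g = {}  # propensity model: g[w] approximates P(A=1 | W=w)
    for w in (0, 1):
        stratum = [r for r in rows if r[0] == w]
        out_rows = stratum if outcome_uses_w else rows
        ps_rows = stratum if propensity_uses_w else rows
        g[w] = mean(a for _, a, _ in ps_rows)
        for level in (0, 1):
            q[(level, w)] = mean(y for _, a, y in out_rows if a == level)

    terms = []
    for w, a, y in rows:
        q1, q0 = q[(1, w)], q[(0, w)]
        # Outcome-model contrast plus inverse-probability-weighted residuals.
        terms.append(
            q1 - q0
            + a * (y - q1) / g[w]
            - (1 - a) * (y - q0) / (1 - g[w])
        )
    return mean(terms)


def naive_difference(rows):
    """Unadjusted difference in mean outcome between exposed and unexposed."""
    exposed = mean(y for _, a, y in rows if a == 1)
    unexposed = mean(y for _, a, y in rows if a == 0)
    return exposed - unexposed


if __name__ == "__main__":
    data = simulate(20000, seed=1)
    print("naive difference:", naive_difference(data))
    print("AIPW, both models correct:", aipw_ate(data))
    print("AIPW, outcome model wrong:", aipw_ate(data, outcome_uses_w=False))
    print("AIPW, propensity model wrong:", aipw_ate(data, propensity_uses_w=False))
    print("AIPW, both models wrong:", aipw_ate(data, False, False))
```

In this toy setting the estimate should stay close to the true effect of 1.0 as long as at least one of the two nuisance models uses the confounder, and should drift towards the confounded naive difference only when both ignore it. Data-adaptive learners would take the place of the stratum means in a realistic analysis.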
13 May 2024
English
Machine Learning; Causal Inference; Targeted Learning; Human Exposome; Life course Epidemiology
MAULE, Milena Maria
FARISELLI, Piero
Università degli Studi di Torino
Files in this item:
File: Moccia_thesis_06052024.pdf
Access: open access
License: All rights reserved
Size: 7.84 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/363379
The NBN code of this thesis is URN:NBN:IT:UNITO-363379