Methods for Machine Learning and Causal Inference in Medicine
MOCCIA, Chiara
2024
Abstract
This PhD thesis, titled "Methods for Machine Learning and Causal Inference in Medicine," explores the integration of advanced computational techniques and epidemiological methods to tackle clinical and epidemiological challenges. The research first leverages the predictive capabilities of machine learning (ML) to develop a multivariable prediction model that discriminates benign from malignant melanocytic lesions in a dermatology study. Using static and dynamic features manually extracted from dermoscopic images, two predictive models, based on logistic regression and random forest algorithms, are built. The models show high discrimination ability, demonstrating that integrating dynamic and static features outperforms traditional methods that consider static features only. Two user-friendly risk calculators based on these models are created to aid clinicians in decision-making. Subsequently, the thesis explores the possibility of using ML to infer causality from findings, highlighting the bridge between prediction and causal understanding. Chapter 3 investigates socioeconomic position (SEP) as a driver of the early-life exposome in Turin children from the NINFEA birth cohort. Clustering and Principal Component Analysis are employed to reduce the dimensionality of the exposome (42 environmental exposures). Their outputs are then used as outcomes in traditional statistical models to assess their association with SEP, the driver. In Chapter 4, three novel methodologies integrating ML algorithms within doubly-robust estimators are explored for the estimation of a causal effect. The use of these novel estimators can mitigate bias due to statistical model misspecification, thanks to the double robustness property and the flexibility of data-adaptive modelling strategies.
Furthermore, these methods offer promise for addressing challenges such as time-to-event models and time-dependent variables in life-course epidemiology research. In Chapter 5, the Targeted Maximum Likelihood Estimator (TMLE) is applied to estimate the effect of prenatal paracetamol exposure on wheezing symptoms during infancy. TMLE results are compared with those from multivariable regression, propensity score regression adjustment and Inverse Probability Weighting. Results show a weak positive association, and the use of TMLE increases confidence that this association is unlikely to be due to statistical model misspecification. In conclusion, this thesis highlights the significant impact of ML in epidemiology, enhancing disease diagnosis, exposome exploration, causal inference and life-course epidemiology.

| File | Access | Licence | Size | Format |
|---|---|---|---|---|
| Moccia_thesis_06052024.pdf | Open access | All rights reserved | 7.84 MB | Adobe PDF |
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14242/363379
URN:NBN:IT:UNITO-363379