Survival Tree Models and Survival Ensemble Methods in Machine Learning
CHEN, JIA
2024
Abstract
The aim of this thesis is to construct novel Machine Learning methods for survival data. Survival Analysis is the general approach to analyzing such data, which arise in a variety of applied fields, such as medicine, engineering, and economics. Censoring is the main feature of survival data, and it implies a loss of information, which makes survival data difficult to analyze and model. With the rapid development of Machine Learning, new techniques have been proposed for survival data that often perform better than traditional statistical methods and tools. In this thesis, we focus on tree and ensemble methods for interval-censored data, a general type of survival data. A survival tree is a flexible predictive method for survival data because no specific assumptions are required. For interval-censored data, the Generalized Log-Rank Test has good discriminative power when appropriate parameters are chosen. We construct a specialized test statistic for Generalized Log-Rank Tests and propose a new survival tree with hyper-parameters by combining this test statistic with the Conditional Inference Framework for interval-censored data. The effects of tuning the hyper-parameters are discussed; tuning makes the tree method more general and flexible. The new tree method either outperforms or remains competitive with the existing tree method for interval-censored data, ICtree, which is a special case of it. We then construct a novel survival ensemble method in which the new trees serve as base learners and average prediction weights are applied to estimate the survival function. An extensive simulation study is carried out to assess the predictive performance of the newly proposed tree methods and the new ensemble method.
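As a rough illustration of two ideas in the paragraph above, the following Python sketch shows why interval-censored data can be seen as a general type of survival data (exact, right-censored, and left-censored observations are all special cases of an interval), and how survival curves from several base learners might be averaged. The names `IntervalObs` and `average_survival` are hypothetical, and the equal-weight average is an assumption made for illustration only; the abstract does not specify the weighting scheme used in the thesis.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class IntervalObs:
    """One observation: the event time is only known to lie between left and right."""
    left: float
    right: float

    def kind(self) -> str:
        # Classify the observation by the shape of its interval.
        if self.left == self.right:
            return "exact"              # event time observed directly
        if math.isinf(self.right):
            return "right-censored"     # event had not occurred by `left`
        if self.left == 0.0:
            return "left-censored"      # event occurred before the first inspection
        return "interval-censored"      # event occurred between two inspections


def average_survival(curves: list[list[float]]) -> list[float]:
    """Equal-weight average of survival curves evaluated on a shared time grid.

    Assumption: every curve is evaluated at the same time points, and each
    base learner contributes equally.
    """
    n = len(curves)
    return [sum(values) / n for values in zip(*curves)]


if __name__ == "__main__":
    for obs in (IntervalObs(2.0, 2.0), IntervalObs(3.0, math.inf),
                IntervalObs(0.0, 4.0), IntervalObs(1.0, 4.0)):
        print(obs, "->", obs.kind())
    print(average_survival([[1.0, 0.8, 0.5], [1.0, 0.6, 0.3]]))
```

Averaging on a shared grid keeps the result a valid survival curve: if every input curve is non-increasing and bounded by 0 and 1, so is their average.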
Applications of these novel survival trees in engineering and medicine are also discussed. The tree methods are applied to heat exchanger life data and tooth emergence data. Furthermore, Machine Learning techniques and statistical methods are applied to build a credit risk model for evaluating the credit risk of small and medium-sized enterprises.

File | Size | Format |
---|---|---|
06_24_24 - Chen Jia.pdf (embargoed until 24/06/2025) | 4.49 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/210542
URN:NBN:IT:UNICAM-210542