EVALUATION, DEVELOPMENT, AND EXTENSION OF FLEXIBLE PARAMETRIC MODELS FOR THE DISTRIBUTIONAL ANALYSIS OF CENSORED AND NON-CENSORED DATA: A FRAMEWORK FOR MODELING AND COMMUNICATING THE FULL DISTRIBUTION OF HEALTH OUTCOMES.
BIGANZOLI, GIACOMO
2026
Abstract
Background: Time-to-event analysis is a cornerstone of medical research, yet standard methods often rely on restrictive assumptions that limit their clinical utility and causal interpretability. The Hazard Ratio (HR), although widely used, is non-collapsible and lacks a straightforward causal interpretation, particularly when the proportional hazards (PH) assumption is violated. Furthermore, single-number summaries often obscure the temporal dynamics of risk and fail to communicate when a treatment is most effective.
Objective: This thesis aims to evaluate, develop, and extend flexible parametric models to improve the estimation and communication of causal effects in survival analysis. It specifically targets the robust estimation of cumulative quantities (e.g., Risk Difference, Relative Risk) and the distributional analysis of health outcomes beyond simple summary statistics.
Methods: The work explores two primary modeling pathways: (1) indirect estimation via flexible hazard-based models (Discrete-Time Hazard and Piecewise Exponential (PWE) models) and (2) direct estimation using pseudo-observations.
• Simulation Studies: Extensive Monte Carlo simulations are conducted to determine the sample size requirements (events per parameter) of these flexible models and to assess their performance under complex data-generating mechanisms (non-proportional hazards, time-varying effects).
• Model Selection: A novel statistical learning workflow is proposed that integrates bootstrap perturbation with time-dependent predictive metrics (e.g., Net Reclassification Improvement) to robustly select model complexity for the Cumulative Incidence Function (CIF) in competing risks settings.
• New Estimators: The thesis introduces an imputation-based PWE model for the direct estimation of the sub-distribution hazard.
• New Measures: To improve risk communication, two novel estimands are defined: the Highest Risk Density Region (HRDR) and the Highest Net Risk Difference Region (HNRDR). These measures identify the time intervals in which absolute risk is most concentrated or in which the treatment effect is largest.
Results: Simulation results indicate that flexible parametric models require approximately 20 events per parameter to achieve stable estimates of causal effects on the cumulative scale. The proposed imputation-based PWE model for competing risks shows negligible bias and performs competitively against established methods. The HRDR and HNRDR frameworks characterize the temporal distribution of risk, distinguishing treatments that delay event onset from those that reduce the overall magnitude of risk. Finally, the distributional regression framework is generalized to non-censored continuous outcomes, offering a unified approach to dose-response analysis.
Conclusion: This thesis provides a comprehensive framework for the rigorous application of flexible survival models. By shifting the analytical focus from conditional rates (hazard ratios) to marginal cumulative effects and full probability distributions, it offers researchers tools to generate more causally sound, robust, and clinically interpretable evidence.
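The hazard-based pathway rests on splitting each subject's follow-up at chosen cut points: a PWE model then becomes a Poisson regression on the per-interval event indicators with log exposure as offset, and the same person-period layout (without the offset) feeds a discrete-time hazard model such as a logistic regression. The Python sketch below shows that data expansion and the covariate-free occurrence/exposure estimate of each interval's constant hazard. The cut points, the `(start, end]` interval convention and the names `split_person_period` and `pwe_rates` are illustrative assumptions, not code from the thesis.

```python
import math
from dataclasses import dataclass


@dataclass
class IntervalRow:
    subject: int         # index of the subject in the input lists
    interval: int        # j, for the interval (cuts[j], cuts[j + 1]]
    exposure: float      # time at risk spent in interval j
    event: int           # 1 if the subject's event falls in interval j, else 0
    log_exposure: float  # offset term for a Poisson GLM


def split_person_period(times, events, cuts):
    """Expand (time, event) pairs into one row per subject and interval at risk.

    `cuts` must start at 0 and increase; follow-up beyond cuts[-1] is
    treated as censored at cuts[-1]. All times are assumed to be > 0.
    """
    rows = []
    for i, (t, d) in enumerate(zip(times, events)):
        for j in range(len(cuts) - 1):
            start, end = cuts[j], cuts[j + 1]
            if t <= start:
                break  # subject is no longer at risk
            exposure = min(t, end) - start
            event_here = int(d == 1 and t <= end)
            rows.append(IntervalRow(i, j, exposure, event_here, math.log(exposure)))
            if t <= end:
                break  # follow-up ends inside this interval
    return rows


def pwe_rates(rows, n_intervals):
    """Covariate-free MLE of each constant hazard: events / exposure per interval."""
    events = [0] * n_intervals
    exposure = [0.0] * n_intervals
    for r in rows:
        events[r.interval] += r.event
        exposure[r.interval] += r.exposure
    return [e / x if x > 0 else float("nan") for e, x in zip(events, exposure)]


# Usage: three subjects, cut points at 0, 3, 6 and 9.
rows = split_person_period([2, 5, 7], [1, 0, 1], [0, 3, 6, 9])
print(pwe_rates(rows, 3))  # [0.125, 0.0, 1.0]
```

Adding interval-by-treatment interaction terms to such a Poisson model lets the hazard ratio change from one interval to the next, which is one way these models accommodate non-proportional hazards.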
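To make the step from a hazard model to the cumulative scale concrete, the sketch below converts piecewise-constant hazards into risks, F(t) = 1 − exp(−Σ_j λ_j Δ_j(t)), where Δ_j(t) is the time spent in interval j up to t, and contrasts two arms as a risk difference and a relative risk at a horizon t. It assumes no covariates, so conditional and marginal risks coincide; with covariates, one would average the predicted risks over the sample (standardization) before contrasting. The function names and the hazard values are hypothetical, chosen so that the hazards are not proportional.

```python
import math


def cumulative_risk(rates, cuts, t):
    """F(t) = 1 - exp(-H(t)) for a hazard equal to rates[j] on (cuts[j], cuts[j + 1]].

    The hazard is taken as 0 after cuts[-1], so F stays flat beyond it.
    """
    cum_hazard = 0.0
    for j, lam in enumerate(rates):
        start, end = cuts[j], cuts[j + 1]
        if t <= start:
            break
        cum_hazard += lam * (min(t, end) - start)
    return 1.0 - math.exp(-cum_hazard)


def risk_contrasts(rates_treated, rates_control, cuts, t):
    """Risk difference and relative risk at horizon t between two arms."""
    r1 = cumulative_risk(rates_treated, cuts, t)
    r0 = cumulative_risk(rates_control, cuts, t)
    return {
        "risk_treated": r1,
        "risk_control": r0,
        "risk_difference": r1 - r0,
        "relative_risk": r1 / r0,
    }


# Hypothetical hazards: halved by treatment on (0, 2], equal on (2, 4].
cuts = [0, 2, 4]
for horizon in (2, 4):
    c = risk_contrasts([0.05, 0.10], [0.10, 0.10], cuts, horizon)
    print(horizon, round(c["risk_difference"], 3), round(c["relative_risk"], 3))
```

With these made-up hazards the hazard ratio is 0.5 and then 1, yet each horizon still has a well-defined absolute contrast: the risk difference is about −0.086 at t = 2 and about −0.070 at t = 4.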
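The direct pathway replaces each censored outcome with a jackknife pseudo-observation, θ_i = n·θ̂ − (n − 1)·θ̂_(−i), where θ̂ estimates the cumulative risk at a fixed horizon t from all n subjects and θ̂_(−i) is the same estimate with subject i left out; the pseudo-values can then be regressed on covariates, for example with generalized estimating equations. The sketch below uses the Kaplan-Meier estimate of 1 − S(t) for a single event type (in a competing risks setting the Aalen-Johansen estimator of the CIF would take its place). The function names are hypothetical, and the code favors clarity over speed.

```python
def km_risk(times, events, t):
    """Kaplan-Meier estimate of F(t) = 1 - S(t); events: 1 = event, 0 = censored."""
    surv = 1.0
    event_times = sorted({u for u, d in zip(times, events) if d == 1 and u <= t})
    for u in event_times:
        at_risk = sum(1 for v in times if v >= u)
        n_events = sum(1 for v, d in zip(times, events) if v == u and d == 1)
        surv *= 1.0 - n_events / at_risk
    return 1.0 - surv


def pseudo_observations(times, events, t):
    """Jackknife pseudo-values n * F_hat - (n - 1) * F_hat(-i) at horizon t."""
    n = len(times)
    full = km_risk(times, events, t)
    pseudo = []
    for i in range(n):
        loo_times = times[:i] + times[i + 1:]
        loo_events = events[:i] + events[i + 1:]
        pseudo.append(n * full - (n - 1) * km_risk(loo_times, loo_events, t))
    return pseudo


# Four uncensored subjects, horizon t = 2.5.
print(pseudo_observations([1, 2, 3, 4], [1, 1, 1, 1], 2.5))
```

Without censoring, the pseudo-observations reduce (up to rounding) to the indicators 1{T_i ≤ t}, here 1, 1, 0, 0, which is a useful sanity check; with censoring they can fall outside [0, 1].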
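The abstract describes the HRDR only informally, as the time intervals where absolute risk is most concentrated. By analogy with highest density regions of a probability distribution, one plausible discretized reading is sketched below: rank follow-up intervals by risk density (increase in F per unit time) and keep the densest ones until they hold a chosen share of the risk accumulated by the last cut point. The function name, the `coverage` parameter and the greedy construction are assumptions for illustration; the thesis's formal definition, and its counterpart for the net risk difference underlying the HNRDR, may differ. The input F at the cut points could come from any fitted model, e.g. a PWE or discrete-time hazard model.

```python
def highest_risk_region(cum_risk, cuts, coverage=0.5):
    """Greedy, discretized 'highest risk density' region (illustrative reading).

    cum_risk[k] is F(cuts[k]) with cum_risk[0] == 0. Intervals are ranked by
    risk density (risk mass / width) and added until they hold at least
    `coverage` of the risk accumulated by cuts[-1].
    """
    widths = [b - a for a, b in zip(cuts[:-1], cuts[1:])]
    mass = [b - a for a, b in zip(cum_risk[:-1], cum_risk[1:])]
    target = coverage * cum_risk[-1]
    order = sorted(range(len(mass)), key=lambda j: mass[j] / widths[j], reverse=True)
    chosen, captured = [], 0.0
    for j in order:
        if captured >= target:
            break
        chosen.append(j)
        captured += mass[j]
    region = [(cuts[j], cuts[j + 1]) for j in sorted(chosen)]
    return region, captured


# Made-up cumulative risk curve on four unit-length intervals.
region, captured = highest_risk_region(
    [0.0, 0.05, 0.25, 0.35, 0.40], [0, 1, 2, 3, 4], coverage=0.6
)
print(region, round(captured, 2))  # [(1, 2), (2, 3)] 0.3
```

In this made-up curve, 60% of the risk of 0.40 accumulated by t = 4 is reached with (1, 2] and (2, 3], which together carry 0.30, so risk is concentrated in the middle of follow-up rather than at its start.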
File: phd_unimi_R13736.pdf (Adobe PDF, 2.5 MB), open access, Creative Commons licence.
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14242/354635
URN:NBN:IT:UNIMI-354635