Advances in Flexible Statistical Modeling: Feature Engineering, Model Selection, and Survival Analysis
COLLARIN, CLAUDIA
2025
Abstract
The semi-parametric modelling approach combines the strengths of parametric and non-parametric methods, offering both flexibility and interpretability when analysing complex relationships between variables. Purely parametric models assume a specific functional form for the entire model, while non-parametric models make few assumptions about the underlying structure. Semi-parametric models strike a balance between the two by specifying part of the model parametrically and leaving the other components unspecified or modelled flexibly. This thesis explores and exploits the flexibility of this class of models in the regression framework by introducing novel methodologies and computational tools.
The thesis is organised into three main threads. The first proposes a novel method for integrating automatic feature engineering into Generalised Additive Models for Location, Scale, and Shape (GAMLSS). Specifically, the aim is to capture complex relationships between the covariates and the response through general differentiable, scalar-valued functions of the covariates, while keeping the model parsimonious and interpretable. The approach includes computationally efficient methods for estimating the parameters of these transformations, together with all other model parameters, and for quantifying their uncertainty. The proposed methods are implemented in the gamFactory R package. They are applied to data on electricity net-load, house prices in London and wind power generation forecasting. The second thread presents a gradient boosting approach that automatically performs effect selection in Additive Models. It identifies main effects and first-order interactions and induces sparsity via a penalised multinomial likelihood. Finally, the nested effects methodology is applied to Accelerated Failure Time (AFT) models, using monotonic P-splines to estimate the log-cumulative hazard function. This relaxes the usual parametric assumption on the error distribution.
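As a rough illustration of the AFT thread, the Python sketch below writes the conditional cumulative hazard as H(t | x) = exp(s(log t - x^T b)), where s is a monotone P-spline, and evaluates the corresponding penalised log-likelihood. Every detail is an assumption made for illustration. This is not the thesis's implementation or the gamFactory API. The function names are hypothetical. A linear predictor x^T b with no intercept stands in for an additive one, because s absorbs the level. Monotonicity is enforced by a cumulative-exponential re-parametrisation of the coefficients, in the style of shape-constrained P-splines. The second-order difference penalty on the coefficients is one possible choice among several.

```python
import numpy as np

# Hypothetical helpers: a minimal sketch, not the thesis's or gamFactory's code.


def make_knots(lo, hi, n_inner, degree):
    """Equally spaced P-spline knots covering [lo, hi), extended by `degree` knots per side."""
    h = (hi - lo) / n_inner
    knots = lo + h * np.arange(-degree, n_inner + degree + 1)
    return knots, h


def bspline_basis(x, knots, degree):
    """All B-spline basis functions of the given degree at x (Cox-de Boor recursion)."""
    x = np.asarray(x, dtype=float)[:, None]
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicators of the half-open knot intervals [t_j, t_{j+1}).
    B = ((x >= t[:-1]) & (x < t[1:])).astype(float)
    for k in range(1, degree + 1):
        n = len(t) - k - 1
        left = (x - t[:n]) / (t[k:k + n] - t[:n]) * B[:, :n]
        right = (t[k + 1:] - x) / (t[k + 1:] - t[1:n + 1]) * B[:, 1:n + 1]
        B = left + right
    return B


def monotone_spline(z, gamma, knots, degree, h):
    """Increasing spline s(z), its derivative s'(z) and its coefficients beta.

    Assumed re-parametrisation: beta_1 = gamma_1, beta_j = beta_{j-1} + exp(gamma_j).
    Increasing coefficients give an increasing B-spline.
    """
    beta = np.cumsum(np.concatenate([gamma[:1], np.exp(gamma[1:])]))
    s = bspline_basis(z, knots, degree) @ beta
    # B-spline derivative: coefficient differences on the basis one degree lower.
    # With equally spaced knots the spacing factor reduces to 1 / h.
    lower = bspline_basis(z, knots, degree - 1)[:, 1:len(beta)]
    ds = lower @ (np.diff(beta) / h)
    return s, ds, beta


def aft_penalised_loglik(b, gamma, logt, X, delta, knots, degree, h, lam):
    """Penalised log-likelihood of an AFT model with log-cumulative hazard s(z).

    z_i = log t_i - x_i^T b, H(t_i | x_i) = exp(s(z_i)); delta_i = 1 marks an event.
    """
    z = logt - X @ b
    s, ds, beta = monotone_spline(z, gamma, knots, degree, h)
    # Events add log h(t) = s + log s' - log t; every unit adds log S(t) = -exp(s).
    loglik = np.sum(delta * (s + np.log(ds) - logt) - np.exp(s))
    # Assumed second-order difference penalty on the spline coefficients.
    penalty = lam * np.sum(np.diff(beta, n=2) ** 2)
    return loglik - 0.5 * penalty


# Check: with s(z) = z the model reduces to an exponential with rate exp(-x^T b).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
b = np.array([0.5, -0.3])
t = rng.exponential(scale=np.exp(X @ b))
delta = (rng.uniform(size=50) < 0.8).astype(float)
logt = np.log(t)
z = logt - X @ b
degree = 3
knots, h = make_knots(z.min() - 1.0, z.max() + 1.0, n_inner=10, degree=degree)
m = len(knots) - degree - 1
# Coefficients at the Greville abscissae reproduce the identity s(z) = z exactly.
greville = np.array([knots[j + 1:j + degree + 1].mean() for j in range(m)])
gamma = np.concatenate([greville[:1], np.log(np.diff(greville))])
val = aft_penalised_loglik(b, gamma, logt, X, delta, knots, degree, h, lam=1.0)
eta = X @ b
exp_loglik = np.sum(-delta * eta - t * np.exp(-eta))
print(np.isclose(val, exp_loglik))  # True
```

The final check relies on a standard B-spline property: placing the coefficients at the Greville abscissae reproduces a linear function, so s(z) = z. The likelihood then collapses to that of an exponential model, and the second differences of the coefficients vanish. In a real fit, the knots must cover every z_i that the optimiser visits, because the basis is zero outside [lo, hi).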
Additionally, the proposed method can model the log-times through an additive model whose smooth functions are represented by splines. The resulting model is fitted by penalised log-likelihood methods.

File | Size | Format
---|---|---
tesi_definitiva_Claudia_Collarin.pdf (embargoed until 10/04/2026) | 5.39 MB | Adobe PDF
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14242/207728
URN:NBN:IT:UNIPD-207728