Classification and regression energy tree for functional data

BRANDI, MARCO
2018

Abstract

Tree-based methods for regression and classification have a long and successful history in statistics and data analysis. They are essentially based on a recursive partition of the covariate space, possibly driven by specific testing procedures designed to control branch creation. Starting from the conditional approach, in which the choice of the split variable and the choice of the split value are separated into two steps to allow unbiased feature selection, in this work we introduce an energy-based testing scheme to validate each of these phases. Energy methods are based on metrics such as distance correlation, which, under suitable conditions, is zero only when the variables are independent, and they are therefore more informative than standard association measures. Moreover, since distance correlation can be defined for (almost) any kind of variable, the proposed framework is flexible enough to accommodate multiple types of covariates. We focus in particular on functional covariates, for which we present simulated and real-data examples, as well as comparisons with more established functional data analysis methods.
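As an illustration of the distance correlation mentioned in the abstract, the following is a minimal sketch of the standard sample (V-statistic) estimator in plain Python. It is not taken from the thesis: the function names are hypothetical, and functional covariates are assumed to be curves evaluated on a common grid, so that each observation is a list of numbers and the Euclidean distance between rows stands in for a distance between curves.

```python
import math


def pairwise_distances(rows):
    """Euclidean distance matrix between observations.

    Each observation is a sequence of floats: a one-element list for a
    scalar covariate, or a discretized curve for a functional covariate
    (an assumption of this sketch).
    """
    n = len(rows)
    return [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]


def double_center(d):
    """Subtract row and column means and add back the grand mean."""
    n = len(d)
    row_means = [sum(r) / n for r in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def mean_product(a, b):
    """Average of the element-wise product of two n x n matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2


def distance_correlation(x, y):
    """Sample distance correlation between paired observations x and y."""
    a = double_center(pairwise_distances(x))
    b = double_center(pairwise_distances(y))
    dcov2 = mean_product(a, b)  # squared sample distance covariance
    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    if denom == 0.0:
        return 0.0  # one of the samples is constant
    return math.sqrt(max(dcov2, 0.0) / denom)


# Illustrative data: y depends on x only through x**2, a relationship
# that a linear correlation coefficient would miss on a symmetric grid.
grid = [i / 50 - 1 for i in range(101)]
x = [[t] for t in grid]
y = [[t * t] for t in grid]
dcor_xy = distance_correlation(x, y)
```

In a tree-growing procedure of the kind the abstract describes, a statistic like this would typically be paired with a permutation test to decide whether a covariate is associated with the response; that testing step is not shown here.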
13-Sep-2018
English
functional data analysis; decision tree; energy statistics
BRUTTI, Pierpaolo
CONTI, Pier Luigi
Università degli Studi di Roma "La Sapienza"
Files in this record:
Tesi dottorato Brandi (open access, 5.11 MB, format unknown)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/94415
The NBN code of this thesis is URN:NBN:IT:UNIROMA1-94415