Supervised classification methods: from cross validation to forward search approach

2012

Abstract

Most research on ensembles of classifiers suggests that the Stacking learning scheme (Wolpert (1992), Ting and Witten (1999), Seewald (2002)) can perform comparably to, if not better than, the best of the base classifiers as selected by cross-validation. Ideally, the final classifier produced by Stacking should achieve higher accuracy than the best level-0 classifier; otherwise, the computational burden created by the complexity of the procedure would not be justified. This motivated us to investigate empirically the performance of the Stacking technique, also in terms of stability and robustness, by addressing the problem of combining supervised classification methods with two different approaches, one that may be defined as traditional and one as innovative. The traditional approach is set within the StackingC framework (Seewald (2002)) and combines different base classification methods that are built and evaluated via cross-validation. Alongside it, an extension of the Forward Search (Atkinson, Riani and Cerioli (2004); (2010)) is proposed, so as to obtain a robust approach to the same problem. The Forward Search is a methodology that, besides allowing anomalous values to be identified, makes it possible to monitor iteratively the effect exerted by each unit on the model and on the quantities of interest at each step of the search. The "philosophy" at the heart of the Forward Search is the creation of a dynamic data analysis process, as opposed to the "static" one supplied by the traditional approach. This line of research established the following objectives for this work:

• Evaluating the accuracy of the base-level and meta-level classifiers as the size of the data set and the number of repetitions of the experiment change.
• Evaluating the effects of anomalous values in the data set on the performance of the base-level and meta-level classifiers, and comparing them under two different approaches:
  - Traditional (cross-validation)
  - Innovative (Forward Search)
• Evaluating the results of simulation studies carried out to establish whether, and to what extent, combining classifiers improves performance compared with using a single classifier.
• Highlighting the influence that single observations may have on each classifier's decision rule.
• Monitoring the stability of the allocation rule across different sample sizes.

At what we might call the traditional level, a Stacking scheme is proposed that differs from the well-known one, both in characteristics that are already present and through the introduction of innovative elements. The innovative nature of the proposal lies chiefly in extending the Forward Search to the combination of supervised classification methods, that is, to the Stacking scheme, so as to build the whole procedure in a robust way. For both approaches, the phases of building the Stacking scheme are illustrated and the main empirical results are presented.
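To make the iterative mechanism of the Forward Search more concrete, the following Python sketch applies it to the simplest possible model: a univariate sample summarized by its mean and standard deviation. It is not the extension to Stacking proposed in this work, only an illustration of the generic search. The function name `forward_search`, the start from the `m0` units closest to the median, and the use of absolute standardized residuals as distances are all assumptions made for illustration.

```python
import statistics


def forward_search(y, m0):
    """Univariate Forward Search sketch for a location/scale model.

    Returns two lists, one entry per step of the search:
    - monitor: minimum standardized distance among units outside the subset
    - joined: indices of the units that enter the subset at that step
    """
    n = len(y)
    if not 2 <= m0 < n:
        raise ValueError("m0 must satisfy 2 <= m0 < len(y)")
    # Assumed robust start: the m0 units closest to the median.
    med = statistics.median(y)
    subset = set(sorted(range(n), key=lambda i: abs(y[i] - med))[:m0])
    monitor, joined = [], []
    for m in range(m0, n):
        # Fit the model (mean and standard deviation) on the current subset only.
        values = [y[i] for i in subset]
        mu = statistics.fmean(values)
        sd = statistics.stdev(values)  # assumes the subset values are not all equal
        # Standardized distance of every unit from the subset fit.
        dist = [abs(v - mu) / sd for v in y]
        # Monitoring statistic: the closest unit not yet in the subset.
        monitor.append(min(dist[i] for i in range(n) if i not in subset))
        # Grow the subset to the m + 1 units with the smallest distances;
        # units may also leave (interchanges), which 'joined' does not record.
        new_subset = set(sorted(range(n), key=lambda i: dist[i])[: m + 1])
        joined.append(sorted(new_subset - subset))
        subset = new_subset
    return monitor, joined
```

For example, with `forward_search([1.0, 1.2, 0.9, 1.1, 1.05, 0.95, 10.0], m0=3)` the outlying value 10.0 (index 6) enters only at the last step, where the monitoring statistic jumps sharply. A jump of this kind in the plot of the monitoring statistic against the subset size is what signals anomalous units, and the fitted quantities can be tracked step by step in the same way.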
Language: English
Keywords: robust; stacking
ISI-CRUI categories::Economic and statistical sciences::Mathematics
Subject area: Economic and statistical sciences
MIUR disciplinary sectors::Economic and statistical sciences::STATISTICS
Institution: Università degli Studi Roma Tre

Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/273345
The NBN code of this thesis is URN:NBN:IT:UNIROMA3-273345