Machine Learning and Statistical models in real world applications

BARDELLI, CHIARA
2021

Abstract

Machine learning and statistical models are now widely used across many fields of application, thanks to their flexibility and their ability to adapt to specific types of data and problem domains. Each part of this thesis presents one application domain and proposes a specific approach to the problem described. A Bayesian framework is applied in the first and third parts, while a machine learning approach is preferred in the second. The first part proposes an algorithm for a new ensemble decision tree procedure based on the proper Bayesian bootstrap. Introducing synthetic data generated from a prior distribution makes the predictions more stable in terms of the variance component of the mean squared error, with the most evident gains for small sample sizes and high-dimensional problems. The second part describes a methodological proposal to automate business accounting procedures and to integrate the predictions of a machine learning classifier into software for account officers. All models are tested on real datasets provided by Datev, a company that develops software for account officers. The third part applies Bayesian survival analysis to a Covid-19 dataset to analyse specific quantities that describe the evolution of the epidemic. Estimating epidemiological quantities with statistical models can help address the nonidentifiability problem of compartmental epidemiological models and leads to more stable predictions of the epidemic's trend.
23 Dec 2021
English
FIGINI, SILVIA
Università degli studi di Pavia
Files in this item:
File: PhD_Tesi_Chiara_Bardelli (3).pdf
Open Access from 24/01/2022
Size: 4.28 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/84670
The NBN code of this thesis is URN:NBN:IT:UNIPV-84670