Model-based Design of Experiments for Large Dataset

PESCE, ELENA
2021

Abstract

The first part of this thesis presents the motivation for adapting ideas and methods from the theory of model-based optimal design of experiments to the context of Big Data while guarding against different sources of bias. In particular, the key focus is on guarding against bias from confounders and on how to use the theory of the design of experiments and randomization to remove bias, depending on the constraints on the design. Starting from A/B experiments, which are widely used by major tech companies in online marketing, the thesis introduces the theory of circuits and presents an algebraic method that offers a wide choice of randomization schemes. Furthermore, it proposes a robust exchange algorithm to handle outliers in large datasets. The second part is based on a marine insurance use case sponsored by Swiss Re Corporate Solutions, the commercial insurance division of the Swiss Re Group. Several temporal disaggregation methods for time series collected at different frequencies are reviewed and applied to real data to obtain a curated dataset for predicting future losses.
22-Oct-2021
Language: English
RICCOMAGNO, EVA
VIGNI, STEFANO
Università degli studi di Genova
Files in this record:
phdunige_3777600.pdf (open access, Adobe PDF, 1.9 MB)

Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/70078
The NBN code of this thesis is URN:NBN:IT:UNIGE-70078