Model-based Design of Experiments for Large Dataset
PESCE, ELENA
2021
Abstract
The first part of this thesis presents the motivation for adapting ideas and methods from the theory of model-based optimal design of experiments to the context of Big Data while guarding against different sources of bias. In particular, the key focus is on guarding against bias from confounders and on how to use the theory of the design of experiments and randomization to remove bias, depending on the constraints in the design. Starting with A/B experiments, widely used by major tech companies in online marketing, the theory of circuits is introduced and an algebraic method that gives a wide choice of randomization schemes is presented. Furthermore, a robust exchange algorithm to deal with the problem of outliers in a large dataset is proposed. The second part is based on a marine insurance use case sponsored by Swiss Re Corporate Solutions, the commercial insurance division of the Swiss Re Group. Several temporal disaggregation methods for dealing with time series collected at different time frequencies are reviewed and applied to real data in order to obtain a curated dataset for predicting future losses.

File | Access | License | Size | Format |
---|---|---|---|---|
phdunige_3777600.pdf | Open access | All rights reserved | 1.9 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/70078
URN:NBN:IT:UNIGE-70078