EFFICIENCY AND REALISM IN STOCHASTIC BANDITS

CELLA, LEONARDO
2021

Abstract

This manuscript analyzes the application of stochastic bandits to the recommender systems domain. Here a learning agent sequentially recommends one item from a catalog of available alternatives. In response, the environment returns a reward that is a noisy observation of the rating associated with the suggested item. The peculiarity of the bandit setting is that no information is given about products that were not recommended, so the collected rewards are the only information available to the learning agent. Relying on them, the learner adapts its strategy towards its learning objective, that is, maximizing the cumulative reward collected over all the interactions. This dissertation investigates two main research directions: the development of efficient learning algorithms and the introduction of a more realistic learning setting. To address the former, we propose two approaches to speed up the learning process. The first aims to reduce the computational costs associated with the learning procedure, while the second aims to boost the learning phase by relying on data from terminated recommendation sessions. Regarding the latter research line, we propose a novel setting representing use cases that do not fit the standard bandit model.
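The interaction protocol described above can be illustrated with a minimal sketch. It is not taken from the dissertation: it uses the textbook UCB1 index as a stand-in learner, and the function name `run_ucb1`, the Gaussian noise model, and the example mean ratings are all assumptions made for illustration. At each round the learner recommends one item, observes a noisy rating for that item only, and updates its estimates.

```python
import math
import random


def run_ucb1(mean_ratings, horizon, noise_std=0.1, seed=0):
    """Simulate the recommendation loop with a UCB1 learner (illustrative only).

    mean_ratings: hypothetical true mean rating of each catalog item,
                  unknown to the learner.
    Returns (counts, total_reward): how often each item was recommended
    and the cumulative reward collected over all rounds.
    """
    rng = random.Random(seed)
    n_items = len(mean_ratings)
    counts = [0] * n_items   # number of times each item was recommended
    sums = [0.0] * n_items   # sum of rewards observed for each item
    total_reward = 0.0

    for t in range(1, horizon + 1):
        if t <= n_items:
            # Recommend every item once to initialise its estimate.
            item = t - 1
        else:
            # UCB1 index: empirical mean plus an exploration bonus.
            item = max(
                range(n_items),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Bandit feedback: a noisy rating of the recommended item only
        # (Gaussian noise is an assumption of this sketch).
        reward = mean_ratings[item] + rng.gauss(0.0, noise_std)
        counts[item] += 1
        sums[item] += reward
        total_reward += reward

    return counts, total_reward


if __name__ == "__main__":
    counts, total = run_ucb1([0.2, 0.5, 0.8], horizon=2000)
    print("recommendations per item:", counts)
    print("cumulative reward:", round(total, 2))
```

In this toy run the learner never sees ratings for items it did not recommend, so the exploration bonus is what forces it to occasionally try items with few observations.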
15-Jan-2021
English
machine learning; multi-armed bandits; stochastic bandits; online learning
CESA BIANCHI, NICOLO' ANTONIO
BOLDI, PAOLO
Università degli Studi di Milano
Files in this item:
phd_unimi_R11945.pdf (open access, 3.33 MB, Adobe PDF)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/72880
The NBN code of this thesis is URN:NBN:IT:UNIMI-72880