Training conversational agents to understand complex dialogues

SUCAMELI, IRENE
2022

Abstract

Conversational agents are currently attracting interest in both the academic and non-academic world thanks to the engaging interactions they establish with users. However, finding suitable data to train a system that converses in as human-like a manner as possible is not a trivial task, and it is even more challenging for Italian, for which only a few dialogue datasets are available. This thesis addresses this challenge by proposing JILDA (Job Interview Labelled Dialogues Assembly), a new Italian dialogue dataset for the job-offer domain, and by demonstrating its practical application in training a conversational agent able to understand syntactically and semantically complex data. JILDA dialogues, annotated with MATILDA, a new annotation tool developed in collaboration with Wluper, are used to train the Natural Language Understanding (NLU) module of a conversational agent, since this is an essential component of any dialogue system. Three of the most recent pretrained language models (LMs) are benchmarked: Italian BERT, Multilingual BERT, and AlBERTo. Based on an analysis of their performance, JILDA 2.0 was developed: an updated version of the resource that represents a first step towards improving NLU for Italian dialogues. Finally, this thesis frames the research topic within a broader ethical framework, considering the ethical issues that emerge in human-machine interaction, the gender biases embedded in Embodied Conversational Agents (ECAs), and their impact on modern society.
4 May 2022
Italian
annotation tool
conversational agents
ethics and ECAs
Italian dialogue dataset
Simi, Maria
Attardi, Giuseppe
Lenci, Alessandro
Files in this record:
Report_Sucameli.pdf (open access, 253.95 kB, Adobe PDF)
Training_conversation_agents_Sucameli.pdf (open access, 3.3 MB, Adobe PDF)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/216291
The NBN code of this thesis is URN:NBN:IT:UNIPI-216291