Training conversational agents to understand complex dialogues
SUCAMELI, IRENE
2022
Abstract
Conversational agents are attracting growing interest both inside and outside academia thanks to the engaging interaction they establish with the user. However, finding valuable data to train a system that converses in as human-like a way as possible is not a trivial task. This is even more challenging for the Italian language, for which only a few dialogue datasets are available. This thesis addresses this challenge by proposing JILDA (Job Interview Labelled Dialogues Assembly), a new Italian dialogue dataset for the job-offer domain, and by demonstrating its practical application in training a conversational agent able to understand syntactically and semantically complex data. The JILDA dialogues, annotated with MATILDA, a new annotation tool developed in collaboration with Wluper, are used to train the Natural Language Understanding (NLU) module of a conversational agent, an essential component of any dialogue system. Three recent pretrained language models are benchmarked: Italian BERT, Multilingual BERT, and AlBERTo. Based on an analysis of the performance obtained, JILDA 2.0 was developed, an updated version of the resource intended as a first step towards improving NLU for Italian dialogues. Finally, this thesis frames the research topic within a global ethical framework, considering the ethical issues that emerge in human-machine interaction, the gender biases embedded in Embodied Conversational Agents (ECAs), and their impact on modern society.

File | Access | Size | Format |
---|---|---|---|
Report_Sucameli.pdf | open access | 253.95 kB | Adobe PDF |
Training_conversation_agents_Sucameli.pdf | open access | 3.3 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/216291
URN:NBN:IT:UNIPI-216291