
SPOKEN DIALOG SYSTEMS: FROM AUTOMATIC SPEECH RECOGNITION TO SPOKEN LANGUAGE UNDERSTANDING

INTILISANO, ANTONIO ROSARIO
2015

Abstract

A spoken dialogue system (SDS) is an intuitive and efficient application that allows humans to carry out natural dialogs with a machine, generally in order to perform a specific task. From a high-level point of view, an SDS is composed of several modules that allow a user to engage in a natural dialog with a machine to solve a problem or retrieve information. The modules work inside the SDS architecture as a pipeline: the output of one module is the input of the next. Each module of an SDS is very complex, and it took years of research and hard work to provide solutions that can be exploited in real applications. During my thesis work, I addressed the most important problems of the first two modules of the SDS pipeline: Spoken Language Understanding (SLU) and Automatic Speech Recognition (ASR). The SLU module takes as input the transcription of a user utterance, expressed as words, and outputs its interpretation, expressed as semantic concepts. The ASR module works closely with the SLU module: it takes as input the audio signal of a user utterance and outputs its transcription (words), which is the input of the SLU module. In particular, ASR maps an acoustic signal into a sequence of phonemes or words (discrete entities). A typical ASR process decodes the input speech data by means of signal-processing techniques and an acoustic model. The SLU component recognizes words that were previously included in its grammar. Developing such a grammar is a time-consuming and error-prone process, especially for inflectional or Neo-Latin languages. I developed a method that produces a grammar for different languages, in particular for the Romance languages, whose grammar definition is long and hard to manage. This work describes a solution that facilitates the development of speech-enabled applications and introduces a grammar authoring tool.
A second problem concerning ASR is the limited availability of valid speech corpora for training a speech recognition acoustic model. Obtaining large datasets is generally both time- and resource-consuming, as it requires continuous supervision of the entire building process. This work presents an algorithm for building speech corpora automatically: it replaces the traditional manual transcription of audio recordings and automatically derives a phonetic dictionary. Italian acoustic and linguistic models were generated as a use case to test the effectiveness of the proposed procedure.
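The pipeline structure described in the abstract, in which the ASR transcription becomes the SLU input and the SLU matches words against a predefined grammar, can be sketched as follows. This is a minimal illustrative sketch only: all class names, the toy grammar, and the fixed transcription are assumptions for demonstration, not the thesis implementation.

```python
# Minimal sketch of the two-stage SDS pipeline: ASR -> SLU.
# Names, grammar entries, and the canned transcription are illustrative.

class ASR:
    """Maps an acoustic signal to a word sequence (transcription)."""

    def transcribe(self, audio_signal: bytes) -> str:
        # A real ASR decodes audio with signal processing and an acoustic
        # model; here a fixed transcription stands in for the decoder output.
        return "book a flight to rome"


class SLU:
    """Maps a word sequence to semantic concepts via a word grammar."""

    def __init__(self, grammar: dict[str, set[str]]):
        # grammar: concept label -> set of trigger words it recognizes
        self.grammar = grammar

    def interpret(self, words: str) -> list[str]:
        tokens = words.split()
        return [
            concept
            for concept, triggers in self.grammar.items()
            if any(t in triggers for t in tokens)
        ]


def sds_pipeline(audio_signal: bytes) -> list[str]:
    # The output of the ASR module is the input of the SLU module.
    words = ASR().transcribe(audio_signal)
    grammar = {"ACTION_BOOK": {"book"}, "DEST_CITY": {"rome", "paris"}}
    return SLU(grammar).interpret(words)


print(sds_pipeline(b"..."))  # -> ['ACTION_BOOK', 'DEST_CITY']
```

The sketch also shows why grammar authoring is costly for inflectional languages: each concept's trigger set must enumerate every surface form (gender, number, case, conjugation), which is exactly the burden the grammar authoring tool described above aims to reduce.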
8 Dec 2015
Italian
CATANIA, Vincenzo
CARCHIOLO, Vincenza
Università degli studi di Catania
Catania
File: Phd Thesis Intilisano pdfA.pdf (Adobe PDF, 1.83 MB, open access)
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/124186
The NBN code of this thesis is URN:NBN:IT:UNICT-124186