A spoken dialogue system (SDS) is an intuitive and efficient application that lets humans hold natural-language dialogs with a machine, generally to perform a specific task. From a high-level point of view, an SDS is composed of several modules that together allow a user to solve a problem or retrieve information through dialog. The modules work inside the SDS architecture as a pipeline: the output of one module is the input of the next. Each module of an SDS is very complex, and it took years of research and hard work to provide solutions that can be exploited in real applications. In my thesis work, I addressed the most important problems of the first two modules of the SDS pipeline: Automatic Speech Recognition (ASR) and Spoken Language Understanding (SLU). The SLU module takes as input the transcription of a user utterance, based on words, and outputs its interpretation, based on semantic concepts. The ASR module works closely with the SLU module: it takes as input the audio speech signal of a user utterance (an audio file) and outputs its transcription (words), which is the input of the SLU module. In particular, ASR maps an acoustic signal into a sequence of phonemes or words (discrete entities). A typical ASR process decodes the input speech data by means of signal processing techniques and an acoustic model. The SLU component recognizes words that were previously included in its grammar. Developing a grammar is a time-consuming and error-prone process, especially for inflectional or Neo-Latin (Romance) languages. I developed a method that produces a grammar for different languages, in particular for Romance languages, for which grammar definition is long and hard to manage. This work describes a solution to facilitate the development of speech-enabled applications and introduces a grammar authoring tool.
A second problem, concerning ASR, is the limited availability of valid speech corpora for building a speech recognition acoustic model. Obtaining large datasets is generally both time- and resource-consuming, as it requires continuous supervision of the entire building process. This work presents an algorithm for building speech corpora automatically. It replaces the traditional manual transcription of audio recordings and automatically produces a phonetic dictionary. An Italian acoustic model and an Italian linguistic model were generated as a use case to test the effectiveness of the proposed procedure.
SPOKEN DIALOG SYSTEMS: FROM AUTOMATIC SPEECH RECOGNITION TO SPOKEN LANGUAGE UNDERSTANDING
INTILISANO, ANTONIO ROSARIO
2015
File | Size | Format | Access |
---|---|---|---|
Phd Thesis Intilisano pdfA.pdf | 1.83 MB | Adobe PDF | open access |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/124186
URN:NBN:IT:UNICT-124186