SPOKEN DIALOG SYSTEMS: FROM AUTOMATIC SPEECH RECOGNITION TO SPOKEN LANGUAGE UNDERSTANDING
INTILISANO, ANTONIO ROSARIO
2015
Abstract
A spoken dialogue system (SDS) is an intuitive and efficient application that allows humans to engage in natural dialogs with a machine, generally in order to perform a specific task. From a high-level point of view, an SDS is composed of several modules that allow a user to engage in a natural dialog with a machine in order to solve a problem or retrieve information. The modules work inside the SDS architecture like a pipeline: the output of one module is the input of the next. Each module of an SDS is very complex, and it took years of research and hard work to provide solutions that can be exploited in real applications. During my thesis work, I addressed the most important problems of the SDS pipeline, which concern its first two modules: Automatic Speech Recognition (ASR) and Spoken Language Understanding (SLU). The SLU module takes as input the transcription of a user utterance, based on words, and outputs its interpretation, based on semantic concepts. The ASR module works closely with the SLU module: it takes as input the audio speech signal of a user utterance and outputs its transcription (words), which is the input of the SLU module. In particular, ASR maps an acoustic signal into a sequence of phonemes or words (discrete entities). A typical ASR process decodes the input speech data by means of signal processing techniques and an acoustic model. The SLU component recognizes words that were previously included in its grammar. The development of a grammar is a time-consuming and error-prone process, especially for inflectional or Neo-Latin languages. I developed a method that produces a grammar for different languages, in particular for the Romance languages, for which grammar definition is long and hard to manage. This work describes a solution that facilitates the development of speech-enabled applications and introduces a grammar authoring tool.
A second problem concerning ASR is the limited availability of valid speech corpora for designing a speech recognition acoustic model. Obtaining large datasets is generally both time- and resource-consuming, as it requires continuous supervision of the entire building process. This work presents an algorithm for building speech corpora automatically: it replaces the traditional manual transcription of audio recordings and automatically produces a phonetic dictionary. Italian acoustic and language models were generated as a use case to test the effectiveness of the proposed procedure.
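The pipeline described above can be sketched as follows. This is a minimal illustrative sketch, not the thesis implementation: the module functions are stubs with hypothetical names and behavior, showing only how each module's output becomes the next module's input (ASR maps audio to words, SLU maps words to semantic concepts).

```python
def asr(audio: bytes) -> str:
    """Map an acoustic signal to a word sequence (stub transcription)."""
    # A real ASR module would apply signal processing and an acoustic model.
    return "book a flight to rome"

def slu(transcript: str) -> dict:
    """Map words to semantic concepts using a toy keyword grammar (stub)."""
    concepts = {}
    words = transcript.split()
    if "flight" in words:
        concepts["intent"] = "book_flight"
    if "to" in words:
        # Take the word after "to" as the destination slot.
        concepts["destination"] = words[words.index("to") + 1]
    return concepts

def pipeline(audio: bytes) -> dict:
    """SDS front end: the output of ASR is the input of SLU."""
    return slu(asr(audio))

print(pipeline(b"\x00"))  # {'intent': 'book_flight', 'destination': 'rome'}
```

The toy keyword grammar in `slu` also illustrates why grammar development is error-prone: every surface form a user might utter must be anticipated, which is especially costly for highly inflected Romance languages.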
File: Phd Thesis Intilisano pdfA.pdf (Adobe PDF, 1.83 MB, open access)
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14242/75722
URN:NBN:IT:UNICT-75722