Semantic Language Models with Deep Neural Networks

Bayer, Ali Orkan
2015

Abstract

Spoken language systems (SLS) communicate with users in natural language through speech. Processing the spoken input in an SLS involves two main problems. The first is automatic speech recognition (ASR), which recognizes what the user says. The second is spoken language understanding (SLU), which understands what the user means. We focus on the language model (LM) component of SLS. LMs constrain the space that is searched for the best hypothesis, and therefore play a crucial role in the performance of SLS. It has long been noted that an improvement in recognition performance does not necessarily yield better understanding performance; optimizing LMs for understanding performance is therefore crucial. In addition, long-range dependencies in language are hard to handle with statistical language models. This thesis addresses these two problems. We investigate two different LM structures. The first enables SLS to better understand what they recognize by searching the ASR hypotheses for the best understanding performance. We refer to these models as joint LMs; they use lexical and semantic units jointly in the LM. The second LM structure uses the semantic context of an utterance, which can also be described as “what the system understands”, to search for a better hypothesis that improves both recognition and understanding performance. We refer to these models as semantic LMs (SELMs). SELMs use features based on a well-established theory of lexical semantics, namely the theory of frame semantics. They incorporate semantic features extracted from the ASR hypothesis into the LM and handle long-range dependencies by using the semantic relationships between words and the semantic context. Because ASR noise propagates to the semantic features, we introduce deep semantic encodings for semantic feature extraction to suppress this noise. In this way, SELMs optimize both recognition and understanding performance.
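
The idea of searching the ASR hypotheses with an additional LM can be illustrated with a generic n-best rescoring sketch. This is not the thesis's implementation: the `Hypothesis` class, the `rescore` function, the interpolation weight, the toy scores, and the `toy_semantic_lm` stand-in (which simply rewards words from an assumed semantic context) are all hypothetical and chosen only to show the mechanism.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    words: List[str]
    acoustic_logprob: float  # acoustic model log-score (made-up value)
    lm_logprob: float        # first-pass LM log-score (made-up value)


def rescore(
    nbest: List[Hypothesis],
    second_pass_lm: Callable[[List[str]], float],
    interp: float = 0.5,
) -> Hypothesis:
    """Pick the hypothesis with the best combined log-score.

    The second-pass LM log-score is linearly interpolated with the
    first-pass LM log-score, then added to the acoustic log-score.
    """
    def total(h: Hypothesis) -> float:
        combined_lm = interp * second_pass_lm(h.words) + (1 - interp) * h.lm_logprob
        return h.acoustic_logprob + combined_lm

    return max(nbest, key=total)


def toy_semantic_lm(words: List[str]) -> float:
    # Hypothetical stand-in for a semantic LM: words that fit the assumed
    # semantic context cost nothing, every other word costs one log unit.
    context = {"flight", "boston"}
    return sum(0.0 if w in context else -1.0 for w in words)


nbest = [
    Hypothesis(["book", "a", "fright", "to", "boston"], -10.0, -6.0),
    Hypothesis(["book", "a", "flight", "to", "boston"], -10.2, -6.2),
]

# First pass alone prefers the misrecognized hypothesis.
first_pass_best = max(nbest, key=lambda h: h.acoustic_logprob + h.lm_logprob)
# Rescoring with the semantic stand-in promotes the consistent hypothesis.
best = rescore(nbest, toy_semantic_lm)
```

In this toy setting the first pass ranks the hypothesis containing "fright" highest, while the rescored ranking moves the hypothesis containing "flight" to the top, because the semantic score outweighs its small acoustic and first-pass LM deficit.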
2015
English
Riccardi, Giuseppe
Università degli studi di Trento
TRENTO
162
Files in this item:
File: bayer_thesis.pdf
Access: open access
Size: 2.31 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/93687
The NBN code of this thesis is URN:NBN:IT:UNITN-93687