Enhancing semantic understanding across multiple dimensions: towards a unified framework for semantic knowledge extraction

ORLANDO, RICCARDO
2025

Abstract

Natural Language Processing (NLP) is a subfield of Artificial Intelligence (AI) that investigates the interaction between computers and human language. Despite the tremendous progress witnessed in recent years, largely driven by increasingly sophisticated Deep Learning techniques, NLP systems are still a long way from truly understanding what they process. Within NLP, Natural Language Understanding (NLU) is the area that seeks to enable machine comprehension of human language. One of the key roles of NLU is transforming unstructured text into explicit semantic knowledge, with applications beyond NLP. Nonetheless, modern NLP systems face several challenges that prevent us from achieving true NLU across languages, domains, and applications. These challenges range from the performance disparity between high-resource and low-resource languages, to the increasing model complexity that requires specialized expertise, to the heterogeneous mixture of approaches that limits the interaction between different semantic abstractions. In this thesis, we aim to contribute to the field of NLU by addressing each of these challenges. First, we propose novel, efficient systems that tackle NLU tasks across multiple languages, mitigating the gap in multilingual performance. Second, we introduce a unified framework that orchestrates different NLU systems while maximizing inference speed and usability, and promotes the integration of semantic knowledge across different domains and applications. Third, we take the first steps towards moving from a unified framework to a unified model for Semantic Knowledge Extraction, with an efficient architecture capable of handling multiple semantic tasks simultaneously on an academic budget while also setting a new state of the art. Finally, we address the multilingual gap from a resource perspective by introducing a novel large-scale multilingual dataset for semantic knowledge extraction. Through these contributions, this thesis makes significant progress towards overcoming the aforementioned challenges, not only in terms of benchmark results but also -- and perhaps more importantly -- in terms of usability, inference speed, and language coverage, which are key to enabling applications in more domains. We hope our work will foster large-scale research and innovation in NLP, paving the way for the integration of semantics into real-world settings with more accessible, comprehensive, and robust NLU systems.
21 January 2025
English
NAVIGLI, Roberto
Università degli Studi di Roma "La Sapienza"
Files in this item:
Tesi_dottorato_Orlando.pdf (open access, 6.78 MB, Adobe PDF)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/190264
The NBN code of this thesis is URN:NBN:IT:UNIROMA1-190264