From text to knowledge: multilingual information extraction for knowledge graph construction

HUGUET CABOT, PERE-LLUIS
2025

Abstract

In the era of Large Language Models (LLMs), Information Extraction (IE) may seem like a “Chronicle of a Death Foretold”. Between 2020 and 2023, it ranked among the top three most popular topics at conferences like ACL, yet by 2024, it had dropped to tenth place. The advent of Transformer Language Models (LMs), emerging just before work on this dissertation began, has transformed the field of Natural Language Processing (NLP), enabling unprecedented performance across a broad range of Natural Language Understanding (NLU) tasks. Surprisingly, scaling these models into LLMs has not led to diminishing returns but has instead further expanded their capabilities. However, there remains a need for efficient methods suitable for real-world applications that require low latency or the ability to process large volumes of real-time data—domains where large models are often impractical. Additionally, tasks reliant on LLMs’ parametric memory face limitations due to neural inference, where accuracy and recency of information cannot always be guaranteed. While LLMs show great promise, they increasingly require grounding in external knowledge sources for reliable results. This is where IE becomes indispensable. Rather than being replaced, IE complements and strengthens LLMs, supporting their reasoning with accurate, grounded information. Knowledge Graphs (KGs) serve as structured frameworks that bridge unstructured text and structured knowledge, enabling scalable, interpretable organization of vast amounts of information. Essential for applications like semantic search, recommendation systems, and question-answering, KGs rely heavily on robust IE techniques. In this thesis, we focus on advancing multilingual IE methods to enhance KG construction and address limitations in existing IE systems.
23 January 2025
English
NAVIGLI, Roberto
LENZERINI, Maurizio
Università degli Studi di Roma "La Sapienza"
Files in this record:
Tesi_dottorato_HuguetCabot.pdf (open access, 7.8 MB, Adobe PDF)

Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/189207
The NBN code of this thesis is URN:NBN:IT:UNIROMA1-189207