From text to knowledge: multilingual information extraction for knowledge graph construction
HUGUET CABOT, PERE-LLUIS
2025
Abstract
In the era of Large Language Models (LLMs), Information Extraction (IE) may seem like a “Chronicle of a Death Foretold”. Between 2020 and 2023, it ranked among the top three most popular topics at conferences like ACL, yet by 2024, it had dropped to tenth place. The advent of Transformer Language Models (LMs), emerging just before work on this dissertation began, has transformed the field of Natural Language Processing (NLP), enabling unprecedented performance across a broad range of Natural Language Understanding (NLU) tasks. Surprisingly, scaling these models into LLMs has not led to diminishing returns but has instead further expanded their capabilities. However, there remains a need for efficient methods suitable for real-world applications that require low latency or the ability to process large volumes of real-time data—domains where large models are often impractical. Additionally, tasks reliant on LLMs’ parametric memory face limitations due to neural inference, where accuracy and recency of information cannot always be guaranteed. While LLMs show great promise, they increasingly require grounding in external knowledge sources for reliable results. This is where IE becomes indispensable. Rather than being replaced, IE complements and strengthens LLMs, supporting their reasoning with accurate, grounded information. Knowledge Graphs (KGs) serve as structured frameworks that bridge unstructured text and structured knowledge, enabling scalable, interpretable organization of vast amounts of information. Essential for applications like semantic search, recommendation systems, and question-answering, KGs rely heavily on robust IE techniques. In this thesis, we focus on advancing multilingual IE methods to enhance KG construction and address limitations in existing IE systems.
File | Size | Format
---|---|---
Tesi_dottorato_HuguetCabot.pdf (open access) | 7.8 MB | Adobe PDF
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14242/189207
URN:NBN:IT:UNIROMA1-189207