
Knowledge Engineering via Large Language Models

TIDDIA, SANDRO GABRIELE
2026

Abstract

The digital transformation of society has driven public institutions and private organizations to digitize their processes, resulting in an unprecedented production of textual documents. Much of this material remains unstructured and primarily intended for human reading and understanding. This thesis investigates how Large Language Models (LLMs) can act as knowledge engineers, transforming raw text into structured, reusable, and queryable information. The central question guiding this work is how far the understanding capabilities of LLMs can automate the organization and retrieval of knowledge traditionally curated by human experts. The research explores two complementary strategies for structuring textual information, focusing on the construction of Knowledge Graphs (KGs): a schema-first approach, where LLMs infer type systems before populating them with entities, and an extraction-first approach, where entities and relations are identified freely and later organized into a type schema. Beyond structuring, the work also examines LLMs as interfaces for retrieval, evaluating their ability to translate natural language questions into KG queries (SPARQL). Finally, a domain application in healthcare demonstrates how ontology-driven KG modeling and LLM-based event extraction can integrate structured and unstructured electronic health record data. Collectively, the contributions of this thesis highlight both the promise and the limitations of current LLMs as instruments of knowledge organization, underscoring the need for guided, multi-step, and hybrid human-AI workflows to achieve robust semantic structuring at scale.
24 February 2026
Language: English
CARTA, SALVATORE MARIO
PODDA, ALESSANDRO SEBASTIAN
Università degli Studi di Cagliari
Files in this item:
Knowledge Engineering via Large Language Models.pdf

Open access

License: All rights reserved
Size: 1.73 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/362920
The NBN code of this thesis is URN:NBN:IT:UNICA-362920