Knowledge Engineering via Large Language Models
TIDDIA, SANDRO GABRIELE
2026
Abstract
The digital transformation of society has led public institutions and private organizations to embrace digitization, resulting in an unprecedented production of textual documents. Much of this material remains unstructured and primarily intended for human reading and understanding. This thesis investigates how Large Language Models (LLMs) can act as knowledge engineers, transforming raw text into structured, reusable, and queryable information. The central question guiding this work is how far the understanding capabilities of LLMs can automate the organization and retrieval of knowledge traditionally curated by human experts. The research explores two complementary strategies for structuring textual information, focusing on the construction of Knowledge Graphs (KGs): a schema-first approach, where LLMs infer type systems before populating them with entities, and an extraction-first approach, where entities and relations are identified freely and later organized into a type schema. Beyond structuring, the work also examines LLMs as interfaces for retrieval, evaluating their ability to translate natural language questions into KG queries (SPARQL). Finally, a domain application in healthcare demonstrates how ontology-driven KG modeling and LLM-based event extraction can integrate structured and unstructured electronic health record data. Collectively, the contributions of this thesis highlight both the promise and the limitations of current LLMs as instruments of knowledge organization, underscoring the need for guided, multi-step, and hybrid human-AI workflows to achieve robust semantic structuring at scale.

| File | Size | Format | |
|---|---|---|---|
| Knowledge Engineering via Large Language Models.pdf (open access; License: All rights reserved) | 1.73 MB | Adobe PDF | View/Open |
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14242/362920
URN:NBN:IT:UNICA-362920