Knowledge Engineering via Large Language Models
TIDDIA, SANDRO GABRIELE
2026
Abstract
The digital transformation of society has led public institutions and private organizations to embrace digitization, resulting in an unprecedented production of textual documents. Much of this material remains unstructured and primarily intended for human reading and understanding. This thesis investigates how Large Language Models (LLMs) can act as knowledge engineers, transforming raw text into structured, reusable, and queryable information. The central question guiding this work is how far the understanding capabilities of LLMs can automate the organization and retrieval of knowledge traditionally curated by human experts. The research explores two complementary strategies for structuring textual information, focusing on the construction of Knowledge Graphs (KGs): a schema-first approach, where LLMs infer type systems before populating them with entities, and an extraction-first approach, where entities and relations are identified freely and later organized into a type schema. Beyond structuring, the work also examines LLMs as interfaces for retrieval, evaluating their ability to translate natural language questions into KG queries (SPARQL). Finally, a domain application in healthcare demonstrates how ontology-driven KG modeling and LLM-based event extraction can integrate structured and unstructured electronic health record data. Collectively, the contributions of this thesis highlight both the promise and the limitations of current LLMs as instruments of knowledge organization, underscoring the need for guided, multi-step, and hybrid human-AI workflows to achieve robust semantic structuring at scale.

| File | Size | Format | |
|---|---|---|---|
| Knowledge Engineering via Large Language Models.pdf (open access; License: All rights reserved) | 1.73 MB | Adobe PDF | View/Open |
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14242/362920
URN:NBN:IT:UNICA-362920