From structural mitigation to semantic completion: advancing recommender systems under data sparsity

Bufi, Salvatore

In the modern digital ecosystem, Recommender Systems (RSs) play a crucial role by learning user preferences from historical interaction data to guide individuals through vast catalogs of content. Despite their widespread adoption, their effectiveness remains fundamentally constrained by the inherent problem of data sparsity. While Knowledge Graphs (KGs) have emerged as a powerful source for enriching sparse collaborative signals, existing approaches often process the entire KG uniformly, learning generic user intents from the whole set of available semantic features. Furthermore, the effectiveness of these models relies on the completeness of the underlying KG, yet real-world KGs are frequently incomplete, creating an upstream challenge of semantic completion. Compounding these issues are emerging challenges where data is intentionally limited, driven by privacy regulations and the push for "Green AI", and a persistent lack of methodological standardization in the research community. This dissertation identifies and addresses four research gaps: 1) the tendency of existing Knowledge-aware RSs to process the entire KG rather than distilling personalized semantic signals; 2) the fundamental reliance of these models on the completeness of the underlying Knowledge Graph, which is often an unrealistic assumption; 3) the unexamined consequences of "imposed data sparsity"; and 4) the lack of standardized data management. To address these gaps, this dissertation presents the following contributions: First, we address structural mitigation by learning user-aligned knowledge. We introduce a graph-based recommender that infers personalized knowledge signals by modeling individual user decision processes, yielding finer personalization and feature-level interpretability. This is complemented by a lightweight simplification that removes explicit intent modules while retaining competitive accuracy. Second, we focus on the semantic completion of knowledge graphs. We tackle inductive link prediction on KGs when explicit type annotations are coarse or missing by eliciting implicit type cues from Pretrained Language Models (PLMs), improving robustness to type and structural sparsity, and strengthening downstream knowledge-aware recommenders. Third, we investigate scenarios where data availability is intentionally limited. We systematically audit how various data minimization techniques affect the performance of different model families, evaluating their robustness in these constrained environments. Finally, to close the reproducibility gap, this work introduces a library for standardized and reproducible data management, designed to unify data handling practices across the research community.

Faculty/Department: Dipartimento di Ingegneria Elettrica e dell'Informazione
Degree program: Ingegneria Elettrica e dell'Informazione
Publication date: 2026
Language: English
Supervisor, Advisor, or Tutor: Di Noia, Tommaso
Co-Supervisor, Second Reader, Co-Tutor, or Coordinators: Carpentieri, Mario
Publisher: Politecnico di Bari
Collection: Politecnico di Bari

Files in this item:

File	Access	License	Size	Format
38 ciclo-BUFI Salvatore.pdf	Open access	All rights reserved	3.74 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/354946

The NBN code of this thesis is URN:NBN:IT:POLIBA-354946