From structural mitigation to semantic completion: advancing recommender systems under data sparsity

BUFI, SALVATORE
2026

Abstract

In the modern digital ecosystem, Recommender Systems (RSs) play a crucial role: they learn user preferences from historical interaction data to guide individuals through vast catalogs of content. Despite their widespread adoption, their effectiveness remains fundamentally constrained by data sparsity. Knowledge Graphs (KGs) have emerged as a powerful source for enriching sparse collaborative signals, but existing approaches often process the entire KG uniformly, learning generic user intents from the whole set of available semantic features. Furthermore, the effectiveness of these models depends on the completeness of the underlying KG, yet real-world KGs are frequently incomplete, which creates an upstream challenge of semantic completion. Two further challenges compound these issues: data that is intentionally limited, driven by privacy regulations and the push for "Green AI", and a persistent lack of methodological standardization in the research community.

This dissertation identifies and addresses four research gaps: 1) the tendency of existing knowledge-aware RSs to process the entire KG rather than distilling personalized semantic signals; 2) the reliance of these models on the completeness of the underlying KG, which is often an unrealistic assumption; 3) the unexamined consequences of "imposed data sparsity"; and 4) the lack of standardized data management.

To address these gaps, the dissertation makes the following contributions. First, we address structural mitigation by learning user-aligned knowledge. We introduce a graph-based recommender that infers personalized knowledge signals by modeling individual user decision processes, yielding finer personalization and feature-level interpretability. This is complemented by a lightweight simplification that removes the explicit intent modules while retaining competitive accuracy. Second, we focus on the semantic completion of KGs. We tackle inductive link prediction on KGs whose explicit type annotations are coarse or missing by eliciting implicit type cues from Pretrained Language Models (PLMs); this improves robustness to type and structural sparsity and strengthens downstream knowledge-aware recommenders. Third, we investigate scenarios where data availability is intentionally limited, systematically auditing how various data minimization techniques affect the performance of different model families and evaluating their robustness in these constrained environments. Finally, to close the reproducibility gap, this work introduces a library for standardized and reproducible data management, designed to unify data-handling practices across the research community.
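Data sparsity, the constraint running through all four gaps, is commonly quantified as the fraction of unobserved cells in the user-item interaction matrix, 1 - |R| / (|U| * |I|). The sketch below assumes interactions stored as (user, item, timestamp) triples, computes this quantity, and shows how one imposed minimization rule raises it. The rule (keep_most_recent, which retains only each user's latest n interactions), the toy log, and all function names are hypothetical illustrations; they are not necessarily among the techniques audited in the dissertation.

```python
from collections import defaultdict

# Hypothetical representation: interactions are (user, item, timestamp) triples.
# All function names below are illustrative, not taken from the dissertation.

def sparsity(interactions, n_users=None, n_items=None):
    """Return 1 - |R| / (|U| * |I|), the unobserved fraction of the user-item matrix.

    Pass n_users / n_items to measure against a fixed catalog (e.g. the
    pre-minimization one); otherwise both are inferred from the data.
    """
    observed = {(u, i) for u, i, _ in interactions}  # repeated events count once
    if n_users is None:
        n_users = len({u for u, _ in observed})
    if n_items is None:
        n_items = len({i for _, i in observed})
    return 1.0 - len(observed) / (n_users * n_items)


def keep_most_recent(interactions, n):
    """Hypothetical minimization rule: keep only each user's n most recent interactions."""
    by_user = defaultdict(list)
    for u, i, t in interactions:
        by_user[u].append((t, i))
    kept = []
    for u, events in by_user.items():
        # Newest first; ties on timestamp are broken by item id.
        for t, i in sorted(events, reverse=True)[:n]:
            kept.append((u, i, t))
    return kept


# Toy log: 3 users, 4 items, 6 distinct user-item pairs.
log = [
    ("u1", "i1", 1), ("u1", "i2", 2), ("u1", "i3", 3),
    ("u2", "i1", 1), ("u2", "i4", 5),
    ("u3", "i2", 2),
]

print(sparsity(log))                              # 6 of 12 cells observed -> 0.5
minimized = keep_most_recent(log, n=1)
print(sparsity(minimized, n_users=3, n_items=4))  # 3 of 12 cells observed -> 0.75
```

Passing the original catalog size is a deliberate choice: computed only over the users and items that survive minimization, the same three interactions would give 1 - 3/9 (about 0.67) instead of 0.75, understating the sparsity that minimization imposes.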
Language: English
Di Noia, Tommaso
Carpentieri, Mario
Politecnico di Bari
Files in this item:
File: 38 ciclo-BUFI Salvatore.pdf
Access: open access
License: All rights reserved
Size: 3.74 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/354946
The NBN code of this thesis is URN:NBN:IT:POLIBA-354946