Knowledge-Grounded Machine Learning for Biomedical Domains: Extraction, Injection and Evaluation

Irwin, Christopher

The adoption of machine learning in biomedical and clinical domains is frequently constrained by structural challenges: data scarcity, high dimensionality, and a misalignment between statistical optimization and established medical practice. While data-driven approaches can learn complex patterns, they may lack the transparency and reasoning capabilities required for high-stakes healthcare environments. This thesis proposes a knowledge-grounded framework that bridges this gap by integrating structured domain knowledge into the machine learning (ML) pipeline through three aspects: extraction, injection, and evaluation. First, to address knowledge injection in sequential data, this work introduces a novel Structural Positional Encoding (SPE) method for Transformer-based process monitoring. Applied to stroke management guidelines, this approach embeds clinical ontologies directly into the model architecture, yielding better performance and closer adherence to guidelines than standard baselines. Second, addressing injection in high-dimensional data, the thesis presents a Graph Representation Learning (GRL) framework for multi-omics microbiome data. By encoding taxonomic knowledge into a graph structure, this method handles feature sparsity and produces an encoder for microbiome data that can then be optimized for downstream tasks. Finally, addressing the extraction and evaluation gap, the thesis presents a framework that leverages Large Language Models (LLMs) to extract structured clinical reasoning trees from medical literature. This creates a benchmark for assessing whether LLM reasoning follows clinical knowledge, moving evaluation beyond simple accuracy metrics. Collectively, these contributions demonstrate that explicit knowledge integration acts as a powerful inductive bias, creating systems that are not only statistically robust but also aligned with the logical structures of medical reasoning.

2026

Degree programme: Dottorato nazionale in intelligenza artificiale (National PhD Programme in Artificial Intelligence)
Publication date: 25 May 2026
Language: English
Keyword: Graph Neural Networks
Supervisor, Advisor or Tutor: Portinale, Luigi; Montani, Stefania
Co-Supervisor, Examiner, Co-Tutor or Coordinators: Soda, Paolo
Publisher name: Università Campus Bio-Medico
Publisher city: Torino
Collection: Università Campus Bio-medico di Roma

Files in this item:

File	Size	Format
PhD_Irwin_Christopher.pdf (open access; licence: Creative Commons)	12.92 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/367573

The NBN code of this thesis is URN:NBN:IT:UNICAMPUS-367573