Knowledge-Grounded Machine Learning for Biomedical Domains: Extraction, Injection and Evaluation

IRWIN, CHRISTOPHER
2026

Abstract

The adoption of machine learning in biomedical and clinical domains is frequently constrained by structural challenges: data scarcity, high dimensionality, and a misalignment between statistical optimization and established medical practice. While data-driven approaches can learn complex patterns, they may lack the transparency and reasoning capabilities required in high-stakes healthcare environments. This thesis proposes a knowledge-grounded framework that bridges this gap by integrating structured domain knowledge into the machine learning (ML) pipeline along three aspects: extraction, injection, and evaluation. First, to address knowledge injection in sequential data, this work introduces a novel Structural Positional Encoding (SPE) method for Transformer-based process monitoring. Applied to stroke management guidelines, this approach embeds clinical ontologies directly into the model architecture, yielding better performance and closer adherence to guidelines than standard baselines. Second, to address injection in high-dimensional data, the thesis presents a Graph Representation Learning (GRL) framework for multi-omics microbiome data. By encoding taxonomic knowledge into a graph structure, this method handles feature sparsity and produces an encoder for microbiome data that can then be optimized for downstream tasks. Finally, to address the extraction and evaluation gap, the thesis presents a framework that leverages Large Language Models (LLMs) to extract structured clinical reasoning trees from medical literature. This yields a benchmark for assessing whether LLM reasoning follows clinical knowledge, moving evaluation beyond simple accuracy metrics. Collectively, these contributions demonstrate that explicit knowledge integration acts as a powerful inductive bias, producing systems that are not only statistically robust but also aligned with the logical structures of medical reasoning.
25 May 2026
English
Graph Neural Networks
Portinale, Luigi; Montani, Stefania
SODA, PAOLO
Università Campus Bio-Medico
Torino
Files in this item:
File: PhD_Irwin_Christopher.pdf
Access: open access
License: Creative Commons
Size: 12.92 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/367573
The NBN code of this thesis is URN:NBN:IT:UNICAMPUS-367573