Learning with uncertainty via Hyperbolic Neural Networks

MANDICA, PAOLO
2025

Abstract

This thesis explores the application of hyperbolic geometry and hyperbolic neural networks across several domains, with a focus on using uncertainty estimation to improve the learning process and performance on complex tasks. We begin with a brief introduction to hyperbolic neural networks, providing the theoretical foundation and key concepts that underpin the subsequent research. The work then spans three main areas: self-supervised representation learning for skeleton-based action recognition, active domain adaptation for semantic segmentation, and multimodal large language models.

First, this thesis investigates self-supervised learning for skeleton-based action recognition, where effective representation learning remains challenging due to the hierarchical nature of human motion data. We introduce hyperbolic neural networks to address this challenge through uncertainty-aware learning, developing a novel Hyperbolic Self-Paced learning model (HYSP). This approach uses the hyperbolic radius as an uncertainty metric to adaptively pace the learning process, scaling each sample's gradient contribution by the norm of its hyperbolic embedding. When evaluated on standard action recognition benchmarks, HYSP demonstrates superior performance while eliminating the need for computationally expensive negative mining procedures.

Next, we explore active learning for semantic segmentation under domain shift, where efficient label acquisition is crucial for adapting to new environments while keeping labeling costs down. For this challenge, we develop a hyperbolic approach named HALO (Hyperbolic Active Learning Optimization), which interprets the hyperbolic radius as an indicator of data scarcity. By combining the hyperbolic radius with prediction entropy, we obtain an estimator of epistemic uncertainty, which we use to select the pixels of each image to annotate. HALO achieves state-of-the-art results on domain adaptation benchmarks while requiring only a small fraction of target labels, surpassing even fully supervised domain adaptation methods.

Finally, this thesis examines large-scale vision-language modeling, where uncertainty estimation becomes particularly challenging due to the scale and multimodal nature of the data. By developing a novel training strategy for a hyperbolic version of BLIP-2, we demonstrate that hyperbolic learning can be scaled to billion-parameter architectures without compromising stability or performance. Our approach achieves results comparable to its Euclidean counterpart while providing meaningful uncertainty estimates thanks to hyperbolic embeddings, offering a new perspective on uncertainty quantification in large multimodal models.

Throughout these studies, we demonstrate that learning in hyperbolic space offers unique advantages in estimating uncertainty and in improving model performance and efficiency across diverse machine learning tasks. This work contributes to the broader understanding of hyperbolic neural networks and their potential to advance the field of deep learning.
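
As a rough illustration of the quantities named in the abstract, the sketch below computes the hyperbolic radius of an embedding in the Poincaré ball and combines it with prediction entropy. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names (`expmap0`, `hyperbolic_radius`, `entropy`, `acquisition_score`), the curvature parameter `c`, and the choice of a simple product to combine radius and entropy are all hypothetical and chosen only for illustration. The exponential map at the origin and the distance from the origin are the standard Poincaré-ball formulas.

```python
import math


def expmap0(v, c=1.0):
    """Map a Euclidean vector v (tangent at the origin) onto the
    Poincare ball of curvature -c via the exponential map at the origin."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)  # the origin maps to itself
    scale = math.tanh(math.sqrt(c) * norm) / (math.sqrt(c) * norm)
    return [scale * x for x in v]


def hyperbolic_radius(x, c=1.0):
    """Geodesic distance from the origin to a point x inside the ball."""
    norm = math.sqrt(sum(t * t for t in x))
    # Clamp so the argument of atanh stays strictly below 1.
    arg = min(math.sqrt(c) * norm, 1.0 - 1e-7)
    return 2.0 / math.sqrt(c) * math.atanh(arg)


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def acquisition_score(embedding, probs, c=1.0):
    """Hypothetical per-pixel score: hyperbolic radius times entropy.
    The product is an assumption; the abstract only says the two
    quantities are combined."""
    return hyperbolic_radius(embedding, c) * entropy(probs)


if __name__ == "__main__":
    pixel_embedding = expmap0([0.3, -0.2, 0.5])
    class_probs = [0.6, 0.3, 0.1]
    print(hyperbolic_radius(pixel_embedding))
    print(acquisition_score(pixel_embedding, class_probs))
```

In this sketch a score is computed for one embedding at a time; how such scores are ranked to pick pixels for annotation, and how the radius enters the self-paced gradient scaling in HYSP, are described in the thesis itself and are not reproduced here.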
15 January 2025
English
GALASSO, FABIO
MANCINI, MAURIZIO
Università degli Studi di Roma "La Sapienza"
122
Files in this item:
Tesi_dottorato_Mandica.pdf (open access, 10.02 MB, Adobe PDF)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/188447
The NBN code of this thesis is URN:NBN:IT:UNIROMA1-188447