Efficient and Reliable Semantic Segmentation across Domains and Modalities
BARBATO, FRANCESCO
2025
Abstract
In this thesis we explore the development and enhancement of deep learning models, focusing on semantic segmentation for autonomous systems and personal devices. The works discussed in this document address three critical challenges of deep learning systems: domain shift, multimodal data fusion, and model reliability. By domain shift we denote the performance degradation caused by differences between the training data (e.g., synthetic datasets) and the real-world conditions encountered at deployment. A promising solution to this problem is Unsupervised Domain Adaptation (UDA), which improves model adaptability to new environments without requiring labeled data that, especially in semantic segmentation, is expensive to produce and hard to obtain. Another possibility is offered by multimodal data fusion: the idea is to exploit information from sensors that are less affected by domain shift (e.g., thermal cameras instead of color ones). In multimodal learning, we combine inputs from multiple heterogeneous sensors, such as RGB cameras, LiDAR, and thermal cameras, to achieve higher and more robust overall performance. To further improve performance across domains, we then investigate techniques that target reliability directly, such as pretraining with knowledge distillation and input-level denoising, particularly under noisy conditions or with incomplete data. These techniques target model consistency from two complementary points of view: the former strengthens the model's capability to extract information even in hostile environments, thanks to the guidance provided by larger and more computationally expensive teachers; the latter directly reduces the corruption of input images before forwarding them to the target network.
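As an illustration only (this sketch is not taken from the thesis), the knowledge-distillation pretraining mentioned above is commonly formulated as matching the student's softened class distribution to the teacher's. A minimal NumPy sketch, with hypothetical function names and a temperature `T` chosen for the example:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T yields softer distributions."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence from the teacher's softened distribution to the
    student's, scaled by T^2 so gradients keep a comparable magnitude."""
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)  # student's softened prediction
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return (T ** 2) * kl.mean()
```

In practice this term is added to the usual supervised segmentation loss, so the lightweight student benefits from the larger teacher's guidance while remaining cheap to deploy.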
Finally, to enable the analyses performed throughout the thesis, we introduce multiple large-scale synthetic datasets, which our novel architectures use in the various tasks. Overall, our approaches demonstrate significant advancements in model performance across multiple scenarios.
File: tesi_Francesco_Barbato.pdf (open access), 15.36 MB, Adobe PDF
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/199688
URN:NBN:IT:UNIPD-199688