Efficient and Reliable Semantic Segmentation across Domains and Modalities
BARBATO, FRANCESCO
2025
Abstract
In this thesis we explore the development and enhancement of deep learning models, focusing on semantic segmentation for autonomous systems and personal devices. The works discussed in this document address three critical challenges of deep learning systems: domain shift, multimodal data fusion, and model reliability. By domain shift we denote the performance degradation caused by differences between the training data (e.g., synthetic datasets) and the real-world conditions encountered at deployment. A promising solution to this problem is Unsupervised Domain Adaptation (UDA), which improves model adaptability to new environments without requiring labeled data that, especially in semantic segmentation, is expensive to produce and hard to obtain. Another possibility is offered by multimodal data fusion: the idea is to exploit information from sensors that are less affected by domain shift (e.g., thermal cameras instead of color ones). In multimodal learning, we combine inputs from multiple heterogeneous sensors, such as RGB cameras, LiDAR, and thermal cameras, to achieve higher and more robust overall performance. To further improve performance across domains, we then investigate techniques that target reliability directly, such as pretraining with knowledge distillation and input-level denoising, particularly under noisy conditions or with incomplete data. These techniques target model consistency from two complementary points of view: the former strengthens the model's capability to extract information even in hostile environments, thanks to the guidance provided by larger and more computationally expensive teachers; the latter directly reduces the corruption of input images before forwarding them to the target network.
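As an illustration only (this sketch is not taken from the thesis), the knowledge-distillation pretraining mentioned above is commonly formulated as matching the student's softened class distribution to the teacher's. A minimal NumPy sketch, with hypothetical function names and a temperature `T` chosen for the example:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T yields softer distributions."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence from the teacher's softened distribution to the
    student's, scaled by T^2 so gradients keep a comparable magnitude."""
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)  # student's softened prediction
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return (T ** 2) * kl.mean()
```

In practice this term is added to the usual supervised segmentation loss, so the lightweight student benefits from the larger teacher's guidance while remaining cheap to deploy.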
Finally, to enable the analyses performed throughout the thesis, we introduce multiple large-scale synthetic datasets, which our novel architectures use in the various tasks. Overall, our approaches demonstrate significant advancements in model performance across multiple scenarios.
File: tesi_Francesco_Barbato.pdf (open access), 15.36 MB, Adobe PDF
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/199688
URN:NBN:IT:UNIPD-199688