Towards multi-modal 3D reconstruction: LiDAR-camera fusion for surface and radiance field modeling

GIACOMINI, EMANUELE
2025

Abstract

Accurate and efficient 3D scene representation is pivotal to spatial perception, from state estimation (Simultaneous Localization and Mapping and Structure from Motion) to higher-level scene understanding, such as panoptic segmentation. Recent advances in Radiance Fields have shown promising results in photorealistic and efficient appearance modeling, yet these methods often struggle in texture-less, dynamic, or large-scale environments. While cameras are the de facto standard for appearance reconstruction, LiDARs can significantly mitigate these weaknesses. This thesis focuses on the strong linkage between the two sensors, highlighting their parallels and fusion modalities. It first introduces a simple yet effective LiDAR-camera extrinsic calibration method based on a planar target of the kind typically used for vision systems. It then presents a direct formulation of Bundle Adjustment that integrates seamlessly with LiDAR and camera pipelines, providing strong evidence of the advantages of cross-sensor fusion for pose estimation. Motion distortion, prevalent in LiDARs and rolling-shutter (RS) cameras, is addressed through a novel de-skewing method that recovers intra-scan motion from spatiotemporal associations. The thesis also introduces a comprehensive multi-sensor benchmark dataset collected in Rome, which provides diverse, high-quality ground-truth data for the rigorous evaluation of multi-modal 3D reconstruction and localization methods. The central contribution lies in lifting Gaussian Splatting to a cross-modal context: leveraging the geometric consistency of LiDAR measurements, the thesis presents the first LiDAR Odometry and Mapping pipeline that relies on Gaussians as its sole scene representation, achieving state-of-the-art 3D reconstruction with a very low memory footprint. Ultimately, the research presented in this thesis highlights the complementarity of the two sensors and the strengths of fusing them for appearance and surface 3D reconstruction, while providing tools and datasets to advance robust multi-modal spatial perception.
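
To make the de-skewing step concrete, the minimal sketch below (Python with NumPy and SciPy) illustrates the standard constant-velocity baseline for compensating intra-scan motion: each LiDAR point is re-expressed in a single reference frame by interpolating the sensor pose at its capture time. The function name, its signature, and the constant-velocity assumption are illustrative only and are not taken from the thesis, whose method instead recovers the intra-scan motion from spatiotemporal associations.

import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def deskew_scan(points, timestamps, T_start, T_end):
    """De-skew one LiDAR sweep under a constant-velocity motion model.

    points     : (N, 3) points in the sensor frame, each measured at its own time
    timestamps : (N,) per-point capture times, normalized to [0, 1] over the sweep
    T_start    : 4x4 sensor pose (world frame) at the start of the sweep
    T_end      : 4x4 sensor pose (world frame) at the end of the sweep

    Returns the points expressed in the scan-end sensor frame, as if the whole
    sweep had been captured instantaneously.
    """
    timestamps = np.asarray(timestamps, dtype=float)

    # Pose interpolation: slerp for rotation, linear blend for translation.
    key_rotations = Rotation.from_matrix(np.stack([T_start[:3, :3], T_end[:3, :3]]))
    slerp = Slerp([0.0, 1.0], key_rotations)
    rot_i = slerp(timestamps)                                   # (N,) rotations
    trans_i = (1.0 - timestamps)[:, None] * T_start[:3, 3] \
        + timestamps[:, None] * T_end[:3, 3]                    # (N, 3) translations

    # Lift each point into the world frame at its own capture time ...
    points_world = rot_i.apply(points) + trans_i
    # ... and re-express everything in the single scan-end frame.
    rot_end = key_rotations[1]
    return rot_end.inv().apply(points_world - T_end[:3, 3])

The same idea carries over to rolling-shutter images by replacing per-point timestamps with per-row exposure times; slerp plus linear translation blending is the usual first-order approximation when the true trajectory within a sweep is unknown.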
18 Sep 2025
English
OSWALD, Martin R.
GRISETTI, GIORGIO
NAVIGLI, Roberto
Università degli Studi di Roma "La Sapienza"
125
File: Tesi_dottorato_Giacomini.pdf (Adobe PDF, 7.83 MB) — open access; license: all rights reserved.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/304329
The NBN identifier of this thesis is URN:NBN:IT:UNIROMA1-304329