Unified 3D Understanding: 3D Generation And Registration
Wang, Weijie
2025
Abstract
Advances in 3D technologies have enabled increasingly immersive and realistic virtual experiences while significantly enhancing real-world applications. The widespread adoption of 3D modeling, computer graphics, and virtual reality across industries, from entertainment to engineering, has driven growing demand for efficient solutions that bridge the gap between virtual and physical spaces. In recent years, deep learning has revolutionized 3D object generation, UV map generation, and point cloud registration by dramatically improving accuracy, efficiency, and automation. Diffusion models, for instance, have significantly advanced 3D object generation, enabling the synthesis of complex geometries and realistic textures. Similarly, point cloud representation learning and differentiable optimization have driven recent breakthroughs in 3D point cloud registration. This thesis explores 3D object generation, UV map generation, and point cloud registration, analyzing their core principles, key applications, and overall value while introducing novel methods that improve on existing baselines. It first presents a controllable and personalized UV map generative model that fine-tunes a pre-trained text-to-image diffusion model for identity-driven texture generation. Unlike traditional large-scale training approaches, this method integrates a face fusion module and leverages a small-scale, attribute-balanced dataset labeled with text and Face ID. To address the scarcity of 3D model data, a text-to-3D generation model is proposed that generates high-quality 3D objects from textual descriptions, incorporating prior knowledge from related shapes and textual information to enhance generation quality. Further, a novel cross-attention mechanism for Transformer-based architectures is introduced to tackle noisy point correspondences in point cloud registration. This mechanism fuses coordinate and feature information at the super-point level while ensuring rotation and translation invariance, a key challenge when the two point clouds are expressed in independent reference frames. Finally, a zero-shot point cloud registration approach is proposed that leverages 2D foundation models to predict 3D correspondences, overcoming traditional methods' dependence on labeled 3D datasets. All proposed methods are rigorously evaluated on multiple benchmarks, demonstrating their potential to advance 3D research and applications.
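As background for the registration contributions summarized above, the sketch below illustrates the closed-form Kabsch/Umeyama step that correspondence-based point cloud registration pipelines commonly use to recover a rigid transform once correspondences are available. This is a minimal illustration of the general technique, not the thesis's proposed method; the function name `estimate_rigid_transform` and the synthetic data are hypothetical.

```python
# Minimal sketch: least-squares rigid alignment from point correspondences
# (Kabsch algorithm). Illustrative background only, not the thesis method.
import numpy as np

def estimate_rigid_transform(src: np.ndarray, tgt: np.ndarray):
    """Estimate rotation R and translation t such that R @ src_i + t ~= tgt_i.

    src, tgt: (N, 3) arrays of corresponding 3D points.
    """
    # Centering both clouds decouples the translation from the rotation.
    src_mean, tgt_mean = src.mean(axis=0), tgt.mean(axis=0)
    src_c, tgt_c = src - src_mean, tgt - tgt_mean
    # SVD of the cross-covariance yields the least-squares optimal rotation.
    H = src_c.T @ tgt_c
    U, _, Vt = np.linalg.svd(H)
    # Reflection correction keeps det(R) = +1, i.e. a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_mean - R @ src_mean
    return R, t

# Usage: recover a known transform from noisy synthetic correspondences.
rng = np.random.default_rng(0)
src = rng.standard_normal((100, 3))
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -1.0, 2.0])
tgt = src @ R_true.T + t_true + 0.01 * rng.standard_normal((100, 3))
R_est, t_est = estimate_rigid_transform(src, tgt)
print(np.allclose(R_est, R_true, atol=0.05))  # True
```

Learned registration methods such as those studied in this thesis replace hand-crafted correspondence search with predicted correspondences, but typically retain a closed-form or differentiable solver of this kind as the final alignment step.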
File | Access | Size | Format
---|---|---|---
weijie_Phd_thesis (9).pdf | Open access | 36.8 MB | Adobe PDF
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/212250
URN:NBN:IT:UNITN-212250