Unified 3D Understanding: 3D Generation And Registration
Wang, Weijie
2025
Abstract
Advances in 3D technologies have enabled increasingly immersive and realistic virtual experiences while significantly enhancing real-world applications. The widespread adoption of 3D modeling, computer graphics, and virtual reality across industries, from entertainment to engineering, has driven growing demand for efficient solutions that bridge the gap between virtual and physical spaces. In recent years, deep learning has revolutionized 3D object generation, UV map generation, and point cloud registration by dramatically improving accuracy, efficiency, and automation. Diffusion models, for instance, have significantly advanced 3D object generation, enabling the synthesis of complex geometries and realistic textures. Similarly, point cloud representation learning and differentiable optimization have driven recent breakthroughs in 3D point cloud registration. This thesis explores 3D object generation, UV map generation, and point cloud registration, analyzing their core principles, key applications, and overall value while introducing novel methods that improve on existing baselines. It first presents a controllable and personalized UV map generative model that fine-tunes a pre-trained text-to-image diffusion model for identity-driven texture generation. Unlike traditional large-scale training approaches, this method integrates a face fusion module and leverages a small-scale, attribute-balanced dataset labeled with text and Face ID. To address the scarcity of 3D model data, a text-to-3D generation model is proposed that generates high-quality 3D objects from textual descriptions, incorporating prior knowledge from related shapes and textual information to enhance generation quality. Further, a novel cross-attention mechanism for Transformer-based architectures is introduced to tackle noisy point correspondences in point cloud registration. This mechanism fuses coordinate and feature information at the super-point level while ensuring rotation and translation invariance, a key challenge when the two point clouds are expressed in independent reference frames. Finally, a zero-shot point cloud registration approach is proposed that leverages 2D foundation models to predict 3D correspondences, overcoming traditional methods' dependence on labeled 3D datasets. All proposed methods are rigorously evaluated on multiple benchmarks, demonstrating their potential to advance 3D research and applications.
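As background for the registration contributions summarized above, the sketch below illustrates the closed-form Kabsch/Umeyama step that correspondence-based point cloud registration pipelines commonly use to recover a rigid transform once correspondences are available. This is a minimal illustration of the general technique, not the thesis's proposed method; the function name `estimate_rigid_transform` and the synthetic data are hypothetical.

```python
# Minimal sketch: least-squares rigid alignment from point correspondences
# (Kabsch algorithm). Illustrative background only, not the thesis method.
import numpy as np

def estimate_rigid_transform(src: np.ndarray, tgt: np.ndarray):
    """Estimate rotation R and translation t such that R @ src_i + t ~= tgt_i.

    src, tgt: (N, 3) arrays of corresponding 3D points.
    """
    # Centering both clouds decouples the translation from the rotation.
    src_mean, tgt_mean = src.mean(axis=0), tgt.mean(axis=0)
    src_c, tgt_c = src - src_mean, tgt - tgt_mean
    # SVD of the cross-covariance yields the least-squares optimal rotation.
    H = src_c.T @ tgt_c
    U, _, Vt = np.linalg.svd(H)
    # Reflection correction keeps det(R) = +1, i.e. a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_mean - R @ src_mean
    return R, t

# Usage: recover a known transform from noisy synthetic correspondences.
rng = np.random.default_rng(0)
src = rng.standard_normal((100, 3))
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -1.0, 2.0])
tgt = src @ R_true.T + t_true + 0.01 * rng.standard_normal((100, 3))
R_est, t_est = estimate_rigid_transform(src, tgt)
print(np.allclose(R_est, R_true, atol=0.05))  # True
```

Learned registration methods such as those studied in this thesis replace hand-crafted correspondence search with predicted correspondences, but typically retain a closed-form or differentiable solver of this kind as the final alignment step.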
File | Access | Size | Format
---|---|---|---
weijie_Phd_thesis (9).pdf | Open access | 36.8 MB | Adobe PDF
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/212250
URN:NBN:IT:UNITN-212250