LEVERAGING ARTIFICIAL INTELLIGENCE AND COMPUTER VISION FOR THE FASHION DOMAIN
MORELLI, DAVIDE
2025
Abstract
This Ph.D. thesis presents several advances in image-based virtual try-on, fashion image editing, and related tasks within the fashion industry. For the virtual try-on task, the thesis introduces Dress Code, a new, extensive dataset that surpasses existing datasets in size, image quality, and garment categories. It proposes novel methods based on Generative Adversarial Networks (GANs) and diffusion models, along with different architectures and techniques to improve generation results. For the image generation task, the Multimodal Garment Designer architecture is introduced as the first latent diffusion model for human-centric fashion image editing conditioned on multimodal inputs; this architecture shows promise in mimicking designers' creative processes. The thesis then extends this work with the Ti-MGD model, which adds the ability to condition generation on fabric texture. For the consumer-to-shop clothes retrieval task, the research proposes a novel loss function to improve performance; it then investigates cross-modal retrieval, proposing a CLIP-based method tailored to the fashion industry. For the deepfake detection task, the research identifies common low-level features in diffusion-based deepfakes and proposes a method to disentangle semantic and perceptual information. It also introduces the COCOFake dataset, a large collection of generated images for deepfake detection studies.
| File | Size | Format | Access |
|---|---|---|---|
| FinalReport_MORELLI.pdf | 403.25 kB | Adobe PDF | not available |
| thesis.pdf | 147.7 MB | Adobe PDF | open access (View/Open) |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/215936
URN:NBN:IT:UNIPI-215936