LEVERAGING ARTIFICIAL INTELLIGENCE AND COMPUTER VISION FOR THE FASHION DOMAIN

MORELLI, DAVIDE
2025

Abstract

This Ph.D. thesis presents several advances in image-based virtual try-on, fashion image editing, and related tasks within the fashion industry. On the virtual try-on task, it introduces Dress Code, a new, extensive dataset that surpasses existing datasets in size, image quality, and garment category coverage, and proposes novel methods based on Generative Adversarial Networks (GANs) and diffusion models, exploring different architectures and techniques to improve generation quality. On the image generation task, the Multimodal Garment Designer architecture is introduced as the first latent diffusion model for human-centric fashion image editing conditioned on multimodal inputs, showing promise in mimicking designers' creative processes. The thesis extends this work with the Ti-MGD model, which adds the ability to condition generation on fabric texture. On the consumer-to-shop clothes retrieval task, a novel loss function is proposed to improve performance; the research then investigates cross-modal retrieval techniques, proposing a CLIP-based method tailored to the fashion domain. On the deepfake detection task, the research identifies common low-level features in diffusion-based deepfakes and proposes a method to disentangle semantic and perceptual information. It also introduces COCOFake, a large-scale dataset of generated images for deepfake detection studies.
Date: 17 February 2025
Language: Italian
Keywords: deepfake detection, image generation, virtual try-on
Supervisors: Cucchiara, Rita; Cornia, Marcella
Files in this item:
FinalReport_MORELLI.pdf (403.25 kB, Adobe PDF): not available
thesis.pdf (147.7 MB, Adobe PDF): open access

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/215936
The NBN code of this thesis is URN:NBN:IT:UNIPI-215936