Advanced Computer Vision for Smart Retail: From Anomaly Detection to Fine-grained Product Classification
Tur, Anil Osman
2025
Abstract
The rapid evolution of smart retail environments has created a pressing need for advanced computer vision solutions capable of addressing critical operational challenges, particularly in automated anomaly detection, fine-grained product classification, and interactive product localization. Existing systems often struggle with the complexity and variability inherent in retail scenarios, including dynamic customer behavior, subtle product differences, occlusions, and inconsistent lighting. This thesis addresses these challenges through novel methodologies in three complementary research directions.

First, we introduce an unsupervised anomaly detection framework based on diffusion models and compact motion representations. The diffusion-based approach captures complex spatiotemporal patterns without relying on labeled training data, while the compact motion representations significantly improve computational efficiency without sacrificing detection accuracy. Extensive experimental evaluations show that our methods outperform current state-of-the-art techniques on benchmark datasets, highlighting their potential for real-world surveillance applications in retail environments.

Second, we propose a robust zero-shot fine-grained product classification pipeline that leverages advanced vision models, specifically CLIP and DINOv2. Using visual embeddings and prototype-based classification strategies, the approach distinguishes visually similar products without requiring explicit training data for each class. To support rigorous evaluation, we introduce the MIMEX dataset, a challenging benchmark tailored to fine-grained retail product classification. Our experiments confirm the superior performance and generalization capabilities of the proposed methods compared to existing approaches.

Third, we present a zero-shot referring segmentation framework that enables precise localization and segmentation of products from natural language descriptions. By combining open-vocabulary object detection with large language model reasoning, the approach handles the dense product arrangements and fine-grained distinctions of complex retail scenes without domain-specific training. The framework supports intuitive human-AI interaction, allowing customers and store associates to identify and locate products using natural language queries.

Overall, this thesis advances the state of the art in computer vision for smart retail by introducing efficient unsupervised anomaly detection techniques, effective zero-shot fine-grained classification frameworks, and a novel approach to zero-shot referring segmentation. The proposed methodologies not only address key limitations of existing systems but also provide practical solutions readily applicable to real-world retail settings.
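To make the prototype-based classification strategy concrete, the sketch below shows one way such a pipeline can look: each product class gets a prototype formed as the mean DINOv2 embedding of a few reference images, and a query crop is assigned to the nearest prototype by cosine similarity, with no per-class training. This is an illustrative sketch only, not the thesis implementation; the backbone (DINOv2 ViT-S/14 via torch hub) matches the models named above, but the product names and image paths are hypothetical placeholders.

```python
# Minimal sketch of prototype-based zero-shot product classification with
# DINOv2 embeddings. Illustrative only; not the thesis implementation.
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
# Official DINOv2 ViT-S/14 backbone from torch hub (384-dim CLS embeddings).
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").to(device).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),  # 224 is a multiple of the 14-px patch size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(paths):
    """Return L2-normalized DINOv2 embeddings for a list of image paths."""
    batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths])
    return F.normalize(model(batch.to(device)), dim=-1)

# One prototype per product class: the mean embedding of a few reference
# images. Class names and file paths are hypothetical placeholders.
reference_images = {
    "espresso_beans_250g": ["refs/espresso_1.jpg", "refs/espresso_2.jpg"],
    "decaf_beans_250g":    ["refs/decaf_1.jpg", "refs/decaf_2.jpg"],
}
prototypes = torch.stack([
    F.normalize(embed(paths).mean(dim=0), dim=-1)
    for paths in reference_images.values()
])
class_names = list(reference_images.keys())

# Classify a query crop by cosine similarity to the class prototypes.
query = embed(["query/shelf_crop.jpg"])   # shape (1, 384)
scores = query @ prototypes.T             # cosine similarities, shape (1, C)
print(class_names[scores.argmax().item()])
```

Because classification reduces to nearest-prototype search in the embedding space, new products can be added by embedding a handful of reference images, with no retraining of the backbone.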
File: phd_unitn_Tur_AnilOsman.pdf
Access: open access
Size: 19.8 MB
Format: Adobe PDF
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/214294
URN:NBN:IT:UNITN-214294