Video anomaly detection: ensuring the safety of human actions and street scenes

D'Arrigo, Stefano

Artificial Intelligence, particularly Computer Vision, holds immense potential to enhance human safety and advance society's digital transition. This thesis addresses the challenges of developing robust and efficient AI for complex, human-centered tasks, spanning from behavior monitoring to driving scenes. We analyze the task of Video Anomaly Detection and its related applications in human action monitoring, crowd occupancy estimation, and out-of-distribution detection in street scenes. For human action monitoring, we propose two novel methods. COSKAD demonstrates the critical impact of latent space geometry on learning representations of expected human actions, proving that low-dimensional vectors can effectively embed complex spatio-temporal dependencies. MoCoDAD advances this by estimating the latent distribution of human motion, leveraging an action's inherent variability to robustly distinguish normal from abnormal behavior. Shifting from individual to group dynamics, STEERER-V introduces a method to precisely estimate a crowd's space occupancy, and by proxy its weight, directly from 2D RGB images. This approach bypasses computationally expensive intermediate steps and is accompanied by ANTHROPOS-V, a new benchmark to spur further research in this domain. Finally, to enhance the reliability of self-driving systems, CMS-OoD presents a cross-modal steering technique. It efficiently adapts a large Vision-Language Model to condition a semantic segmentation task model, significantly improving OOD detection. As a key benefit, this method also generates grounded textual explanations of the observed scene, fostering safer, more interpretable human-vehicle interaction. Collectively, these contributions demonstrate that through geometric priors, distributional assumptions, or cross-modal conditioning, we can develop AI systems that are more robust, efficient, and better aligned with human needs in complex environments.

2026

Faculty/Department: DIPARTIMENTO DI INFORMATICA; DIPARTIMENTO DI INGEGNERIA INFORMATICA, AUTOMATICA E GESTIONALE -ANTONIO RUBERTI-
Degree programme: Other doctoral programme
Publication date: 28 January 2026
Language: English
Supervisor, Advisor or Tutor: GALASSO, FABIO; SPINELLI, INDRO
Co-advisor, Second Reader, Co-Supervisor, Co-Tutor or Coordinators: GRISETTI, GIORGIO
Publisher: Università degli Studi di Roma "La Sapienza"
Number of pages: 98
Collection: Università degli Studi di Roma La Sapienza

Files in this item:

File	Size	Format
Tesi_dottorato_DArrigo.pdf (open access; licence: Creative Commons)	42.17 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/358530

The NBN code of this thesis is URN:NBN:IT:UNIROMA1-358530