Human-robot interaction (HRI) is a rapidly evolving domain focused on the interplay between humans and robots, exploring the design, functionality, and social implications of robotic systems in various environments. Virtual Reality (VR) has emerged as a valuable tool for evaluating HRI solutions before real-world deployment, ensuring safety and scalability. The main objective of this thesis is to present a multimodal interaction framework integrating speech and gesture recognition to enhance collaboration between humans and robots in precision agriculture, particularly in table-grape vineyards under the CANOPIES project, as well as in broader contexts of indoor and outdoor logistics. In collaborative robotics, where human-robot collaboration (HRC) is essential, multimodal communication between humans and robots is crucial during each interaction. To address this challenge, speech and gesture recognition pipelines were designed, building on a categorization of information content and a speech act classification for HRI in shared environments, and integrated into an HRI architecture for cobots. Leveraging VR as a testbed, the work generates synthetic datasets to train robust gesture and speech recognition models, overcoming the scarcity of real-world data in agricultural contexts. The framework is empirically validated through VR-based user studies and field experiments, demonstrating improved communication reliability in noisy vineyard environments and reduced task completion times. Notably, the system emphasizes modularity, allowing interchangeable components (e.g., pose estimators and speech classifiers) to adapt to dynamic tasks.
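The modularity described above, with interchangeable pose estimators and speech classifiers, can be sketched as a plugin-style pipeline. This is a minimal illustration only: all class and method names below are assumptions, not the actual API of the thesis framework.

```python
from typing import List, Protocol


class PoseEstimator(Protocol):
    """Any pose backend (e.g., a 2D/3D keypoint model) can slot in here."""
    def estimate(self, frame: bytes) -> List[float]: ...


class SpeechClassifier(Protocol):
    """Any speech-act backend can slot in here."""
    def classify(self, audio: bytes) -> str: ...


class DummyPose:
    def estimate(self, frame: bytes) -> List[float]:
        # Placeholder: a real backend would return body keypoints.
        return [0.0, 0.0]


class DummySpeech:
    def classify(self, audio: bytes) -> str:
        # Placeholder: a real backend would return a speech-act label.
        return "REQUEST_HELP"


class MultimodalPipeline:
    """Components are injected, so either modality can be swapped
    without touching the rest of the HRI architecture."""

    def __init__(self, pose: PoseEstimator, speech: SpeechClassifier):
        self.pose = pose
        self.speech = speech

    def step(self, frame: bytes, audio: bytes) -> dict:
        # One perception step: run both modalities on the current inputs.
        return {"keypoints": self.pose.estimate(frame),
                "speech_act": self.speech.classify(audio)}


pipeline = MultimodalPipeline(DummyPose(), DummySpeech())
result = pipeline.step(b"", b"")
```

Dependency injection of this kind is one common way to realize the interchangeability the abstract claims; swapping `DummyPose` for a different estimator leaves the pipeline untouched.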
Key contributions include (i) a standardized gesture taxonomy tailored to agricultural workflows, (ii) open-source datasets produced from both real and synthetic sources, (iii) a synthetic data generation pipeline for pose estimation, and (iv) a multimodal communication architecture augmented by large language models (LLMs) for contextual reasoning under the limited computational capacity available in agricultural logistics. By bridging virtual simulations and real-world deployment, this research advances human-robot collaboration in precision agriculture, offering interactive solutions for harvesting, pruning, and logistics tasks. The findings underscore the potential of multimodal HRI and immersive technologies to combine human expertise with robotic capabilities, enhancing safety and efficiency across both indoor and outdoor collaborative environments.
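The multimodal communication idea, resolving a recognized gesture and a classified speech act into a single robot command, can be sketched with a simple fusion rule. The gesture labels, speech-act labels, and rules below are illustrative assumptions; the thesis's actual taxonomy and fusion logic may differ.

```python
# Hypothetical (gesture, speech_act) -> command table; labels are invented
# for illustration and do not come from the thesis's taxonomy.
FUSION_RULES = {
    ("POINT", "REQUEST_TRANSPORT"): "GO_TO_POINTED_LOCATION",
    ("STOP", "COMMAND_STOP"): "EMERGENCY_STOP",
    ("WAVE", "GREETING"): "ACKNOWLEDGE",
}


def fuse(gesture: str, speech_act: str,
         default: str = "ASK_CLARIFICATION") -> str:
    """Resolve the two modalities into one command; fall back to a
    clarification request when the combination is unknown, which is a
    reasonable default when one channel is degraded (e.g., noisy
    vineyard audio)."""
    return FUSION_RULES.get((gesture, speech_act), default)
```

For example, `fuse("STOP", "COMMAND_STOP")` yields `"EMERGENCY_STOP"`, while an unmatched pair such as `fuse("POINT", "GREETING")` falls back to `"ASK_CLARIFICATION"`. Redundancy across modalities is one way such a system can stay reliable when a single channel fails.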
Multimodal communication for enhancing human-robot interaction: virtual simulations to real robots
SABBELLA, SANDEEP REDDY
2025
Abstract
| File | Size | Format |
|---|---|---|
| Tesi_dottorato_Sabbella.pdf (open access; License: Creative Commons) | 39.28 MB | Adobe PDF |
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14242/353651
URN:NBN:IT:UNIROMA1-353651