
Responsible AI in Vision and Language: Ensuring Safety, Ethics, and Transparency in Modern Models

POPPI, SAMUELE
2025

Abstract

This thesis examines how Responsible AI principles—safety, ethics, and transparency—can be effectively embedded into modern AI models. As large-scale systems such as deepfake generation and autonomous navigation grow increasingly pervasive, aligning these technologies with societal values, ethical standards, and user privacy becomes imperative. This research tackles these challenges through a series of interrelated contributions. In the domain of deepfake detection and explainability, robust methods were developed using self-supervised models such as DINO to identify and classify synthetic images, including those generated by text-to-image diffusion models, even under adversarial conditions. Visual explainability cues that highlight artifacts indicative of deepfake content were introduced to strengthen user trust. For explainable navigation in embodied AI, a framework was designed to improve transparency in autonomous systems: by integrating a speaker policy and a captioning module into a self-supervised exploration agent, the system generated natural language descriptions of its navigational context, and an explanation-map metric was introduced to better align visual attention with those descriptions, supporting human-robot collaboration. In the area of machine unlearning, this thesis introduced a low-rank unlearning method that removes specific classes or examples from pre-trained models without requiring full access to the original dataset; the approach was extended to enable efficient, on-demand removal of multiple classes at inference time, minimizing computational and storage demands while preserving model effectiveness. To address unsafe content in vision-and-language models, the research introduced Safe-CLIP, a fine-tuned version of CLIP capable of filtering NSFW content, supported by ViSU, a dataset of paired safe and unsafe image-text examples. Safe-CLIP redirects unsafe content toward safe regions of the embedding space, balancing the suppression of harmful outputs with the retention of benign creative functionality. Finally, the safety robustness of multilingual large language models (LLMs) was investigated: fine-tuning attacks carried out in one language were found to compromise safety across all languages, revealing a shared vulnerability, and the proposed Safety Information Localization method identifies the safety-critical parameters involved, paving the way for more robust alignment practices. Together, these contributions provide both theoretical insights and practical solutions to enhance the reliability, adaptability, and ethics of AI systems. By addressing challenges such as safer navigation, efficient unlearning, and robust NSFW filtering, this research advances the alignment of large-scale AI models with Responsible AI principles.
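
As a concrete illustration of the low-rank unlearning idea mentioned above, the following minimal sketch trains a LoRA-style low-rank residual on top of a frozen pre-trained classifier and performs gradient ascent on the class to be forgotten, so that forgetting is confined to a small set of added parameters. This is only a sketch under assumed names (LowRankAdapter, unlearning_step, forget_class) and an assumed loss; it is not the thesis' actual method or code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankAdapter(nn.Module):
    """Wraps a frozen nn.Linear and adds a trainable low-rank residual A @ B."""
    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # original weights stay untouched
        out_f, in_f = base.weight.shape
        self.A = nn.Parameter(torch.zeros(out_f, rank))
        self.B = nn.Parameter(torch.randn(rank, in_f) * 0.01)

    def forward(self, x):
        # Frozen projection plus the low-rank correction (A @ B has shape out_f x in_f).
        return self.base(x) + x @ (self.A @ self.B).T

def unlearning_step(model, images, labels, forget_class, optimizer):
    """One step of gradient ascent on the class to forget, updating only A and B."""
    logits = model(images)
    mask = labels == forget_class
    if not mask.any():
        return
    loss = -F.cross_entropy(logits[mask], labels[mask])  # ascent: push the class away
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Because only the low-rank residual is trained, the forgetting signal can be merged into or detached from the original weights on demand, which is in the spirit of the efficiency and on-demand removal properties described above.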
14 May 2025
Italian
Responsible AI
AI Safety
GenAI
Machine Unlearning
Model Interpretability
Multimodal AI
Large Language Models (LLMs)
Multilingual Alignment
Cucchiara, Rita
Baraldi, Lorenzo
Files in this record:
FinalReport_POPPI_SAMUELE_apr24_signed_pdfa.pdf (217.68 kB, Adobe PDF, not available)
Poppi_PhD_AI_Thesis_13_1.pdf (29.67 MB, Adobe PDF, open access)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/216816
The NBN code of this thesis is URN:NBN:IT:UNIPI-216816