Explainability in Federated Learning and Selective Classification

BONSIGNORI, VALERIO
2025

Abstract

Modern Machine Learning has achieved remarkable predictive accuracy, yet its widespread deployment demands interpretability alongside performance. While the field of Explainable Artificial Intelligence (XAI) has matured considerably, offering sophisticated methods from SHAP to counterfactual explanations, these techniques often assume centralised data access and deterministic predictions. This thesis addresses that gap by extending explainability methodologies to complex deployment contexts where such assumptions do not hold. It explores two paradigmatic challenges in modern Artificial Intelligence deployment. First, Federated Learning environments, where privacy constraints forbid data centralisation, making traditional explanation methods, which require access to the data or assume IID feature distributions, incompatible with distributed architectures. Second, selective classification systems, where models can abstain from predictions, requiring explanations not only for the outcome but also for the abstention. These contexts represent the norm, as Artificial Intelligence operates under regulatory constraints, privacy requirements, and safety and ethical considerations.

The thesis presents three main contributions addressing these challenges. iFLASH (Interpretable Federated Learning Aggregation of SHAP values) enables high-quality SHAP explanations in federated settings by having each client generate local explanations from its private data, which the server then aggregates using faithfulness-based weighting strategies. Experiments across multiple datasets demonstrate that federated explanations can match or exceed centralised quality, with the faithfulness-weighted aggregation consistently outperforming naive averaging, particularly in cross-silo scenarios. Fastshap++ extends this work by training neural explainers directly in federated settings with differential privacy guarantees. Rather than aggregating explanations, clients jointly train surrogate and explainer networks, sharing only model weights. The integration of differential privacy throughout the pipeline provides formal privacy protection while maintaining explanation faithfulness comparable to centralised approaches. SC-CE (Selective Classification via Counterfactual Explanations) addresses the challenge of explainable abstention. By using the distance between an instance and its counterfactual as a confidence proxy, the method creates an interpretable-by-design rejection policy in which the counterfactual itself explains why the model abstained. Experimental validation shows that SC-CE matches state-of-the-art selective classifiers in predictive performance while uniquely providing human-interpretable explanations for rejection decisions.

The experimental validations, spanning five datasets and various model architectures, establish that explanation quality can be maintained or even enhanced when working within real-world constraints. This thesis enriches both XAI theory and practical deployment, enabling trustworthy Artificial Intelligence systems that can justify their decisions even when data cannot be centralised or when abstention is the most responsible choice.
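
To make iFLASH's aggregation step concrete, the following minimal Python sketch shows faithfulness-weighted averaging of per-client SHAP matrices. The function name, the weight normalisation, and the toy faithfulness scores are illustrative assumptions, not the exact weighting strategies defined in the thesis.

    import numpy as np

    def aggregate_shap(client_shap, faithfulness):
        # Faithfulness-weighted average of per-client SHAP attribution matrices.
        w = np.asarray(faithfulness, dtype=float)
        w = w / w.sum()                          # normalise weights to sum to 1
        stacked = np.stack(client_shap)          # (n_clients, n_instances, n_features)
        return np.tensordot(w, stacked, axes=1)  # weighted average over clients

    # Toy usage: three clients explaining two instances with four features each.
    rng = np.random.default_rng(0)
    local_shap = [rng.normal(size=(2, 4)) for _ in range(3)]
    scores = [0.9, 0.5, 0.7]  # e.g. a deletion-based faithfulness score per client
    print(aggregate_shap(local_shap, scores))

Clients whose local explanations are more faithful to the global model thus contribute more to the aggregate, which is why the abstract contrasts this strategy with naive (uniform) averaging.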
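The differential privacy component of Fastshap++ can be illustrated with the standard clip-and-noise recipe applied to the model weights that clients share. This is a generic Gaussian-mechanism sketch under assumed parameters (clip_norm, noise_mult), not the thesis's actual mechanism or privacy accounting.

    import numpy as np

    def dp_federated_average(updates, clip_norm, noise_mult, rng):
        # Clip each client's weight update to bound its sensitivity,
        # then average and add calibrated Gaussian noise.
        clipped = [u * min(1.0, clip_norm / max(np.linalg.norm(u), 1e-12))
                   for u in updates]
        avg = np.mean(clipped, axis=0)
        sigma = noise_mult * clip_norm / len(updates)  # per-round noise scale
        return avg + rng.normal(0.0, sigma, size=avg.shape)

    # Toy usage: five clients sharing 8-dimensional explainer weight updates.
    rng = np.random.default_rng(1)
    client_updates = [rng.normal(size=8) for _ in range(5)]
    print(dp_federated_average(client_updates, clip_norm=1.0, noise_mult=1.1, rng=rng))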
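Finally, the SC-CE rejection policy admits a compact hypothetical sketch: if the nearest counterfactual lies within a threshold tau of the instance, the decision is deemed fragile and the classifier abstains, returning the counterfactual as the explanation. The Euclidean distance, the threshold, and the toy classifier below are illustrative assumptions.

    import numpy as np

    def selective_predict(model, cf_generator, x, tau):
        # Predict, or abstain with the counterfactual as the explanation.
        x_cf = cf_generator(x)           # counterfactual: minimal change flipping the label
        dist = np.linalg.norm(x - x_cf)  # confidence proxy: small distance = fragile
        if dist < tau:
            return None, x_cf            # abstain; x_cf explains the rejection
        return model(x), x_cf

    # Toy usage: 1-D classifier with decision boundary at 0; the counterfactual
    # is the reflection of x across the boundary.
    model = lambda x: int(x[0] > 0.0)
    cf = lambda x: -x
    print(selective_predict(model, cf, np.array([0.05]), tau=0.5))  # abstains
    print(selective_predict(model, cf, np.array([2.00]), tau=0.5))  # predicts class 1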
1-Nov-2025
English
explainable artificial intelligence
federated learning
selective classification
ethical artificial intelligence
Monreale, Anna
Guidotti, Riccardo
Naretto, Francesca
Files in this record:

PhD_Thesis_Bonsignori_6_1.pdf
Open access
License: Creative Commons
Size: 9.44 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/359115
The NBN code of this thesis is URN:NBN:IT:UNIPI-359115