Learning Concepts with the Right Semantics: Reasoning Shortcuts and Human-Machine Alignment
MARCONATO, EMANUELE
2025
Abstract
Understanding the functioning of current AI models is an urgent open problem, owing to the massive-scale deployment of deep neural networks and their black-box nature. Several works address how to explain the behavior of AI models, and interest is growing in explanations framed in terms of high-level variables, often called concepts or symbols. Leveraging concepts as a vehicle for explaining AI models makes it possible to discard irrelevant information and focus only on the semantic content of the data. This has the potential to make models more interpretable and to foster higher trust in their decision-making process. One key open problem is how to learn concepts from data such that they possess the correct semantics. This thesis analyzes this problem in depth, presenting two major contributions. The first is explaining and addressing pitfalls in learning the right concepts in the context of tasks that involve reasoning over them. These pitfalls are due to Reasoning Shortcuts, whereby models can leverage poor-quality concepts to attain correct predictions. The second contribution is a formal framework for testing the quality of the concepts learned by a model, followed by a class of models that boost concept quality by leveraging advanced representation learning techniques. Overall, the presented works contribute to a deeper understanding of the issues that complicate provably learning concepts from data, and to the design of more trustworthy AI models for future high-stakes applications.
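To make the notion of a Reasoning Shortcut concrete, here is a minimal toy sketch (an illustration, not code from the thesis): the task label is the XOR of two binary concepts, and a hypothetical extractor that learned inverted concept semantics (0 and 1 swapped) still attains perfect task accuracy, since flipping both operands leaves XOR unchanged.

```python
# Minimal illustrative sketch of a Reasoning Shortcut (hypothetical example,
# not taken from the thesis). For simplicity, the raw inputs x1, x2 are the
# ground-truth concept values themselves (in practice they would be images).
from itertools import product

def reason(c1: int, c2: int) -> int:
    """Fixed symbolic knowledge: the task label is the XOR of the concepts."""
    return c1 ^ c2

def shortcut_extractor(x1: int, x2: int) -> tuple[int, int]:
    """A concept extractor that learned inverted semantics (0 <-> 1)."""
    return 1 - x1, 1 - x2

for x1, x2 in product([0, 1], repeat=2):
    true_label = reason(x1, x2)          # label from the ground-truth concepts
    c1, c2 = shortcut_extractor(x1, x2)  # the model's (wrong) concepts
    pred = reason(c1, c2)
    print(f"inputs={(x1, x2)} concepts={(c1, c2)} pred={pred} label={true_label}")
    # Correct prediction despite wrong concepts: (1-a) ^ (1-b) == a ^ b.
    assert pred == true_label
```

Because every label is reproduced, no amount of label supervision alone can detect that the concepts carry the wrong semantics, which is exactly why the thesis argues for dedicated concept-quality tests.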
| File | Size | Format | Access |
|---|---|---|---|
| FinalReport_Marconato_pdfa.pdf | 239.34 kB | Adobe PDF | not available |
| frontespizio_firma_digitale_pdfa.pdf | 59.62 kB | Adobe PDF | not available |
| phd_thesis_marconato_final_2.pdf | 22.25 MB | Adobe PDF | open access (View/Open) |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/215662
URN:NBN:IT:UNIPI-215662