Causality for Fair Machine Learning: Selected Topics and Applications
ALVAREZ, Jose Manuel
2024
Abstract
This thesis is about how we can use causality, in particular in the form of structural causal models (SCM), to address fair machine learning (Fair ML) problems. We use SCM as auxiliary, declarative knowledge to contextualize and, in turn, enhance the formulation of such problems. We focus on automated decision-making (ADM) scenarios, in which a learned ML model, trained on historical data, is tasked with predicting the outcomes of new incoming data. We address the following topics and applications.

How can we use causal reasoning to better test for discrimination? The contribution to this question is twofold. First, we revisit the comparator used for testing the discrimination claim of a complainant. Finding (or generating) the comparator is at the center of all modeling tools for testing discrimination. We define two classes of comparators: the ceteris paribus (cp) comparator, which represents an idealized comparison, and the mutatis mutandis (mm) comparator, which represents a "fairness given the difference" comparison. Second, we propose counterfactual situation testing (CST), a new algorithmic tool for testing discrimination that uses the mm-comparator. Using a k-NN implementation, we compare CST to its standard counterpart, which uses the cp-comparator.

How can we use causal reasoning to operationalize subjective fairness? The contribution to this question is the causal perception (CP) framework, in which we use SCM to represent how individual agents interpret information. Perception occurs when two individual agents interpret the same information differently. It is largely overlooked in Fair ML, where a single, objective view is often assumed. With CP, we propose a partial, subjective problem formulation for Fair ML problems in which a set of decision-makers interpret and, in turn, decide differently on the same fairness problem.

How can we use causal reasoning to mitigate the bias from using unrepresentative training data?
We use SCM to formalize the problem of unrepresentative data, both as a sample selection bias and as a domain adaptation problem, and motivate the use of individual weights to correct for the bias. The contribution to this question is twofold, with a focus on data science applications. First, we revisit partial dependence plots (PDP) and propose a weighted modification of this visualization tool, the weighted PDP (WPDP). Under WPDP, the weights correct the contribution of each instance according to the underlying population distribution when drawing the plots. Second, we revisit the decision tree learning problem and propose a modification to the information gain split criterion, leading to what we define as domain adaptive decision trees (DADT). Under DADT, the entropy contribution of each instance when deciding the next split is weighted according to the target population distribution.
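The core DADT idea of weighting each instance's entropy contribution can be sketched as follows. This is a minimal illustration under assumed details, not the thesis's actual implementation: function names, the binary-split interface, and the toy data are hypothetical.

```python
import numpy as np

def weighted_entropy(labels, weights):
    """Shannon entropy where each instance contributes its weight
    rather than a unit count (illustrative sketch of the DADT idea)."""
    total = weights.sum()
    ent = 0.0
    for c in np.unique(labels):
        p = weights[labels == c].sum() / total  # weighted class proportion
        if p > 0:
            ent -= p * np.log2(p)
    return ent

def weighted_information_gain(labels, weights, split_mask):
    """Information gain of a binary split, with all entropies and
    branch proportions computed from the instance weights."""
    n = weights.sum()
    left, right = split_mask, ~split_mask
    return (weighted_entropy(labels, weights)
            - weights[left].sum() / n * weighted_entropy(labels[left], weights[left])
            - weights[right].sum() / n * weighted_entropy(labels[right], weights[right]))
```

With unit weights this reduces to the standard information gain criterion; setting the weights from the target population distribution shifts which splits the tree prefers, which is the adaptation mechanism the abstract describes.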
Tesi.pdf (open access, 4.2 MB, Adobe PDF)
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/190088
URN:NBN:IT:SNS-190088