Generalized discrimination discovery on semi-structured data supported by ontology
2011
Abstract
Data mining has recently come to be regarded as an effective means of uncovering evidence and hidden causes of discrimination. If data mining finds associations showing that discriminatory treatment is strongly related to sensitive attributes, the discrimination becomes hard to refute. In this thesis, I propose a modification of the traditional data mining process that, with the support of an ontology, unveils and represents discrimination in a “semantically rich” form for semi-structured business data with multiple-valued treatments. First, input data are preprocessed into a well-structured form with semantic relations, which considerably supports the later discrimination exploration. The framework then searches for possibly discriminatory relations between unequal treatments and attributes protected by law, e.g., race, religion, and sex. These relations are represented as association rules through the notion of matching pairs of itemsets: itemsets that differ in their sensitive attributes, agree on their non-sensitive ones, and are subject to different treatments. By combining data mining with a reasoning service over the ontology, the resulting rules are semantically enriched with object properties between classes (concepts), which makes them more valuable and interesting than flat association rules. To address the drawback of local knowledge, the solution of “kNN as Situation Testing” is provided. In addition, a number of discrimination measures are provided to quantify the level of discrimination and to give a precise view of how different sensitive attributes negatively affect the decision and even each other. Experimental results confirm the potential and flexibility of the approach.

File | Size | Format
---|---|---
Luong_Thanh_phdthesis.pdf (open access; type: other attached material) | 4.53 MB | Adobe PDF
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/144173
URN:NBN:IT:IMTLUCCA-144173