A GAME THEORETIC BASED APPROACH TO FEATURE SELECTION FOR EFFICIENT MULTI-CRITERIA DECISION MAKING: SOME CREDIT CARD FRAUD DETECTION USE CASES

Ghemmogne Fossi, Leopold

This dissertation, entitled "Power-index based Management of Fraud Detection Rules: Supervised and Semi-supervised Approaches", deals with credit card fraud detection. According to the European Central Bank, the value of fraud using cards issued in the Single Euro Payments Area (SEPA) amounted to €1.8 billion in 2016. Reducing credit card fraud is therefore a major challenge. Fraud detection systems generally consist of an automatic component, built from if-then-else rules, that screens every transaction and triggers an alert when a transaction is considered suspicious. Human experts then check each alert and decide whether it is a true or a false positive. The criteria used to select which rules to keep operational are traditionally based mostly on the performance of individual rules. This approach disregards the non-additivity of the rules: the contribution of a rule to the pool depends on which other rules are present. We propose a novel approach using power indices developed within Coalitional Game Theory (CGT). This approach assigns each rule a normalized score that quantifies the rule's influence on the overall performance of the pool. As indices, we use the Shapley Value (SV) and the Banzhaf Value (BV). The main applications of such scores are: 1) supporting the decision of whether to keep or drop a rule from the pool; 2) selecting the k top-ranked rules, so as to work with a more compact rule-set. Using real-world credit card fraud data containing approximately 300 rules and $3.5 \times 10^5$ transaction records, we show that: 1) this approach fares better at preserving the performance of the pool than the one assessing the rules in isolation; 2) the performance of the whole pool can be achieved while keeping only one-tenth of the rules. We then observe that the latter application can be re-framed as a Feature Selection (FS) task for a classifier, and we show that our approach performs comparably to benchmark FS algorithms.
We also observe that it offers an advantage for rule management, namely the assignment of a normalized score to each rule. This is not the case for most FS algorithms, which focus only on yielding a high-performance feature-set solution. In another contribution, we propose a new version of the Banzhaf Value, called k-Banzhaf, which outperforms the original in computation time while achieving comparable performance. For a set N of n elements, the standard Banzhaf Value computes $2^{n-1}$ differences, whereas k-Banzhaf computes only $\binom{n-1}{k-1}$ differences. Finally, we implement a self-training process (a kind of bootstrap) to reinforce the learning process of a machine learning algorithm (Random Forest Classifier). We compare the latter with our three power indices on a classification task over the real-world credit card fraud data used in the first part of the manuscript. We observe that power-index-based feature selection performs comparably to benchmark FS algorithms in the self-training setting as well.
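The scoring idea summarized above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the dissertation: the toy rule pool `RULES`, the characteristic function `value` (number of distinct frauds caught by a coalition of rules), and in particular the reading of k-Banzhaf as an average of marginal contributions over the $\binom{n-1}{k-1}$ coalitions of size k that contain the rule. The exhaustive enumeration is only feasible for very small pools.

```python
from itertools import combinations
from math import comb, factorial

# Hypothetical toy pool: each rule maps to the set of fraud IDs it flags.
RULES = {
    "r1": {1, 2, 3},
    "r2": {3, 4},
    "r3": {1, 2},
    "r4": {5},
}


def value(coalition):
    """Toy characteristic function: distinct frauds caught by a set of rules.

    It is non-additive, because overlapping rules catch the same frauds.
    """
    caught = set()
    for rule in coalition:
        caught |= RULES[rule]
    return len(caught)


def marginal(rule, others):
    """Gain in value when `rule` joins the coalition `others`."""
    return value(set(others) | {rule}) - value(others)


def shapley(rule):
    """Shapley Value: marginal contributions weighted by coalition size."""
    others_pool = [r for r in RULES if r != rule]
    n = len(RULES)
    total = 0.0
    for size in range(n):
        weight = factorial(size) * factorial(n - size - 1) / factorial(n)
        for others in combinations(others_pool, size):
            total += weight * marginal(rule, others)
    return total


def banzhaf(rule):
    """Banzhaf Value: plain average over all 2^(n-1) coalitions without `rule`."""
    others_pool = [r for r in RULES if r != rule]
    n = len(RULES)
    total = sum(
        marginal(rule, others)
        for size in range(n)
        for others in combinations(others_pool, size)
    )
    return total / 2 ** (n - 1)


def k_banzhaf(rule, k):
    """Assumed size-restricted variant: only coalitions of size k containing `rule`."""
    others_pool = [r for r in RULES if r != rule]
    n = len(RULES)
    total = sum(marginal(rule, others) for others in combinations(others_pool, k - 1))
    return total / comb(n - 1, k - 1)


def top_k(score, k):
    """Names of the k rules with the highest score under the given index."""
    return sorted(RULES, key=score, reverse=True)[:k]
```

For example, `top_k(banzhaf, 2)` ranks the rules by Banzhaf Value and keeps the two highest, which mirrors the "k top-ranked rules" application on this toy pool. The scores can be divided by their sum if a normalized score is wanted.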



Faculty/Department: DIPARTIMENTO DI INFORMATICA "Giovanni Degli Antoni"
Degree programme: INFORMATICA (Computer Science)
Publication date: 2018
Language: English
Keywords: Credit card fraud detection; Coalitional Game Theory; Power Indexes; Shapley Value; Banzhaf Index; Restricted Banzhaf Index; Semi-supervised Learning; Supervised Learning; Self-training
Supervisor, Advisor or Tutor: DAMIANI, ERNESTO
Co-advisor, Opponent, Co-Supervisor, Co-Tutor or Coordinators: GIANINI, GABRIELE; DAMIANI, ERNESTO
Publisher: Università degli Studi di Milano
Collection: Università degli Studi di Milano

Files in this record:

File	Size	Format
phd_unimi_R11288.pdf (open access; licence: all rights reserved)	2.42 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/79614

The NBN code of this thesis is URN:NBN:IT:UNIMI-79614