This dissertation, entitled "Power-index based Management of Fraud Detection Rules: Supervised and Semi-supervised Approaches", deals with credit card fraud detection. According to the European Central Bank, the value of fraud using cards issued in the Single Euro Payments Area (SEPA) amounted to €1.8 billion in 2016. Reducing credit card fraud is therefore a major challenge. In general, a fraud detection system consists of an automatic system built from if-then-else rules that screen every transaction and trigger an alert when a transaction is considered suspicious. Human experts then check each alert and decide whether it is a true or a false positive. The criteria used to select the rules kept in operation are traditionally based mostly on the performance of individual rules. This approach disregards the non-additivity of the rules: the performance of a pool of rules is not the sum of the performances of its members. We propose a novel approach using power indices developed within Coalitional Game Theory (CGT). This approach assigns each rule a normalized score that quantifies the rule's influence on the overall performance of the pool. As indices, we use the Shapley Value (SV) and the Banzhaf Value (BV). The main applications of such scores are: 1) supporting the decision of whether to keep or drop a rule from the pool; 2) selecting the k top-ranked rules, so as to work with a more compact rule set. Using real-world credit card fraud data containing approximately 300 rules and 3.5 × 10^5 transaction records, we show that: 1) this approach preserves the performance of the pool better than one that assesses the rules in isolation; 2) the performance of the whole pool can be achieved while keeping only one-tenth of the rules. We then observe that the latter application can be reframed as a Feature Selection (FS) task for a classifier, and we show that our approach performs comparably to benchmark FS algorithms.
We also observe that it offers an advantage for rule management: it assigns a normalized score to each rule. This is not the case for most FS algorithms, which focus only on yielding a high-performance feature-set solution. In another contribution, we propose a new version of the Banzhaf Value, the k-Banzhaf, which outperforms the original in computation time while achieving comparable performance. For a set N of n elements, the standard Banzhaf Value computes 2^n - 1 differences, whereas the k-Banzhaf computes only C(n-1, k-1) differences, where C denotes the binomial coefficient. Finally, we implement a self-training process (a kind of bootstrap) to reinforce learning in a machine learning algorithm (Random Forest Classifier). We compare the latter with our three power indices to perform classification on the real-world credit card fraud data used in the first part of the manuscript. We observe that power-index-based feature selection also yields results comparable to benchmark FS algorithms in the self-training setting.
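As a rough illustration of how power indices can score rules in a pool, the following Python sketch computes Shapley and Banzhaf values on a toy example. Everything in it is an assumption made for illustration and is not taken from the dissertation: the rule names, the alert sets, the characteristic function (share of frauds caught by the pool), and in particular the reading of k-Banzhaf as an average of marginal contributions over coalitions of size k that contain the rule.

```python
from itertools import combinations
from math import comb, factorial

# Hypothetical toy data: each rule maps to the transaction ids it flags.
RULE_ALERTS = {
    "r1": {1, 2, 3},
    "r2": {3, 4},
    "r3": {2, 3},   # largely redundant with r1
    "r4": {5, 9},   # transaction 9 is a false positive
}
FRAUDS = {1, 2, 3, 4, 5}
N_RULES = len(RULE_ALERTS)


def pool_value(rules):
    """Assumed characteristic function: share of frauds caught by the pool."""
    flagged = set().union(*(RULE_ALERTS[r] for r in rules))
    return len(flagged & FRAUDS) / len(FRAUDS)


def marginals(rule, size):
    """Marginal contributions of `rule` to every coalition of `size` other rules."""
    others = [r for r in RULE_ALERTS if r != rule]
    return [pool_value(s + (rule,)) - pool_value(s)
            for s in combinations(others, size)]


def shapley(rule):
    """Shapley value: size-weighted average of marginal contributions."""
    n = N_RULES
    return sum(
        factorial(s) * factorial(n - s - 1) / factorial(n) * sum(marginals(rule, s))
        for s in range(n)
    )


def banzhaf(rule):
    """Banzhaf value: plain average over all coalitions of the other rules."""
    n = N_RULES
    return sum(sum(marginals(rule, s)) for s in range(n)) / 2 ** (n - 1)


def k_banzhaf(rule, k):
    """Assumed k-Banzhaf: average over coalitions of size k containing `rule`,
    which requires comb(n - 1, k - 1) differences."""
    diffs = marginals(rule, k - 1)
    return sum(diffs) / len(diffs)


def normalize(scores):
    """Rescale scores so that they sum to 1."""
    total = sum(scores.values())
    return {r: v / total for r, v in scores.items()}


def top_k(scores, k):
    """Names of the k highest-scoring rules."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    bv = normalize({r: banzhaf(r) for r in RULE_ALERTS})
    print("normalized Banzhaf scores:", bv)
    print("top-2 rules:", top_k(bv, 2))
```

In this toy pool the redundant rule r3 is scored lower than r4 even though r3 catches more frauds on its own, which is the kind of interaction that an assessment of rules in isolation would miss.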
A GAME THEORETIC BASED APPROACH TO FEATURE SELECTION FOR EFFICIENT MULTI-CRITERIA DECISION MAKING: SOME CREDIT CARD FRAUD DETECTION USE CASES
GHEMMOGNE FOSSI, LEOPOLD
2018
File | Access | Size | Format |
---|---|---|---|
phd_unimi_R11288.pdf | Open access | 2.42 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/79614
URN:NBN:IT:UNIMI-79614