Credit Card Fraud Detection Using Binary Classification and ML.NET
It is no secret that we live in a digital era in which much of daily life, including money management and e-commerce, happens online. As a result, credit card fraud is a recurring problem that can cause significant losses. Machine learning algorithms can help detect and prevent it. Several methods exist in the literature for this problem, including Neural Networks, Logistic Regression, Naive Bayes, and Decision Trees.
In this report, I compare these approaches, with an emphasis on binary classification methods. The comparison covers more than one dataset, and I have tried to include real-world examples to show scalability.
In an era where everything happens fast, we sometimes forget to check the security of our credit cards. For convenience, we save card details in apps, which can lead to credit card fraud. AI can help address this, but there are many ways to apply it. We could rely on Neural Networks, Logistic Regression, Naive Bayes, Decision Trees, Random Forest, or a Multilayer Perceptron; here we focus on the performance of binary classification methods and compare it with other results.
Performance is key to any software, since our world moves at top speed. How often do we encounter this problem? In 2020 the pandemic gave everyone a hard time, and with the global lockdown the popularity of online shopping increased drastically. This means that credit card fraud could double by the end of 2020, and in times like these the last thing a person wants is to lose all their money.
The problem has attracted the interest of many researchers. One proposed solution is to detect credit card fraud with decision trees and support vector machines; as Y. Sahin and E. Duman note, “as the size of the training data sets become larger, the accuracy performance of SVM based models reach the performance of the decision tree-based models.” Performance aside, the problem itself is not new. In 1994, Ghosh and Reilly built a neural network that detected and classified accounts as fraudulent with a higher success rate than rule-based procedures. In 1996, Dhar and Buescher used historical data on credit card transactions to create a fraud score model, applying a clustering approach to a radial basis function network. Other approaches used classic algorithms such as gradient boosting, decision trees, and logistic regression, each with different results; these are compared in the following chapters.
Binary Classification Methods
As noted above, there are many ways to solve this problem, but we focus on binary classification because, according to the paper Credit Card Fraud Detection, binary classification methods gave the best results in terms of accuracy. Random forest reached an accuracy of 95.5%, followed by a decision tree at 94.3% and logistic regression at 90%.
The dataset is a popular one that can be downloaded from Kaggle; it contains transactions made by European cardholders in 2013. It holds 284,807 transactions, of which only 492 are labeled as fraud. The dataset has been transformed using principal component analysis (PCA): the variables V1, ..., V28 are PCA features, while Time, Amount, and Class are not. Because the class distribution ratio strongly affects the experimental results, the data needs some preprocessing.
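To make the imbalance concrete, the short sketch below computes the fraud share from the two counts quoted above. The function name is only illustrative; the counts are the ones given in the text. It shows that fraud makes up well under 1% of the records (roughly 0.17%).

```python
# Counts reported in the text for the Kaggle dataset.
TOTAL_TRANSACTIONS = 284_807
FRAUD_TRANSACTIONS = 492


def class_distribution(total, positives):
    """Return (positive share, negatives per positive) for a binary label."""
    negatives = total - positives
    return positives / total, negatives / positives


fraud_share, legit_per_fraud = class_distribution(
    TOTAL_TRANSACTIONS, FRAUD_TRANSACTIONS
)
print(f"fraud share: {fraud_share:.4%}")
print(f"legitimate transactions per fraud: {legit_per_fraud:.0f}")
```

With so few positive examples, a model can look accurate while missing most fraud, which is why the class distribution has to be adjusted before training.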
Not all features are useful, and keeping them all may lead to overfitting, so we must carefully select the most important ones and remove the rest to reduce training time and improve accuracy. Will Koehrsen’s feature-selection tool was used to filter the features. The source reports a 95% figure for this step, after which only 27 features continued to the next phase. Because the data is highly imbalanced, a method for adjusting the class distribution is needed. The most common ones are oversampling the minority class, undersampling the majority class, or a hybrid of the two. A popular oversampling method used in both articles is SMOTE (Synthetic Minority Oversampling Technique), which is highly effective on imbalanced datasets.
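The idea behind SMOTE can be sketched in a few lines. This is a simplified illustration and not the implementation used in the cited articles: it picks a random minority sample, finds its nearest minority neighbours, and creates a new point somewhere on the line segment between the sample and one of those neighbours. The function name, parameters, and the tiny two-feature dataset are all assumptions made for the example.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature fraud samples, for illustration only.
fraud = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote(fraud, n_new=6, k=2)
for point in new_points:
    print([round(v, 3) for v in point])
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay inside the region already occupied by the fraud class instead of being exact duplicates. In practice the oversampling should be applied to the training data only, so that the test set keeps the original distribution.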
Binary Classification Methods Used
From the start of this article, our goal has been to compare the performance of different binary classification methods. The authors of the paper built and trained the models; here we compare their results and determine which method performs better in terms of precision and accuracy. Logistic regression models the relationship between a binary outcome and predictors that can be categorical, binary, or continuous.
Based on the predictors, it estimates the probability that an observation belongs to each category. Naive Bayes is another supervised learning algorithm; it is based on Bayes' theorem and assumes that the attributes are independent of one another given the class. In the experiment, the Bernoulli distribution was used for detecting fraudulent transactions. Decision trees are yet another supervised learning algorithm, with a structure similar to a real-life tree and three kinds of nodes: the root node, intermediate nodes, and leaf (terminal) nodes.
To classify a sample, a decision tree checks a condition at each level and follows the matching branch until it reaches a leaf, which gives the conclusion. A support vector machine is a supervised learning algorithm that trains on data already labeled with the correct categories and looks for the boundary that separates the classes with the widest margin. Random forests can be used for classification or regression; they combine a collection of decision trees and typically outperform a single tree. The dataset was split in an 80:20 ratio, 80% for training and 20% for testing.
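With so few fraud cases, an 80:20 split is usually done per class (a stratified split) so that both parts keep the same fraud ratio. The sketch below is one way to do that, assuming the class counts quoted earlier; the function name and the rounding rule are assumptions, and the cited papers may have split the data differently.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each class split in the same proportion."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# Labels built from the reported counts: 492 fraud (1), the rest legitimate (0).
labels = [1] * 492 + [0] * (284_807 - 492)
train_idx, test_idx = stratified_split(labels)
test_fraud = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), test_fraud)
```

The exact test-set size depends on how the 20% is rounded, so a library routine may differ from this sketch by a sample or so.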
As noted above, we focus on the performance of binary classification methods and compare them on accuracy and precision. The test set contains a total of 56,962 samples, of which 98 are fraudulent transactions.
Predicted as fraud   Actual fraud   Predicted as not fraud   Actual not fraud
1530                 98             55432                    56864
501                  98             56461                    56864
83                   98             56879                    56864
Support vector machine
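Accuracy and precision both follow from the four cells of a confusion matrix. The sketch below uses the test-set totals quoted above (56,962 samples, 98 fraudulent); the true-positive and false-positive counts of the second example are made up for illustration and do not come from any of the cited papers. The first example is a baseline that never predicts fraud, included to show why accuracy alone is misleading on data this imbalanced.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from the four confusion-matrix cells."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Test-set totals from the text: 56,962 samples, 98 of them fraudulent.
TOTAL, ACTUAL_FRAUD = 56_962, 98

# Baseline that never predicts fraud: high accuracy, zero recall.
never_fraud = classification_metrics(
    tp=0, fp=0, fn=ACTUAL_FRAUD, tn=TOTAL - ACTUAL_FRAUD
)

# HYPOTHETICAL model: the tp and fp counts below are invented for illustration.
example = classification_metrics(
    tp=80, fp=20, fn=18, tn=TOTAL - ACTUAL_FRAUD - 20
)

for name, metrics in (("never fraud", never_fraud), ("hypothetical", example)):
    print(name, {key: round(value, 4) for key, value in metrics.items()})
```

The baseline already scores above 99.8% accuracy without catching a single fraud, so precision and recall are the more informative figures when comparing the models.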
As the paper suggests, the results show that a classical approach can be as successful as more popular choices such as deep learning algorithms. The articles support this idea in more detail: “The findings of this study indicate promising results with SMOTE-based sampling techniques. The best recall score obtained was with SMOTE sampling strategy by DRF classifier at 0.81.”
As we have seen, credit card fraud is a real threat. This year has also seen the introduction of applications that let you pay with NFC, which can be a serious problem when someone has the knowledge to clone credit cards. Several ways have been proposed to combat this problem.
As the experimental results show, the classical algorithms are as successful as a deep learning method, but only if the dataset is pre-processed with the SMOTE strategy. The best supervised learning algorithm in terms of precision was the support vector machine, with a precision of 98.31%; in terms of accuracy it was random forest, with an accuracy of 99.69%. This recalls the earlier remark that “as the size of the training data sets become larger the accuracy performance of SVM-based models reach the performance of the decision tree-based models”.
The idea of using binary classification to solve this problem has also been taken up by Microsoft, which developed a model that can be trained and consumed as an API in ML.NET. The algorithm used was FastTree, an optimized boosted-tree implementation, applied to binary classification. I intend to analyze other papers that solved the same problem using binary classification methods, and I will also search for more up-to-date datasets from real banks across the world.