PREDICTION OF CREDIT CARD PAYMENT NEXT MONTH THROUGH TREE NET DATA MINING TECHNIQUES

A number of research initiatives have recently been launched around the world on the conceptualization, specification, design and development principles for the future use of credit cards, which store secret information and are used most of the time for online payment. In addition, as long as the card holds enough money, we can pay for what we need at any time. Therefore, the goal of this research is to use data mining techniques to predict credit card payment next month. The proposed system contains five steps: (a) find a suitable database on the internet, because such a database is not available in Iraq; (b) pre-process the credit card database using the Pearson correlation matrix to determine which features are less correlated with the others, so that they can be removed to reduce the prediction time; (c) split the pre-processed database into two parts, a training dataset and a testing dataset; (d) apply TreeNet prediction data mining techniques (TPDMT) to the training dataset to test whether payment is needed next month or not, and find the optimal tree (TreeNet is based on a boosting machine and usually uses decision trees (DTs) to build the predictor); (e) finally, pass the testing dataset through the optimal tree produced by TPDMT and evaluate the results using five measures derived from the confusion matrix: accuracy (AC), recall or true positive rate (TP), precision (P), F-measure (which considers both precision and recall) and Fβ.


INTRODUCTION
When cash is not available, a credit card is a better and more suitable way to pay for services. A credit card is a small card used for payments and purchases; many opportunities for electronic commerce are available because of the ease of payment over the Internet [4]. In general, there are two types of card: physical and virtual [3]. A physical card (such as American Express, Discover, MasterCard or Visa) must be shown by its owner during the purchase process. A virtual card, in contrast, exists online and is used through free services offered by the original card issuer. Credit cards are considered critical to improving the economic power of any country [1]. Their use by ordinary citizens is a major step towards promoting a highly desirable cashless economy [2]. In recent decades, with the rapid globalization of the economy, credit cards have become the most popular means of payment in business dealings [5]. In everyday life, the buying of goods and services and online trading have increased [3]. These increases have made payments more efficient, eliminated cash handling fees, reduced the risk of theft and increased economic activity [1]. The use of a credit card often leads to an adjustment of the relationship between the power of money and distrust and anxiety [20]. Finally, a credit card is useful because the holder does not need to carry large amounts of money, and because it allows payment in any currency without requiring the customer to exchange currency.
Data mining is the computational analysis of large-scale data in order to extract useful patterns and information [6]. A wide range of data is being transformed into knowledge via data mining, which can be seen as a result of the natural development of information technology [7]. Data mining is a tool like any other: it is less important to know how it works than to know how it will be used [18]. It operates in two learning modes, supervised (with a training set) and unsupervised (without a training set) [10], and the modelling objectives most commonly pursued in data mining are prediction and classification [11]. The success of data mining in many fields, including electronic business, shopping and sales, has led to its use in other applications [9]. Nowadays, data mining plays a major role in many areas such as business administration, government, engineering and science, healthcare and education [8].

International Journal of Computing, www.computingonline.net, computing@computingonline.net, Print ISSN 1727-6209, On-line ISSN 2312-5381

In data mining, one of the most common tasks is finding rules for prediction from groups of empirical data [14]. Prediction is the process of estimating an outcome from existing data, relying on the relationship between something that is known and something that needs to be predicted. In general, there are three kinds of prediction: classification, density estimation and regression [12]. Classification and regression are distinguished by the type of output: the output of classification is a categorical class, while a regression model learns continuous values [13]. This type of data mining can help business leaders make better decisions by providing many of the statistics they will need in the future. Prediction is used in data mining to support decisions in many areas, such as banking, by building forecasts and exploring behaviour and trends, so that accurate judgments can be made in a timely manner.
The remainder of the paper is organized as follows. Related work is presented in Section 2. The techniques used in prediction are presented in Section 3. The suggested methods used in designing the new predictor are presented in Section 4. The steps for generating the proposed predictor, and the associated challenges, are shown in Section 5. The experiments are described in Section 6. Finally, Section 7 presents the discussion and conclusion of the research.

LITERATURE REVIEW
Many previous works have used data mining methods to address credit card applications and problems. One of the most widely studied problems is fraud detection, which has received a great deal of effort because of the increasing number of frauds all over the world.
In 1999, Philip Chan and other researchers proposed a method of combining multiple learned fraud detectors under a "cost model". Their results show that losses due to fraud can be significantly reduced through distributed data mining of fraud models [19].
In another study, Yap Bee Wah and Irma Ibrahim presented the development and comparison of three credit scoring models, a logistic regression (LR) model, a classification and regression tree (CART) model and a neural network (NN) model, to discriminate between rejected and accepted credit card applicants of a bank [20].
In 2011, V. Bhusari and S. Patil showed how a Hidden Markov Model (HMM) can be used to detect fraudulent credit card transactions with a low false alarm rate. An HMM-based system is first trained on the spending profile of the cardholder and then checks each incoming transaction against the cardholder's spending behaviour; if the transaction is not accepted by the proposed HMM with sufficient probability, it is considered fraudulent.
In the same year, Raghavendra Patidar and Lokesh Sharma tried to detect fraudulent transactions using a neural network together with a genetic algorithm. The genetic algorithm is used to choose the network topology, the number of hidden layers and the number of nodes in the neural network design for credit card fraud detection. To train the neural network they used the supervised feed-forward back-propagation learning algorithm. Finally, they proposed some future work that could be done on fraud detection.
In 2013, Hetvi Modi proposed a system that is capable of detecting fraud by taking into account the spending habits of credit card holders without considering their significance. Typically, the details of the items purchased in individual transactions are not known to a fraud detection system. The proposed system is intended to address this limitation of current fraud detection systems. Another important advantage of the proposed system is a large reduction in the number of false positive transactions. The fraud detection system (FDS) module of the proposed system receives the card details and the value of the purchase in order to verify whether the transaction is genuine or not. If the FDS module confirms that the transaction is fraudulent, it raises an alarm and the transaction is declined.
Marcos and Sousa and Reginaldo Figueiredo, 2014, developed models to analyse the capacity of a credit union's members to settle their commitments, using a decision tree (C4.5) algorithm and an artificial neural network (multilayer perceptron) algorithm. They concluded that, for the problem of credit analysis, the models give statistically similar results and may aid a cooperative's decision-making process.
Another study in 2014 used a Hidden Markov Model (HMM) to model the sequence of operations involved in a credit card transaction. If the trained HMM does not accept an incoming credit card transaction with sufficiently high probability, the transaction is deemed fraudulent.
Jae Kwon Bae and Jinhwa Kim, 2015, built several personal credit rating prediction models based on the UDM and benchmarked their performance against other models that employ logistic regression (LR), Bayesian style frequency matrix (BFM), multilayer perceptron (MLP), classification tree methods (C5.0) and neural network rule extraction (NR) algorithms. To verify the feasibility and effectiveness of the UDM, personal credit data and personal loan data provided by a Financial Holding Company (FHC) were used in the study. The empirical results indicated that the UDM outperforms the other models (LR, BFM, MLP, C5.0 and NR).
Mrs. Poonam and other researchers, 2016, gave suggestions for a new credit card fraud detection technique that can be implemented, that captures the essence of the existing techniques, and that may combine a few of them to give a superior fraud detection tool.

PREDICTION DATA MINING TECHNIQUE
In this part, the main principle used to identify and resolve the problem will be discussed.

TREENET ALGORITHM
One of the most revolutionary developments in data mining technologies is TreeNet, developed by one of the most famous researchers in data mining, Jerome Friedman. TreeNet is designed for predictive modelling with very high accuracy. It pursues this aim even when the required models are very complex, so the resulting models may be relatively hard to understand in detail. In general, a TreeNet model is made up of several dozen to several hundred small trees, each of which usually has no more than two to eight terminal nodes. In spirit, this model is similar to a long series expansion (such as a Fourier or Taylor series): a sum of terms that gradually becomes more accurate as the expansion continues [12]. The model can be considered a series expansion that approximates the real functional relationship:
$F(X) = F_0 + \beta_1 T_1(X) + \beta_2 T_2(X) + \cdots + \beta_M T_M(X)$, (1)
where each $T_m$ ($m = 1, \dots, M$) is a small tree. This section presents techniques for improving classification accuracy by aggregating the predictions of multiple classifiers. Such approaches are known as ensemble or classifier combination methods. An ensemble approach generates a collection of base classifiers from the training data and performs classification by taking a vote on the predictions of the base classifiers.
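The series-expansion view in Eq. (1) can be illustrated with a minimal sketch: an initial constant followed by many small trees, each fitted to what the previous terms still get wrong. This is not the TreeNet implementation. The function names, the single numeric feature, the squared-error loss, the use of two-node trees (stumps) and the constant weight `learning_rate` for every tree are all assumptions made for illustration.

```python
# Illustrative gradient-boosting sketch (NOT the TreeNet implementation).
# Assumptions: one numeric feature, squared-error loss, two-node trees,
# and the same weight (learning_rate) for every tree in the expansion.

def fit_stump(xs, residuals):
    """Fit a one-split regression tree (two terminal nodes) on one feature."""
    overall_mean = sum(residuals) / len(residuals)
    best = (float("inf"), None, overall_mean, overall_mean)
    for threshold in sorted(set(xs))[:-1]:  # both sides of the split stay non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    if threshold is None:  # constant feature: fall back to a single leaf
        return lambda x: overall_mean
    return lambda x: left_mean if x <= threshold else right_mean


def fit_boosted_model(xs, ys, n_trees=50, learning_rate=0.1):
    """Build the expansion: initial constant plus a weighted sum of small trees."""
    f0 = sum(ys) / len(ys)                      # initial constant term
    predictions = [f0] * len(xs)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals)         # next small tree fits the residuals
        trees.append(tree)
        predictions = [p + learning_rate * tree(x) for p, x in zip(predictions, xs)]
    return f0, learning_rate, trees


def predict(model, x):
    """Evaluate the expansion at x."""
    f0, learning_rate, trees = model
    return f0 + sum(learning_rate * tree(x) for tree in trees)


# Toy data (made up): the target switches from 0 to 1 between x = 4 and x = 5.
toy_x = [1, 2, 3, 4, 5, 6, 7, 8]
toy_y = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_boosted_model(toy_x, toy_y)
print(round(predict(model, 2), 3), round(predict(model, 7), 3))
```

Each added tree moves the prediction a little closer to the target, which mirrors the description of a series whose terms become progressively more accurate.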
Generally, two prerequisites must be met for an ensemble classifier to perform better than an individual classifier: (1) the base classifiers should be independent of each other, and (2) the base classifiers must perform better than a classifier that guesses at random.
The figure below illustrates a logical view of the ensemble method. The idea is to build several classifiers from the original data and then combine their predictions when classifying unknown cases. Ensemble classifiers can be created using several strategies [22]:
• By manipulating the training set: multiple training sets are formed by resampling the original data according to some sampling distribution, and a classifier is built from each training set using a specific learning algorithm. Bagging and boosting are two examples of ensemble methods that manipulate their training sets. This work uses the boosting learning algorithm for prediction.
• By manipulating the input features: at the pre-processing stage, a subset of the input features is selected to form each training set. These subsets are chosen either randomly or on the recommendation of domain experts.
• By manipulating the class labels: at the classification stage, the label for each class is determined. This method is suited to problems with a large number of classes.
• By manipulating the learning algorithm: a learning algorithm can be applied several times to the same training data, which may result in different models. For example, TreeNet can generate different models by changing its number of trees or the number of levels in each model.
The first three approaches are generic methods that are applicable to any classifier, whereas the fourth approach depends on the type of classifier used. The base classifiers for most of these approaches can be generated sequentially (one after another) or in parallel (all at once). The algorithm below shows the steps needed to build an ensemble classifier in a sequential manner [22], [23].
Algorithm: General Procedure for Ensemble Method [18]
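A general ensemble procedure of this kind might be sketched as follows. This is an illustrative assumption rather than the listing from [18]: the training set is manipulated by bootstrap resampling, the base learner is a hypothetical one-threshold classifier on a single numeric feature, and the final prediction is a majority vote. All function names and the toy data are invented for the example.

```python
import random
from collections import Counter


def train_threshold_classifier(sample):
    """Hypothetical base learner: choose the threshold on x that classifies
    the most (x, label) pairs correctly, predicting 1 when x > threshold."""
    best_threshold, best_correct = None, -1
    for threshold, _ in sample:
        correct = sum((x > threshold) == bool(label) for x, label in sample)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return lambda x: int(x > best_threshold)


def build_ensemble(data, n_classifiers=11, seed=0):
    """Create one training set per base classifier and train on each in turn."""
    rng = random.Random(seed)
    classifiers = []
    for _ in range(n_classifiers):
        sample = [rng.choice(data) for _ in data]  # bootstrap: draw with replacement
        classifiers.append(train_threshold_classifier(sample))
    return classifiers


def ensemble_predict(classifiers, x):
    """Combine the base classifiers' predictions by majority vote."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Toy data (made up): label 0 for small x, label 1 for large x.
toy_data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
ensemble = build_ensemble(toy_data)
print(ensemble_predict(ensemble, 0), ensemble_predict(ensemble, 10))
```

An odd number of base classifiers is used so that a two-class vote cannot end in a tie.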

CORRELATION MEASURE
The correlation among features can be computed using several methods, one of which is the Pearson correlation. In general, it is used to determine which features are more effective in decision making, or to determine the positive and negative associations among features. It is interpreted according to three criteria:
• If the correlation between two features approaches +1, there is a positive relationship between them.
• If the correlation between two features approaches -1, there is an inverse relationship between them.
• If the correlation between two features is 0, there is no correlation between them.
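As a minimal sketch of how such a correlation matrix could be computed, the following uses the standard Pearson formula (covariance divided by the product of the standard deviations). The function names, the column names and the toy values are hypothetical, and returning 0 for a constant feature is an assumption made to avoid division by zero.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant feature as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict mapping feature name -> values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Toy columns (made-up values, hypothetical names).
toy_columns = {
    "X1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "X2": [10.0, 8.0, 6.0, 4.0, 2.0],   # moves opposite to X1
    "Y":  [0.0, 0.0, 1.0, 1.0, 1.0],
}
matrix = correlation_matrix(toy_columns)
for name, row in matrix.items():
    print(name, {other: round(value, 3) for other, value in row.items()})
```

Values near +1 or -1 in a row indicate features that move together with, or opposite to, the feature named at the start of that row.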

CONFUSION MATRIX
The confusion matrix is a particular table layout that allows the performance of a TreeNet model to be visualized. The table below illustrates the observations that represent the testing dataset. Several standard measures for the two-class confusion matrix are defined as follows [17], with TP, TN, FP and FN denoting the numbers of true positives, true negatives, false positives and false negatives.
Accuracy (ACC) is the proportion of all predictions that are correct: $ACC = \frac{TP + TN}{TP + TN + FP + FN}$
Recall (R), or true positive rate, is the proportion of actual positive cases that are correctly identified: $R = \frac{TP}{TP + FN}$
Precision (P) is the proportion of the predicted positive cases that are correct: $P = \frac{TP}{TP + FP}$
The F-measure takes into account both precision and recall to provide a single measurement for the system: $F = \frac{2 P R}{P + R}$
The $F_\beta$ measure generalizes the F-measure by weighting recall $\beta$ times as much as precision: $F_\beta = \frac{(1 + \beta^2) P R}{\beta^2 P + R}$
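The five measures can be computed directly from the four cells of a two-class confusion matrix. The sketch below uses the standard definitions of accuracy, recall, precision and the F-beta family. The function names and the toy labels are hypothetical, and returning 0 when a denominator is zero is an assumed convention.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def evaluation_measures(tp, fp, tn, fn, beta=1.0):
    """Accuracy, recall, precision and F-beta (beta=1 gives the F-measure)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denominator = beta ** 2 * precision + recall
    f_beta = (1 + beta ** 2) * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f_beta": f_beta}


# Toy labels (made up): 1 = positive class, 0 = negative class.
actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)
print(counts)                                  # (3, 1, 3, 1)
print(evaluation_measures(*counts))            # every measure equals 0.75 here
print(evaluation_measures(*counts, beta=2.0))  # recall weighted more heavily
```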

SUGGESTED SYSTEM
The proposed system uses a raw dataset to analyse credit card databases with the TreeNet technique. The system includes the following steps: (a) find a suitable database on the internet, because such a database is not available in Iraq; (b) pre-process the credit card database using the Pearson correlation matrix to determine which features are less correlated with the others, so that they can be removed to reduce the prediction time; (c) split the pre-processed database into two parts, a training dataset and a testing dataset; (d) apply TreeNet prediction data mining techniques (TPDMT) to the training dataset to test whether payment is needed next month or not, and find the optimal tree (TreeNet is based on a boosting machine and usually uses decision trees (DTs) to generate the predictor); (e) finally, pass the testing dataset through the optimal tree produced by TPDMT and assess five experimental measures derived from the confusion matrix, namely accuracy (AC), recall or true positive rate (TP), precision (P), F-measure (which considers both precision and recall) and Fβ. Fig. 2 shows the structure of the proposed method.

IMPLEMENTATION
The proposed system is applied to a raw dataset to obtain the results.

Step 1: The dataset used [21]
Description of raw dataset:
• Records: 1000 out of a total of 30000
Step 2: Compute the correlation among the features using Pearson's correlation matrix
As a first step, we apply the Pearson correlation function to compute the relationships among the features and to remove the features that are less correlated with the others. In this work, however, the features are strongly related to one another, so we do not remove any feature. Table 2 shows the Pearson correlation matrix of the credit card database. From this table we found that all the features are important; therefore, we could not remove any feature or consider it less important than the others.
Step 3: Split the dataset into training and testing parts
In this step, we split the database into two parts, a training dataset and a testing dataset, as explained in Table 3.
Step 4: Apply the TreeNet data mining algorithm as the classification method
In this step, we apply TreeNet to the training dataset. The main purpose of this step is to determine the optimal tree based on multiple evaluation measures; the maximum number of trees is set to 200. Table 4 gives the results of evaluating the trees on these measures.
Step 6: Determine the best (optimal) tree based on each measure
Step 7: Determine the importance of each variable in the best training model
Step 8: Pass the testing dataset to the optimal model and analyse the results based on the confusion matrix measures and the predicted target class
To obtain the results, we passed the testing dataset to the optimal model and analysed the outcome using the confusion matrix measures explained in Section 3.3 and the predicted target class. In general, the two variables with the greatest influence on the target in the optimal predictor are X20 and X6.
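The splitting step and the selection of the optimal number of trees can be sketched as follows. This is a minimal illustration under stated assumptions: the 70/30 split ratio, the random seed, the function names and the per-tree-count scores are all hypothetical and are not taken from Tables 3 and 4.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and split it into training and testing parts.
    The 70/30 ratio is an assumption, not the ratio used in the paper."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def best_tree_count(scores_by_tree_count, higher_is_better=True):
    """Pick the number of trees with the best score on one evaluation measure."""
    chooser = max if higher_is_better else min
    return chooser(scores_by_tree_count, key=scores_by_tree_count.get)


# 1000 placeholder record identifiers stand in for the credit card records.
records = list(range(1000))
training, testing = train_test_split(records)
print(len(training), len(testing))

# Hypothetical evaluation scores for models truncated at different tree counts
# (up to the maximum of 200 trees); the numbers are made up for illustration.
accuracy_by_trees = {50: 0.80, 100: 0.85, 150: 0.84, 200: 0.83}
error_by_trees = {50: 0.45, 100: 0.41, 150: 0.40, 200: 0.42}
print(best_tree_count(accuracy_by_trees))                        # 100
print(best_tree_count(error_by_trees, higher_is_better=False))   # 150
```

Different measures can favour different tree counts, which is why the optimal tree is determined separately for each measure.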

CONCLUSION
When pre-processing with the Pearson correlation was applied in the first stage, the credit card features were found to be strongly correlated, and none of them could be deleted because of their strong correlation with the target Y, which is the credit card payment next month. Using the multiple decision trees that make up TreeNet gives more reliable results than a single decision tree and yields the optimal predictive model.
The correlation analysis showed that all features are important and strongly correlated, and the best decision tree obtained likewise showed that all features are important in decision making.
Using the confusion matrix measures to predict the possibility of payment next month proved effective: we achieved a true positive rate of about 0.97674, as shown in Table 7, and a prediction accuracy of 84.87%.