Breast Tumors Diagnosis Using Fuzzy Inference System and Fuzzy C-Means Clustering

Research in computer-aided diagnosis has been successful largely because of the results that intelligent computing approaches have achieved in this field. This paper presents a robust classification method that attempts to classify a suspicious tissue region as normal or not normal by using a Fuzzy Inference System (FIS). Fuzzy C-Means (FCM) clustering is used to fuzzify the Gray-Level Co-Occurrence Matrix (GLCM) features, and a shape-matching function is used to fuzzify the binary shape matrix. A T-norm is then used to generate 729 rules (243 based on the normal database, 243 based on the benign database, and 243 based on the malignant database). After that, a genetic algorithm selects the best eighteen rules (the best six from each database), and the rules in each group are summed. If the sum of the six rules based on the normal database is greater than the sums of the other two groups (benign and malignant), the result of the classification step is normal. The model achieved a classification rate of 97.5% on the input image dataset.


I. INTRODUCTION
Due to the recent growth in the volume of digital medical images, researchers have worked to propose computational techniques that provide effective support for daily tasks related to image-based diagnosis [1][2][3]. Breast cancer is the second most common cancer type around the world. In 2018, it was estimated that 17,586 women and 144 men were diagnosed with breast cancer, which means that, on average, 48 people were diagnosed with breast cancer every day.
In the early phase tumors are observed as tiny bright spots by mammography; these spots are calcium sediments called Micro Calcifications (MCs). In most cases they are not clear in the images and it is difficult to recognize them since the radiologists are faced with a challenge due to the nature of the human vision system.
The second kind of breast cancer, called masses, can be simpler to detect because of their size, shape and color contrast, but some types of masses can be difficult to detect because of the nature of the tissue, which may appear similar to normal breast tissue (parenchyma) [4].
The breasts lie on the upper ventral part of the body, on both sides, and cover the frontal area from the second to the sixth rib, including the mammary glands. After childbirth, these glands often produce milk in response to stimulation. The glands are found in both females and males, in males only in a primitive form, with some exceptions [5]. Mammography plays a critical role in detecting breast cancer at its early stages because it can show changes in the breast. These changes may occur up to two years before the patient or a physician can feel them. A mammogram is an X-ray image of the breast. A screening mammogram is used to find early signs of breast cancer. It is the best screening tool used today to find breast cancer [6].
Pattern recognition is an important part of artificial intelligence, which attempts to make machines as intelligent as humans. Classification can be used to predict the properties of samples that are not part of the original training set. There are two general kinds of classification mechanism: supervised and unsupervised. Supervised classification uses pixels of known classes to identify pixels of unknown classes. Unsupervised classification, also called data clustering, is defined as the problem of partitioning a group of objects into a set of natural clusters without any prior knowledge. Clustering methods include hierarchical clustering, K-means clustering, and Fuzzy C-Means (FCM) clustering [7].
The research literature includes many papers on applications of Computer-Aided Detection and Diagnosis (CAD) systems in different medical cases. In [8], 2018, a novel intelligent breast cancer diagnosis approach was proposed. Four machine learning algorithms, namely support vector machine, logistic regression, K-nearest neighbor, and Bayes classification, were applied to construct a predictive model, which achieved an accuracy of 0.772.
In [9], 2018, a Shallow-Deep Convolutional Neural Network (SD-CNN) was employed to extract novel features from mammogram images and classify cases as benign or malignant; the SD-CNN improved the diagnostic accuracy to 0.90.
In [10], 2018, a novel intelligent breast cancer diagnosis approach was offered, which employed an information-gain-directed simulated annealing genetic algorithm to obtain the maximum classification accuracy and the minimum misclassification cost. This approach was tested on the Wisconsin Diagnostic Breast Cancer (WDBC) data sets.
In [11], 2019, the CNNI-BCC (Convolutional Neural Network Improvement for Breast Cancer Classification) was designed to implement a supervised deep learning neural network for breast cancer classification. That work is an attempt to help medical doctors in determining breast cancer lesions. The study was conducted experimentally with 221 real patient subjects and reported an accuracy of 90.71%.
In [12], 2019, a Multiple Instance Learning (MIL) benchmark was proposed, showing that the recently proposed nonparametric MIL and MILCNN are particularly efficient for the tasks of patient and image classification. Patient classification rates can reach up to 92.1% for the 40× magnification factor, a level never reached by conventional classification frameworks, which reinforces the fact that instances are complementary and can be fruitfully considered in a MIL framework. MIL can thus leverage digital histopathological image classification and analysis to improve computer-aided diagnosis, without the need to label all the images.
In this paper, a fuzzy inference system is proposed to improve the diagnosis of breast tumors. The inputs of the system are the fuzzy representation of the Gray-Level Co-Occurrence Matrix (GLCM) features (Contrast, Correlation, Energy, Entropy and Homogeneity) and the representation of a binary shape. The first step is fuzzification, which is achieved by applying fuzzy c-means clustering to group the features into similar clusters. The second step is the generation of fuzzy rules by applying the T-norm operator, followed by aggregation using the Zadeh implication and a genetic algorithm that selects the best rules. The last step is defuzzification, which is used to diagnose the breast tumor in the knowledge-based system by the fuzzy rule-based reasoning method. The system obtained an accuracy of about 97%.
The presented work is organized as follows. The background theories are presented in Section 2. In Section 3, the proposed method and its variants are explained in detail. In Section 4, the case study is presented, where results from the proposed method and other methods are analyzed and discussed. The results of the research are summarized and concluded in Section 5.

II. BACKGROUND THEORIES
A. FUZZY INFERENCE SYSTEMS (FIS)
A FIS is built on sets of if-then rules that connect the input variables to the output. The FIS is described as having five components, among them fuzzification of the input, rule generation, a decision-making unit, and defuzzification, as shown in Fig. 1. Inference proceeds in three steps: fuzzification, rule generation, and defuzzification. Fuzzification means that each input value receives a membership value for each linguistic label. Rule generation means that the membership values in the premise part determine the firing strength (weight) of each rule. Defuzzification means converting the fuzzy output of each rule to a crisp value depending on the firing strength [13].

B. THE FUZZY C-MEANS CLUSTERING
FCM is a clustering algorithm developed by Dunn. FCM specifies a c × n matrix U = {u_ij}, where c and n are the number of cluster centers and the number of samples in the data, and u_ij denotes the membership value of the j-th sample towards the i-th cluster center. Thus, the FCM algorithm is well suited to dividing the features of a mammogram image into a known number of clusters [14][15][16], as Algorithm (1) shows.
Algorithm (1): The conventional Fuzzy C-Means (FCM) algorithm
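Since the body of Algorithm (1) is not reproduced here, the following is a minimal sketch of the conventional FCM iteration as it is usually stated, not the authors' own implementation. The function name, the fuzzifier m = 2, the random initialization, and the stopping tolerance are all assumptions chosen for illustration.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Cluster the rows of X (n samples, d features) into c fuzzy clusters.

    Returns (centers, U), where U has shape (c, n) and U[i, j] is the
    membership of sample j in cluster i. m > 1 is the fuzzifier (assumed 2).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships, normalized so each sample's column sums to 1.
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Cluster centers: membership-weighted means of the samples.
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Euclidean distance from every center to every sample, shape (c, n).
        dist = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        # Membership update: u_ij proportional to d_ij^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

For the fuzzification described in this paper, X would hold the values of one GLCM feature (one column) and c = 3 would correspond to the low, medium, and high groups.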

C. GENETIC ALGORITHM (GA)
The genetic algorithm is a heuristic method that mimics the operation of natural selection. It proceeds through the following steps. An initial population of individuals is generated at random or heuristically. Each individual is evaluated according to a fitness function that describes the optimization problem in the search space, and individuals are selected according to their fitness. Crossover and mutation follow. In the crossover operation, two parents (individuals) exchange parts of their genomes to produce a new offspring (individual). The mutation operator flips bits at random with some small probability, which helps avoid premature convergence to local optima [17][18][19].
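The steps above can be illustrated with a small generic sketch. It is not the configuration used in this paper: the bit-string encoding, single-point crossover, population size, mutation rate, and the toy fitness function in the usage note are all assumptions, and the function names are hypothetical.

```python
import random

def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total == 0:
        return rng.choice(population)
    pick = rng.uniform(0, total)
    acc = 0.0
    for individual, fit in zip(population, fitnesses):
        acc += fit
        if acc >= pick:
            return individual
    return population[-1]

def crossover(parent1, parent2, rng):
    """Single-point crossover: the head of parent1 joined to the tail of parent2."""
    point = rng.randint(1, len(parent1) - 1)
    return parent1[:point] + parent2[point:]

def mutate(individual, rate, rng):
    """Flip each bit independently with a small probability."""
    return [1 - gene if rng.random() < rate else gene for gene in individual]

def run_ga(fitness, n_bits=16, pop_size=20, generations=30,
           mutation_rate=0.02, seed=0):
    """Return the fittest bit string seen over all generations."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        fitnesses = [fitness(ind) for ind in population]
        population = [
            mutate(crossover(roulette_select(population, fitnesses, rng),
                             roulette_select(population, fitnesses, rng), rng),
                   mutation_rate, rng)
            for _ in range(pop_size)
        ]
        best = max(population + [best], key=fitness)
    return best
```

Calling `run_ga(sum)` maximizes the number of ones in the bit string, a toy fitness function used only to exercise selection, crossover, and mutation.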

III. SYSTEM MODELING AND MATERIAL
The proposed methodology is described in the following sections.

A. DATA SET
The Mammographic Image Analysis Society (MIAS) dataset is used for testing the proposed system. The dataset contains 322 mammogram images (207 normal, 63 benign, and 52 malignant) from 161 patients, where the size of each image is 1024 × 1024 pixels and the format is PGM [20].

B. FEATURE EXTRACTION
The main goal of feature extraction from the suspicious region is to select the subset of relevant features that maximizes the accuracy of the classification model. For this work, a survey of research papers based on feature analysis recommended the following GLCM features: correlation, contrast, entropy, energy, and homogeneity. The shapes of the detected tumors are represented as binary matrices. A match function is used to differentiate between normal shapes and cancer shapes, as shown in the blocks of Algorithm (2).
Algorithm (2): Feature extraction, where p represents the probability of a pixel.
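The five GLCM features named above can be sketched as follows. Because the formulas of Algorithm (2) are not reproduced here, the code uses the common textbook definitions, which may differ in detail from the authors' (for instance, energy is taken as the sum of squared entries and entropy uses a base-2 logarithm). The function names, the symmetric normalization, and the horizontal one-pixel offset are assumptions.

```python
import numpy as np

def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric gray-level co-occurrence matrix.

    image: 2-D integer array with values in [0, levels).
    (dy, dx): pixel offset between the two gray levels of a pair.
    """
    image = np.asarray(image)
    P = np.zeros((levels, levels), dtype=float)
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                P[image[r, c], image[r2, c2]] += 1
    P = P + P.T            # count each pair in both directions
    return P / P.sum()     # p(i, j): probability of the gray-level pair

def glcm_features(P):
    """Contrast, correlation, energy, entropy and homogeneity of p(i, j)."""
    i, j = np.indices(P.shape)
    mu_i, mu_j = (i * P).sum(), (j * P).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * P).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * P).sum())
    nonzero = P[P > 0]     # skip zero entries so log2 is defined
    return {
        "contrast": ((i - j) ** 2 * P).sum(),
        # Undefined (division by zero) for a constant image.
        "correlation": ((i - mu_i) * (j - mu_j) * P).sum() / (sd_i * sd_j),
        "energy": (P ** 2).sum(),
        "entropy": -(nonzero * np.log2(nonzero)).sum(),
        "homogeneity": (P / (1.0 + np.abs(i - j))).sum(),
    }
```

In practice the region of interest would first be quantized to a small number of gray levels before `glcm` is called, since the matrix has `levels × levels` entries.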

C. FUZZY SYSTEM MODEL CLASSIFICATION
Texture analysis on the basis of statistical and GLCM features is used here for classification. Classification is a difficult task, as it involves decisions that determine whether a mass is normal or cancerous. In this work, a fuzzy system is combined with fuzzy c-means clustering, a shape-matching function, and a genetic algorithm.
The fuzzy model consists of a knowledge base, input, fuzzification, rule generation, aggregation, and defuzzification. Algorithm (3) below shows the steps of the classification operation.
Algorithm (3): Classification
Output: the case is normal, benign, or malignant.
Step 1: Prepare the knowledge base by building three databases: the normal database, the benign database, and the malignant database.
Step 2: Fuzzify the binary shape by applying the match shape function based on the similarity function A(i) = ∑(I ∩ B_i) / ∑(I ∪ B_i), then compute the mean of A and store it as Shape_fuzzy_set_1.
Step 3: Fuzzify the GLCM features by applying fuzzy c-means clustering to Contrast, Correlation, Standard Deviation, Entropy and Homogeneity, creating the fuzzy sets Contrast_fuzzy_set_con, Correlation_fuzzy_set_cor, Energy_fuzzy_set_enr, Entropy_fuzzy_set_ent and Homogeneity_fuzzy_set_h.
Step 4: Apply the T-norm to the memberships of each parameter of the fuzzy sets (from Steps 2 and 3) to create 729 rules.
Step 5: Apply the genetic algorithm to select the best six rules from each group.
Step 6: Defuzzify by summing the six rules of each group and comparing the sums:
output_normal = sum(first six rules, based on the normal knowledge base)
output_benign = sum(second six rules, based on the benign knowledge base)
output_malignant = sum(third six rules, based on the malignant knowledge base)
if (output_normal > output_benign) and (output_normal > output_malignant) then
    the case is normal
else if output_benign > output_malignant then
    the case is benign
else
    the case is malignant
end if
Moreover, the block diagram in Fig. 2 illustrates the classification process.
The knowledge base of the proposed system has three databases (the normal database, the benign database, and the malignant database). The inputs of the proposed system are the GLCM features (Contrast, Correlation, Standard Deviation, Entropy and Homogeneity) and the binary shape.
The first step of the FIS model is fuzzification. In this step, the degree to which each input value belongs to each fuzzy set is determined. Fuzzification uses fuzzy c-means clustering, which assigns each GLCM feature input to three groups (low, medium, high) based on the normal database, and then generates the corresponding fuzzy sets based on the benign database and the malignant database.
Shape fuzzification is done by applying the shape-matching function, which measures the similarity between a suspicious shape and a set of training shapes. The similarity measure is the ratio between the overlap area and the union area, as in equation (6), where a number of round training shapes is given, 'B' is a round training shape, and 'A' is a suspicious shape:
$\mu_{b} = \dfrac{|A \cap B|}{|A \cup B|}$ (6)
'A∪B' is obtained by applying the Boolean 'OR' operation between the suspicious shape and the round training shape, and 'A∩B' is obtained by applying the Boolean 'AND' operation between them. The memberships obtained over the round training-shape set are represented by equation (7):
$\text{Regular fuzzy set} = \{\mu_{b_1}, \mu_{b_2}, \ldots, \mu_{b_n}\}$ (7)

Figure 2. Block diagram for step classification
Fuzzification of the irregular training shapes then takes place using the same similarity measure. The second membership is obtained from the irregular training-shape set and is represented by equation (8):
$\text{Irregular fuzzy set} = \{\mu_{c_1}, \mu_{c_2}, \ldots, \mu_{c_n}\}$ (8)
The next step obtains the membership of the entire shape in the regular and irregular classes, as in equations (9) and (10):
$\mu_{A \in \text{Regular}} = \operatorname{mean}\{\mu_{b_1}, \mu_{b_2}, \ldots, \mu_{b_n}\}$ (9)
$\mu_{A \in \text{Irregular}} = \operatorname{mean}\{\mu_{c_1}, \mu_{c_2}, \ldots, \mu_{c_n}\}$ (10)
Then fuzzy inference is made. Whether the shape A belongs to the regular class or the irregular class is determined via the rule.
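The shape fuzzification described above can be sketched as follows, assuming the similarity is the overlap area divided by the union area and that the class membership is the mean similarity over the training shapes of that class. The function names and the handling of two empty shapes are assumptions.

```python
import numpy as np

def shape_similarity(A, B):
    """Overlap area divided by union area of two binary shape matrices."""
    A = np.asarray(A, dtype=bool)
    B = np.asarray(B, dtype=bool)
    union = np.logical_or(A, B).sum()          # Boolean OR
    if union == 0:
        return 0.0                             # both shapes empty (assumed 0)
    overlap = np.logical_and(A, B).sum()       # Boolean AND
    return float(overlap / union)

def shape_membership(A, training_shapes):
    """Mean similarity between shape A and every training shape of a class."""
    return float(np.mean([shape_similarity(A, B) for B in training_shapes]))
```

Calling `shape_membership` once with the regular (round) training shapes and once with the irregular ones gives the two memberships that the fuzzy inference then compares.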
The second step of the system is the fuzzy rule. It matches a value using a first-order Sugeno fuzzy rule. A rule receives values only from the fuzzifications involved in its antecedents, as explained above, and computes the truth value of the rule. In the FIS, the 'product' operator is applied to evaluate the conjunction of the antecedents. The inputs are the degrees of membership, which are multiplied through a T-norm operator ⊗ to determine the degree w_p of each rule, as in equation (11). This generates 243 rules based on the normal database, 243 rules based on the benign database, and 243 rules based on the malignant database.
$w_p = \text{Shape\_fuzzy\_set}_{1} \times \text{Contrast\_fuzzy\_set}_{con} \times \text{Correlation\_fuzzy\_set}_{cor} \times \text{Energy\_fuzzy\_set}_{enr} \times \text{Entropy\_fuzzy\_set}_{ent} \times \text{Homogeneity\_fuzzy\_set}_{h}$ (11)
$p = 1, \ldots, 243; \quad con, cor, enr, ent, h = 1, 2, 3$
The third step is rule aggregation. The 243 rules of each database are reduced to six rules by applying a genetic algorithm. The initial population is the result of the T-norm operation, the fitness of a chromosome is the value of the chromosome itself, selection is based on roulette-wheel selection, and the crossover combination is calculated using the Zadeh implication given in equation (12):
$\mu_{A \to B}(x, y) = \max[\min\{\mu_A(x), \mu_B(y)\},\ 1 - \mu_A(x)], \quad \forall x \in X,\ \forall y \in Y$ (12)
The first generated population is half the size of the initial population; this operation is repeated until the best six rules are selected (if fewer than six chromosomes are returned, the selection operation is repeated in order to obtain this number of chromosomes).
The fourth step is defuzzification, implemented by summing the first six rules, the second six rules, and the third six rules, and then comparing the three sums. If the first sum is larger than the others, the case is normal, and likewise for the remaining classes.
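The rule count can be checked with a short sketch: five GLCM features with three fuzzy sets each give 3^5 = 243 combinations, and each combination is multiplied by the shape membership using the product T-norm. The function names and the array layout (one row per feature, one column per low, medium, and high set) are assumptions, and the Zadeh implication is written in its usual form.

```python
import itertools
import numpy as np

def rule_strengths(shape_mu, glcm_mu):
    """Firing strength of every rule for one database.

    shape_mu: membership of the shape (a single value in [0, 1]).
    glcm_mu:  array of shape (5, 3); row f holds the low, medium and high
              memberships of GLCM feature f.
    Returns an array of 3**5 = 243 products (product T-norm).
    """
    glcm_mu = np.asarray(glcm_mu, dtype=float)
    n_features, n_sets = glcm_mu.shape
    strengths = []
    for combo in itertools.product(range(n_sets), repeat=n_features):
        w = shape_mu
        for feature, fuzzy_set in enumerate(combo):
            w *= glcm_mu[feature, fuzzy_set]
        strengths.append(w)
    return np.array(strengths)

def zadeh_implication(mu_a, mu_b):
    """Zadeh implication: max(min(mu_a, mu_b), 1 - mu_a)."""
    return max(min(mu_a, mu_b), 1.0 - mu_a)
```

Running `rule_strengths` once per database (normal, benign, malignant) yields the 3 × 243 = 729 rule strengths from which the genetic algorithm then selects.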
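The comparison in the defuzzification step can be written as a small function. This is a sketch of the decision logic only; the function name is hypothetical, and the handling of ties (which fall through to the later branches) follows from the strict inequalities rather than from anything stated in the text.

```python
def classify_case(normal_rules, benign_rules, malignant_rules):
    """Sum the selected rule strengths of each group and compare the sums."""
    output_normal = sum(normal_rules)
    output_benign = sum(benign_rules)
    output_malignant = sum(malignant_rules)
    if output_normal > output_benign and output_normal > output_malignant:
        return "normal"
    elif output_benign > output_malignant:
        return "benign"
    else:
        return "malignant"
```

Each argument would hold the six rule strengths selected by the genetic algorithm for the corresponding database.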

IV. RESULTS
In relation to the structure of FIS, as it is shown in section 3, the inputs of a FIS classifier are the features.

A. FEATURE EXTRACTION
The proposed processes, as described in the previous section, were implemented and applied to real breast cancer mammography images chosen from the MIAS dataset. The GLCM feature values saved in the database are presented in Table (1), Table (2) and Table (3). The results indicate that normal, benign and malignant tumors are not linearly separable, and that the texture features (GLCM) give a high level of classification accuracy on mammogram images.
For this work, six features were selected (shape, contrast, correlation, standardization, entropy, homogeneity). From the observed results, a cancerous shape will generally be more irregular, while shapes with regular and smooth boundaries are benign. The correlation was consistently higher where the detected mass was correctly categorized, so the correlation was higher for cancer than for the normal Region of Interest (ROI). It was also observed that the entropy, homogeneity, standardization and contrast values tend to be higher for a cancer region than for a normal region.

B. CLASSIFICATION
In relation to the structure of the FIS, as described in section 3.2.1, the inputs of a FIS classifier are the features (shape plus the GLCM features). The first step is the fuzzification of each input by using fuzzy c-means clustering; examples of fuzzy sets are given in Figures (3), (4), (5), (6) and (7). The third step was aggregation by applying a genetic algorithm and the Zadeh implication, as shown in Figure (9). In conclusion, one individual FIS classifier has 6 input nodes, 45 fuzzy sets, and 729 rules. A genetic algorithm is then used to select the best eighteen rules, and defuzzification results in one output. Some examples from the testing of the system are shown in Figure (

C. TESTING THE FUZZY INFERENCES SYSTEM CLASSIFIER
When the constituent FIS of the ensemble model completes its classification on the corresponding test subset, as discussed in the previous section, the final results of classification are determined. Two standard performance metrics are then used for the dataset: Accuracy and Misclassification Rate. Table (4) shows the test of the proposed model.
Accuracy = (TP + TN) / total = 0.97515528
Misclassification Rate = (FP + FN) / total = 0.02484472
The results of the suggested system have been compared with the results of five previous related works, and the comparison shows that our system is more accurate than those works, as shown in Table (5).
Table 5. Comparison of the results of this paper with the results of some related work
Authors                 Accuracy
The proposed system     97.5%
Mao, N., et al. [8]
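The two reported metrics can be reproduced arithmetically. The counts below are an inference, not figures stated in the text: with the 322 MIAS images as the total, 314 correct classifications and 8 errors are the values that yield the reported rates.

```python
total = 322                  # number of MIAS images used as the test set
correct = 314                # TP + TN, inferred from the reported accuracy
errors = total - correct     # FP + FN

accuracy = correct / total
misclassification_rate = errors / total

print(round(accuracy, 8))                # 0.97515528
print(round(misclassification_rate, 8))  # 0.02484472
```

The two rates sum to one, as expected when every image is counted as either correctly or incorrectly classified.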

V. CONCLUSIONS
In this paper, an attempt was made to develop an efficient tissue classification system based on an intelligent computing model. This section discusses the overall work carried out in this paper and presents the main conclusions. Choosing an effective artificial model, conducting experiments, and observing the results of the algorithms used all play an important role in increasing the efficiency of the system, which explains the clear disparity in accuracy among the previously described systems. The intelligent computing model classifies the Region of Interest (ROI) as normal, benign, or malignant tissue by a fuzzy inference system (fuzzy c-means clustering for fuzzification of the GLCM features, a match function for fuzzification of the shape matrix, a T-norm to generate 729 rules, and a genetic algorithm to select the best eighteen rules). It proved its efficiency in many tests and achieved an accuracy of 97.5%. GLCM texture features were extracted, and only five of them were selected, following the recommendations of previous researchers.