ADAPTIVE HUMAN MACHINE INTERACTION APPROACH FOR FEATURE SELECTION-EXTRACTION TASK IN MEDICAL DATA MINING

Feature selection is one of the most complicated and topical tasks in Data Mining and Human Machine Interaction. Many approaches to solving it rest on non-mathematical, intuitive hypotheses. A new approach to evaluating the information content of medical features, based on an optimized combination of feature selection and feature extraction methods, is proposed. This approach produces an optimally reduced number of features while preserving the linguistic interpretation of each of them. A hybrid feature selection/extraction system based on Neural Network-Physician interaction is investigated. The system is numerically simple and can perform feature selection/extraction with any number of factors in online mode. It uses Oja's neurons for online principal component analysis and calculates the distance between the first principal component and all input features. A series of experiments confirms the efficiency of the proposed approach in Medical Data Mining and gives physicians the most informative features without losing their linguistic interpretation.


INTRODUCTION
Medical diagnostics is one of the areas where human machine interaction problems [1][2][3] have become really important, for several reasons. The physician is the person who decides which diagnosis is given to a patient, and only the physician is held fully liable for it. However, most measurements of the human organism are made by technical systems. Some of these systems contain decision-making sub-systems and others do not. The number of parameters measured for a patient is often very large. A physician therefore obtains diagnostic information from different sub-systems and must make a diagnosis from input information that contains a large number of features. To diagnose well and responsibly, a physician needs to know which features are the most informative before making any diagnosis. In Medical Data Mining this problem is known as feature selection.
Feature extraction and feature selection are among the most debated and complicated problems in medical diagnostics. Medical data sets often contain a large number of features but only a small number of patients. As a result, most well-known approaches to medical diagnostics are ineffective unless the input feature space is compressed first.
Preprocessing of input data is a necessary stage of the whole Data Mining task. During preprocessing, data reduction, feature selection and feature extraction have to be carried out [4][5][6][7][8][9]. Feature selection and feature extraction are the most important stages, because the feasibility and quality of the medical diagnostics process depend essentially on which features are chosen and how many.
At present, feature extraction approaches such as principal component analysis (PCA) [10][11][12], discriminant analysis and principal manifolds analysis (PMA) [13] have a clear mathematical grounding. Neural networks can also be used for feature extraction, which is a topical task in deep learning systems (autoencoder bottleneck, Restricted Boltzmann Machine, etc.) [14][15]. In medical diagnostics, however, it is very important to be able to interpret the results in terms of the input features. That is why feature selection approaches have become especially useful in Medical Data Mining tasks, but they are usually based on non-formal intuitive hypotheses [16]. In this paper the authors therefore propose to formalize and optimize this process using a combined feature selection-extraction procedure in online mode.
From the physician's point of view, a feature selection approach is required to choose a reduced number $n_R$ of features from the $n$ original features (where $n_R < n$) with minimal loss of information value, so as to provide a mathematically grounded diagnostic process.
The theses above need to be stated mathematically in order to build a feature selection system based on the original dataset. Feature selection is thus a process of choosing a map from the original feature space to a reduced feature space of lower dimension. The principal goal of such a transformation is to reduce the input feature space while preserving the essential properties of the data, so that the medical diagnostic process is clear and correct. Applying hybrid systems of Computational Intelligence to different tasks in Medical Data Mining is a topical problem, and it can increase the quality of medical diagnostics, clustering, pattern recognition and so on. This is possible because hybridization integrates the advantages of different systems to achieve a goal. Applying a hybrid system to search for the most informative features in different medical applications is a topical task, especially in medical diagnostics, in order to exclude situations of information shortage (data incompleteness, data fuzziness, presence of gaps and outliers in data).
In this paper the authors propose to integrate the advantages of feature extraction and feature selection and to create a unified hybrid feature extraction-selection system. The system estimates the information content of features and extracts the most informative ones without losing their physical sense (the possibility of linguistic interpretation).

HYBRID SYSTEM FOR FEATURE SELECTION-EXTRACTION BASED ON NEURAL NETWORK-PHYSICIAN INTERACTION
A new approach based on a neural network with Oja's neurons integrated with a physician is proposed. Within this approach the most informative feature is the one that has the minimal distance, in the sense of the Manhattan metric, to the output signal of Oja's neuron. It is important to note that the proposed system can implement the feature extraction-selection process in online mode.
At the first step of the proposed approach, all measured features of a patient are fed to processing sequentially. If any of the features contain gaps, these have to be filled by some procedure, for example one described in [17].
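The gap-filling step can be illustrated with a minimal sketch. It assumes column-mean imputation, which is only a placeholder chosen for brevity and is not the procedure of [17]; the function name `fill_gaps_with_mean` and the use of `None` or NaN to mark a gap are likewise assumptions made for this example.

```python
import math


def fill_gaps_with_mean(rows):
    """Replace missing values (None or NaN) in each column by the column mean.

    `rows` is a list of patients, each a list of feature values.
    Mean imputation is an illustrative placeholder, not the method of [17].
    """
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    n_features = len(rows[0])
    means = []
    for j in range(n_features):
        observed = [row[j] for row in rows if not is_missing(row[j])]
        # A column with no observed values at all falls back to 0.0.
        means.append(sum(observed) / len(observed) if observed else 0.0)

    return [[means[j] if is_missing(value) else value
             for j, value in enumerate(row)]
            for row in rows]
```

In a strictly online setting the column means would have to be tracked recursively as patients arrive; the batch form is used here only to keep the sketch short.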
Beforehand, the input data have to be centered and normalized. The encoded feature vector is then fed to a group of Oja's neurons, and the first principal component $y_1(k)$ is obtained at the output of the first Oja's neuron. Oja's neuron was proposed by Erkki Oja, a professor at Aalto University, and it extracts the first principal component from the features in a dataset in online mode.
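The centering and normalization step might look like the sketch below. The exact normalization used by the authors is not recoverable from the text, so the example assumes the common choice of zero mean and unit standard deviation per feature; `center_and_normalize` is a hypothetical name.

```python
import math


def center_and_normalize(rows):
    """Return a copy of `rows` in which every feature has zero mean and
    unit standard deviation (population form). Assumed normalization."""
    n = len(rows)
    n_features = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(n_features)]
    stds = []
    for j in range(n_features):
        variance = sum((row[j] - means[j]) ** 2 for row in rows) / n
        # Constant features get scale 1.0, so they map to all zeros.
        stds.append(math.sqrt(variance) if variance > 0 else 1.0)
    return [[(row[j] - means[j]) / stds[j] for j in range(n_features)]
            for row in rows]
```

Centering matters for what follows: Oja's rule extracts the leading eigenvector of the second-moment matrix, which coincides with the covariance matrix only for zero-mean data.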
The self-learning algorithm of the adaptive linear associator (ALA) is based on the normalized Hebb rule and computes the first principal component, which corresponds to the eigenvector $w_1$ stored in the weights of Oja's neuron [18,19]. This algorithm minimizes a local (one-step) learning criterion (energy function, Lyapunov function) (5). The learning procedure (6)-(7) follows from the gradient of criterion (5) and from Oja's rule for the first eigenvector, where $\eta(k)$ is a learning rate parameter that satisfies the Dvoretzky conditions. Expressions (6)-(7) are stochastic approximation procedures with a low convergence speed, so they cannot be used in non-stationary conditions. That is why a modified Oja's rule is introduced. It is faster and has additional smoothing properties, which can be controlled by a forgetting parameter.
Fig. 1 presents the modified Oja's neuron architecture. This architecture solves the principal component analysis task by obtaining the principal components sequentially in online mode, as opposed to classical PCA, which obtains them in batch mode. It is simple to show that, for a fixed number of features, classical PCA and the modified Oja's neuron architecture yield equal eigenvectors. For example, when the proposed system was tested on Fisher's Iris dataset [20], classical PCA and the modified Oja's neuron architecture produced equal eigenvectors. On a 12-inch MacBook with an Intel Core m3 processor (1.1 GHz) and 8 GB of RAM (1867 MHz), classical PCA calculated the eigenvector of the Iris dataset in 0.93 s. Under the same conditions the modified Oja's neuron architecture needed 1105 iterations (2.57 s).
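For reference, the textbook form of the single-neuron Oja rule on which the algorithm above builds can be written out as follows. This is the standard formulation, not a reproduction of the paper's numbered expressions, and the modified rule with the forgetting parameter is deliberately not reconstructed here. The gradient is taken with the output $y_1(k)$ held fixed, which is the usual simplification.

```latex
% Textbook Oja rule: y_1 is the neuron output, w_1 its weight vector.
\begin{align*}
y_1(k) &= w_1^{T}(k)\, x(k), \\
w_1(k+1) &= w_1(k) + \eta(k)\, y_1(k)\,\bigl(x(k) - w_1(k)\, y_1(k)\bigr).
\end{align*}
% One-step reconstruction criterion and its gradient with y_1(k) held fixed.
\begin{align*}
E(k) &= \bigl\| x(k) - w_1\, y_1(k) \bigr\|^{2}, \\
\nabla_{w_1} E(k) &= -2\, y_1(k)\,\bigl(x(k) - w_1\, y_1(k)\bigr).
\end{align*}
% Conditions of Dvoretzky type on the learning rate.
\begin{equation*}
\sum_{k=1}^{\infty} \eta(k) = \infty, \qquad
\sum_{k=1}^{\infty} \eta^{2}(k) < \infty .
\end{equation*}
```

A gradient step $w_1 \leftarrow w_1 - \tfrac{1}{2}\eta(k)\,\nabla_{w_1}E(k)$ reproduces the update in the second line, which is why the rule can be read as one-step minimization of the reconstruction error.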
The modified Oja's neuron needed more processing time, but only this system allows the Physician (Human) and the Machine (Neural Network) to work in parallel and so makes Human-Machine (Neural Network-Physician) Interaction possible. Patients come to the physician sequentially and the physician processes features sequentially, so the computer system has to work in online mode to provide online data processing.
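The contrast between online and batch extraction of the first principal component can be sketched as follows. The example uses the textbook Oja rule rather than the modified rule with a forgetting parameter, and it takes power iteration on the covariance matrix as the batch reference. The function names, the step schedule `eta0 / (1 + k / 1000)` and the default values are assumptions chosen for illustration; the input is assumed to be centered.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def oja_update(w, x, eta):
    """One step of the textbook Oja rule: w <- w + eta * y * (x - w * y)."""
    y = dot(w, x)
    w_new = [wi + eta * y * (xi - wi * y) for wi, xi in zip(w, x)]
    return w_new, y


def oja_first_component(samples, epochs=50, eta0=0.01):
    """Feed samples one at a time to a single Oja neuron.

    Returns the weight vector rescaled to unit length. The decaying
    step size is a hypothetical schedule of Dvoretzky type.
    """
    w = normalize([1.0] * len(samples[0]))
    k = 0
    for _ in range(epochs):
        for x in samples:  # patients arrive sequentially
            k += 1
            eta = eta0 / (1.0 + k / 1000.0)
            w, _ = oja_update(w, x, eta)
    return normalize(w)


def batch_first_component(samples, iters=500):
    """Batch reference: power iteration on the sample covariance matrix."""
    n, m = len(samples[0]), len(samples)
    cov = [[sum(s[i] * s[j] for s in samples) / m for j in range(n)]
           for i in range(n)]
    v = normalize([1.0] * n)
    for _ in range(iters):
        v = normalize([dot(row, v) for row in cov])
    return v
```

Because an eigenvector is defined only up to sign, the two results are best compared through the absolute value of their dot product, which is close to 1 when the online estimate has converged to the batch one.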
At the next step, the distances in the sense of the Manhattan metric between all feature vectors and the output signal of the first Oja's neuron are calculated. The feature with the minimal distance is chosen as the most informative one. This feature-winner is then excluded from the original data array, and the system continues to process the reduced dataset until all features have been processed. In the hybrid system, the first principal component $y_1$ is computed using (8). In the feature-winner detection block, the feature that has the minimal distance to $y_1$ in the sense of the Manhattan metric is found using (9). At the final stage, in the reduction block, this feature-winner is excluded from the original dataset and the system continues to look for the next feature with minimal distance (the next most informative feature).
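The selection loop described above can be sketched as follows. For brevity the first principal component is computed in batch by power iteration as a stand-in for the Oja neuron. The sign convention (largest-magnitude weight made positive) is an assumption, since the text does not say how the sign ambiguity of the component is resolved. The names `first_component_scores` and `rank_features` are hypothetical, and the input is assumed to be centered and not identically zero.

```python
import math


def first_component_scores(samples, iters=300):
    """Batch stand-in for the Oja neuron: power iteration on the covariance
    matrix, returning the projection y1(k) of every sample."""
    n, m = len(samples[0]), len(samples)
    cov = [[sum(s[i] * s[j] for s in samples) / m for j in range(n)]
           for i in range(n)]
    w = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(c * v for c, v in zip(row, w)) for row in cov]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    # Assumed sign convention: the largest-magnitude weight is positive.
    if max(w, key=abs) < 0:
        w = [-v for v in w]
    return [sum(wi * xi for wi, xi in zip(w, s)) for s in samples]


def rank_features(samples, names):
    """Repeatedly pick the feature closest to y1 in Manhattan distance,
    remove it, and recompute y1 on the remaining features."""
    samples = [list(s) for s in samples]
    names = list(names)
    ranking = []
    while len(names) > 1:
        y1 = first_component_scores(samples)
        distances = [sum(abs(s[j] - y) for s, y in zip(samples, y1))
                     for j in range(len(names))]
        winner = distances.index(min(distances))
        ranking.append(names.pop(winner))
        samples = [s[:winner] + s[winner + 1:] for s in samples]
    ranking.extend(names)  # the last remaining feature closes the ranking
    return ranking
```

The returned list orders the feature names from most to least informative in the sense of this criterion, so a physician can keep the first few entries and still read them as original, interpretable features.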
As a result of the work of the hybrid feature selection-extraction system, a reduced dataset is obtained.

EXPERIMENT
The hybrid feature selection-extraction system based on Neural Network-Physician interaction was used to find the most informative features in medical datasets from the UCI Repository: dermatology.data (34 features) [21], breast-cancer.data (9 features) [22], pima-indians-diabetes.data (8 features) [23] and parkinsons.data (21 features) [24]. All of these datasets contain classes-diagnoses that overlap in the feature space (Fig. 3-Fig. 6), where the classes-diagnoses are marked by different dots. In all figures the dataset is presented in the space of the first three principal components. This projection is used only for dataset visualization, because the axes have no physical interpretation. The figures (see Fig. 3-Fig. 6) show the most informative separation of the input information. The feature selection-extraction procedure was applied to each dataset. The final results are presented in Table 1.
In dermatology.data the most informative features are: elongation of the rete ridges, exocytosis, inflammatory mononuclear infiltrate, clubbing of the rete ridges, saw-tooth appearance of retes, scaling, etc. In breast-cancer.data the most informative features are: Bare Nuclei, Uniformity of Cell Size, Uniformity of Cell Shape, Clump Thickness, etc.

Figure 4 - Visualisation of Breast Cancer dataset
In pima-indians-diabetes.data the most informative features are: BMI, Diastolic, Glucose, NbPregnant, Age, etc. In parkinsons.data the most informative features are: signal fractal scaling exponent, nonlinear dynamical complexity measure (RPDE), nonlinear measure of fundamental frequency variation (spread1), ratio of noise to tonal components in the voice (HNR), variation in amplitude (Shimmer_DDA). The reduced feature set can be processed by systems for medical fuzzy diagnostics, for example those presented in [25][26][27][28].

CONCLUSION
In this paper a hybrid feature selection-extraction system based on Neural Network-Physician interaction is proposed. This system extracts the most informative features in online mode without losing the physical sense of the reduced feature space, and it can simplify Human Machine Interaction in the area of Medical Data Mining.