

Balasaheb Tarle, Muddana Akkalaksmi


In medical data classification, improving classification performance is an important issue when the data set is small and contains multiple missing attribute values. The foremost objective of machine learning research is to improve the classification performance of classifiers, and the number of training instances provided must be sufficient for this. In the proposed algorithm, we substitute missing attribute values with values from the attribute's available domain, generating additional training tuples that supplement the original training tuples. Together, the original and additional training samples provide sufficient data for learning. A neuro-fuzzy classifier is trained on this dataset, and its classification performance on test data is obtained using the k-fold cross-validation method. The proposed method attains improvements of around 2.8% and 3.61% in classification accuracy for this classifier.
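The tuple-generation step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that missing values are marked with `None`, that an attribute's "available domain" is the set of values observed for it in the training data, and that every combination of domain values is generated for a tuple with several missing attributes. The function names `attribute_domains` and `expand_missing` are hypothetical.

```python
from itertools import product

MISSING = None  # assumed marker for a missing attribute value


def attribute_domains(rows):
    """Collect the observed (non-missing) values of each attribute."""
    n_attrs = len(rows[0][0])
    domains = [set() for _ in range(n_attrs)]
    for attrs, _label in rows:
        for i, value in enumerate(attrs):
            if value is not MISSING:
                domains[i].add(value)
    return [sorted(d) for d in domains]


def expand_missing(rows):
    """Keep complete tuples as they are; replace each incomplete tuple
    with one generated tuple per combination of domain values that
    fills its missing attributes (assumption: all combinations)."""
    domains = attribute_domains(rows)
    expanded = []
    for attrs, label in rows:
        # A known value has exactly one choice; a missing value may take
        # any value from that attribute's observed domain. If a domain is
        # empty (attribute never observed), the tuple yields nothing.
        choices = [domains[i] if v is MISSING else [v]
                   for i, v in enumerate(attrs)]
        for filled in product(*choices):
            expanded.append((filled, label))
    return expanded


# Toy data: (attribute values, class label); values are invented.
rows = [
    (("high", "yes"), "sick"),
    (("low", None), "healthy"),
    (("low", "no"), "healthy"),
]
augmented = expand_missing(rows)
```

In this toy example the single incomplete tuple becomes two complete tuples, so the training set grows from three to four tuples; the enlarged set would then be passed to the classifier and evaluated with k-fold cross-validation.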


Classifier; Imputation; Neuro-fuzzy Classifier; Training tuples; Missing data.





