OPTIMIZATION OF ASSOCIATION RULES FOR TUBERCULOSIS USING GENETIC ALGORITHM

Authors

  • T. Asha
  • S. Natarajan
  • K.N.B. Murthy

DOI:

https://doi.org/10.47839/ijc.12.2.596

Keywords:

Tuberculosis, Data Mining, Diagnosis, Association Rules, Optimization, Genetic Algorithm.

Abstract

Tuberculosis (TB) is a disease caused by the bacterium Mycobacterium tuberculosis, which usually spreads through the air and attacks people with weakened immune systems. Patients infected with Human Immunodeficiency Virus (HIV) are more likely to contract TB. It is a major health problem around the world, including in India. Association Rule Mining is the process of discovering interesting and unexpected rules from large data sets. This approach produces a huge number of rules, of which some are interesting and others are redundant. It also assesses rule quality with only two measures, support and confidence. In this paper we optimize the rules generated by Association Rule Mining for Tuberculosis using a Genetic Algorithm. Our approach extracts only a small set of high-quality TB rules from the larger set. The approach handles discrete, continuous and categorical data types. The experimental result is a small set of converged TB rules that can help doctors in their diagnostic decisions. The main motivation for using Genetic Algorithms to discover high-level prediction rules is that they are robust, use adaptive search techniques that perform a global search of the solution space, and cope better with attribute interaction than the greedy rule induction algorithms often used in data mining.
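
The two rule-quality measures named in the abstract, and the idea of searching for a small set of strong rules with a genetic algorithm, can be illustrated with a minimal sketch. The records, item names, fixed consequent, fitness function (support times confidence) and all GA parameters below are assumptions chosen for illustration; they are not taken from the paper and do not reproduce its method.

```python
import random

# Toy, made-up records: each is the set of items observed for one patient.
# Illustrative only; NOT taken from the paper's dataset.
RECORDS = [
    {"cough", "fever", "weight_loss", "tb"},
    {"cough", "fever", "tb"},
    {"cough", "weight_loss", "tb"},
    {"fever"},
    {"cough"},
    {"fever", "weight_loss", "tb"},
]
ITEMS = ["cough", "fever", "weight_loss"]  # candidate antecedent items (hypothetical)
TARGET = "tb"                              # fixed consequent for this sketch


def support(itemset, records=RECORDS):
    """Fraction of records that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= r for r in records) / len(records)


def confidence(antecedent, consequent, records=RECORDS):
    """support(antecedent union consequent) / support(antecedent); 0 if unseen."""
    s_a = support(antecedent, records)
    both = set(antecedent) | set(consequent)
    return support(both, records) / s_a if s_a else 0.0


def decode(bits):
    """Map a bit tuple (one bit per entry of ITEMS) to an antecedent itemset."""
    return frozenset(item for item, b in zip(ITEMS, bits) if b)


def fitness(bits):
    """Assumed fitness: support * confidence of the rule antecedent -> TARGET."""
    ante = decode(bits)
    if not ante:
        return 0.0
    return support(ante | {TARGET}) * confidence(ante, {TARGET})


def evolve(pop_size=8, generations=20, p_mut=0.1, seed=0):
    """Tiny generational GA: elitism, tournament selection,
    one-point crossover and bit-flip mutation."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in ITEMS) for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        nxt = [best]  # elitism: carry the best rule forward unchanged
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randint(1, len(ITEMS) - 1)
            child = p1[:cut] + p2[cut:]
            child = tuple(b ^ 1 if rng.random() < p_mut else b for b in child)
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
    return decode(best), fitness(best)
```

Calling `evolve()` returns the best antecedent found and its fitness. On this toy data the rule `weight_loss -> tb` has the highest fitness attainable under the assumed measure (support 0.5, confidence 1.0). A real application would need a richer rule encoding, for example to handle continuous attributes, and a fitness function suited to the data.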

Published

2014-08-01

How to Cite

Asha, T., Natarajan, S., & Murthy, K. N. B. (2014). OPTIMIZATION OF ASSOCIATION RULES FOR TUBERCULOSIS USING GENETIC ALGORITHM. International Journal of Computing, 12(2), 151-159. https://doi.org/10.47839/ijc.12.2.596

Section

Articles