ASSOCIATION RULES MINING IN BIG DATA
Keywords: Big data, association rule, data dependency, Apriori, complexity, parallel processing.
Abstract: The paper proposes a method for analyzing Big Data that comes from different data sources and is processed by different methods. A definition of Big Data is given, and the main problems of the data mining process are described. The concept of association rules is introduced, and the association rule search method is modified for working with Big Data. A method for finding dependencies is developed, and its efficiency and potential for parallelization are determined. The developed algorithm makes it possible to assert that the task of detecting association dependencies in distributed databases belongs to the class P of polynomial-time problems. The algorithm for finding association dependencies also lends itself well to a MapReduce implementation. The low asymptotic complexity of the developed association rule mining algorithm and the wide set of data types it supports for analysis allow the proposed algorithm to be applied in practically all subject areas that work with association dependencies in the data domain.
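As an illustration of the terms used in the abstract, the sketch below computes the support and confidence of one association rule by counting itemsets in a map/reduce style over partitioned transactions. It is a minimal sketch in plain Python, not the paper's algorithm and not the Hadoop API: the function names (`map_partition`, `reduce_counts`, `rule_metrics`) and the sample data in `PARTITIONS` are hypothetical and chosen only for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: two partitions standing in for two data sources.
PARTITIONS = [
    [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"]],
    [["milk"], ["bread", "milk"]],
]


def map_partition(transactions, k):
    """Map step: count every k-item subset inside a single partition."""
    counts = Counter()
    for transaction in transactions:
        # Sort so that each itemset has one canonical tuple representation.
        for itemset in combinations(sorted(set(transaction)), k):
            counts[itemset] += 1
    return counts


def reduce_counts(partial_counts):
    """Reduce step: sum the per-partition counts into global counts."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def itemset_counts(partitions, k):
    """Run the map step on each partition, then reduce the results."""
    return reduce_counts(map_partition(p, k) for p in partitions)


def rule_metrics(antecedent, consequent, partitions):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    n = sum(len(p) for p in partitions)
    a = tuple(sorted(set(antecedent)))
    both = tuple(sorted(set(antecedent) | set(consequent)))
    count_a = itemset_counts(partitions, len(a))[a]
    count_both = itemset_counts(partitions, len(both))[both]
    support = count_both / n if n else 0.0
    confidence = count_both / count_a if count_a else 0.0
    return support, confidence


support, confidence = rule_metrics({"bread"}, {"milk"}, PARTITIONS)
print(f"support={support:.2f} confidence={confidence:.2f}")
```

Because each partition is counted independently and the partial counts are only summed afterwards, the map step could in principle run in parallel on separate nodes, which is the property that makes this kind of counting a natural fit for MapReduce.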
License: International Journal of Computing is an open access journal. Authors who publish with this journal agree to the following terms:
• Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
• Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
• Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.