A FRAMEWORK FOR INCREMENTAL PARALLEL MINING OF INTERESTING ASSOCIATION PATTERNS FOR BIG DATA
Keywords: Big Data Mining, Association Pattern Mining, Parallel Mining, Incremental Mining, Interestingness Measure, Novelty Measure, KDD.
Abstract
Association rule mining plays an important role in Big Data analysis in distributed environments. The massive volume of data creates a pressing need for novel parallel and incremental association rule mining algorithms that can handle Big Data. In this paper, a framework is proposed for incremental, parallel mining of interesting association rules from Big Data. The proposed framework incorporates interestingness measures into the mining process itself. It is designed to process incremental data, which usually arrives at different times: knowledge of interest to the user is discovered by processing only the new data, without restarting the mining from scratch. A key feature of the framework is that it takes into account the user's domain knowledge, which grows monotonically over time. Incorporating the user's beliefs during pattern extraction makes the model attractive, effective and efficient. The proposed framework is implemented on public datasets and evaluated on the basis of the interesting patterns it discovers.
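The abstract does not spell out the algorithm itself, so the following Python sketch only illustrates, under simplifying assumptions, the three ideas it names. First, each batch of new data is split into partitions that are counted concurrently and then merged (a map/merge step). Second, mining is incremental: only the new transactions are scanned and the stored counts are reused. Third, rules are filtered by a novelty score against a knowledge base of rules the user already knows, and that knowledge base only grows. Several assumptions apply. All itemsets up to `max_len` are counted exhaustively, so old data never needs rescanning; incremental algorithms such as FUP instead keep only frequent itemsets and treat newly frequent ones specially. Threads stand in for cluster workers. The `novelty` function (1 minus the largest Jaccard similarity to a known rule) is a hypothetical stand-in, because the abstract does not define the paper's novelty measure. All class, function and parameter names are illustrative.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def count_partition(transactions, max_len):
    """Map step: count every itemset of size 1..max_len in one partition."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # canonical order so equal itemsets share one key
        for k in range(1, max_len + 1):
            counts.update(combinations(items, k))
    return counts


def parallel_count(transactions, n_parts, max_len):
    """Split a batch into partitions, count them concurrently, then merge the counts."""
    parts = [transactions[i::n_parts] for i in range(n_parts)]
    merged = Counter()
    # Threads stand in for cluster workers; a real deployment would use processes or MapReduce.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        for partial in pool.map(count_partition, parts, [max_len] * n_parts):
            merged.update(partial)  # reduce step: add the partial counts
    return merged


class IncrementalMiner:
    """Hypothetical incremental miner; names and the novelty score are illustrative."""

    def __init__(self, min_support, min_confidence, max_len=3):
        self.min_support = min_support
        self.min_confidence = min_confidence
        self.max_len = max_len
        self.counts = Counter()  # itemset counts accumulated over all increments
        self.n_transactions = 0
        self.knowledge = set()   # rules the user already knows; only ever grows

    def add_increment(self, transactions, n_parts=4):
        """Scan only the new transactions and fold their counts into the stored ones."""
        self.counts.update(parallel_count(transactions, n_parts, self.max_len))
        self.n_transactions += len(transactions)

    def rules(self):
        """Yield (lhs, rhs, support, confidence) for rules meeting both thresholds."""
        min_count = self.min_support * self.n_transactions
        for itemset, count in self.counts.items():
            if len(itemset) < 2 or count < min_count:
                continue
            for r in range(1, len(itemset)):
                for lhs in combinations(itemset, r):
                    rhs = tuple(i for i in itemset if i not in lhs)
                    confidence = count / self.counts[lhs]
                    if confidence >= self.min_confidence:
                        yield lhs, rhs, count / self.n_transactions, confidence

    def novelty(self, lhs, rhs):
        """Stand-in score: 1 minus the largest Jaccard similarity to any known rule."""
        items = set(lhs) | set(rhs)
        best = 0.0
        for known_lhs, known_rhs in self.knowledge:
            known = set(known_lhs) | set(known_rhs)
            best = max(best, len(items & known) / len(items | known))
        return 1.0 - best

    def interesting_rules(self, min_novelty=0.5):
        """Report rules that are novel w.r.t. the knowledge base, then add them to it."""
        found = [rule for rule in self.rules() if self.novelty(rule[0], rule[1]) >= min_novelty]
        self.knowledge.update((lhs, rhs) for lhs, rhs, _, _ in found)
        return found
```

As a usage example, after `add_increment` is called with a first batch and `interesting_rules()` reports its rules, a second batch is folded in without rescanning the first. Rules involving items already known to the user receive a lower novelty score. Calling `interesting_rules()` again with no new data returns an empty list, because every qualifying rule is by then part of the knowledge base.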
License
International Journal of Computing is an open access journal. Authors who publish with this journal agree to the following terms:
• Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
• Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
• Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.