PARALLEL MINING OF LARGE MAXIMAL BICLIQUES USING ORDER PRESERVING GENERATORS
DOI:
https://doi.org/10.47839/ijc.8.3.691Keywords:
Data mining, Knowledge Discovery, Maximal Bicliques, Mining Methods.Abstract
In this paper, we propose a parallel algorithm for mining large maximal bicliques from graph datasets. We propose POP-MBC (Parallel Order Preserving Maximal BiClique mining algorithm), a fast and memory efficient parallel algorithm, which enumerates all the maximal bicliques independently and concurrently across several processors without any synchronization between the processors. The POP-MBC algorithm is highly memory efficient since it does not store the previously computed patterns in the main memory and requires only the dataset to be stored in the memory. To enhance the load sharing among different nodes, POP-MBC uses a round robin strategy which enables to achieve load balancing as high as 90%. We have also incorporated bit-vectors and numerous optimization techniques exploiting the symmetric property of the graph dataset to reduce the memory consumption and overall running time of the algorithm. Our comp rehensive experimental analyses involving publicly available datasets show that our algorithm distributes the load among the different processors equally and takes less memory, less running time than other maximal biclique mining algorithms.References
Jinyan Li, Guimei Liu, Haiquan Li, Lim-soon Wong. Maximal biclique subgraphs and closed pattern pairs of the adjacency matrix: a one-to-one correspondence and mining algorithms. IEEE Transactions on Knowledge and Data Engineering, Dec. 2007, Vol. 19, No. 12, pp. 1625-1637.
Liping Ji, Kian-Lee Tan, K. H. Tung. Compressed hierarchical mining of frequent closed patterns from dense data sets. IEEE Trans. on Knowledge and Data Engineering, Sept 2007, Vol 19, No. 9.
C. Lucchese, S. Orlando and R. Perego. Fast and memory efficient mining of frequent closed itemsets. IEEE Transactions on Knowledge and Data Engineering, January 2006, Vol. 18, No. 1, p. 21-36.
G. Alexe, S. Alexe, Y. Crama, S. Foldes, P.L. Hammer and B. Simeone. Consensus algorithms for the generation of all maximal bicliques. Discrete Applied Mathematics, 2004, 145(1), pp. 11-21.
Rene Peeters. The maximum edge biclique problem is NP-complete. Discrete Applied Mathematics, 2003, 131(3), pp. 651-654.
V.M. Dias, C.M. de Figueiredo, and J. L. Szwarcfiter. Generating bicliques of a graph in lexicographic order. Journal of Theoretical Computer Science, 2005, Vol. 337, pp. 240-248.
K. Makino and T. Uno. New algorithms for enumerating all maximal cliques, in Proceedings of the 9th Scandinavian Workshop on Algorithm Theory (SWAT 2004), Springer-Verlag, 2004, pp. 260–272.
D. Eppstein. Arboricity and bipartite subgraph listing algorithms. Information processing letters, 1994, Vol. 51, pp. 207-211.
Mingjun Song, Sanguthevar Rajasekaran. A transaction mapping algorithm for frequent itemsets mining. IEEE Transactions on Knowledge and Data Engineering, April 2006, Vol. 18, No. 4, p. 472-481.
G. Grahne, J. Zhu. Fast algorithms for frequent itemset mining using FP-trees. IEEE Transactions on Knowledge and Data Engineering, October 2005, Vol. 17, No. 10, p. 1347-1362.
D. Burdick, M. Calimlim, J. Flannick, J. Gehrke, Y. Yiu. MAFIA: a maximal frequent itemset algorithm. IEEE Transactions on Knowledge and Data Engineering, November 2005, Vol. 17, No. 11, p. 1490-1504.
S. Selvan and R. V. Nataraj. Efficient mining of maximal patterns using order preserving generators, in proc. 16th Intl. Conf. on Advanced Computing and Communications, Chennai, India, Dec. 2008.
J. Besson, C. Robardet, J.F. Boulicaut, S. Rome. Constraint based concept mining and its application to microarray data analysis. Journal of Intelligent Data Analysis, 2005, pp. 59-82.
Ji Liping. Mining localized co-expressed gene patterns from microarray data. PhD dissertation, School of Computing, National University of Singapore, June 2006.
Ji Liping, K.L. Tan and A.K.H. Tung. Mining frequent closed cubes in 3D datasets. Proc. 32nd int. conference on very large data-bases, 2006.
Gao Cong, Kian-Lee Tan, A.K.H. Tung, Feng Pan. Mining frequent closed patterns in microarray data, ICDM’04, Nov 2004, Vol. 1, Issue 4, p. 363-366.
Jiawei Han, Jian Pei, Yiwen Yin, Runying Mao. Mining frequent pattern without candidate generation: a frequent pattern approach. Journal of Data Mining and Knowledge Discovery, 2004, Springer, p. 53-87.
R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases, in Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, DC, May 1993, p. 207-216.
R. Agrawal and R. Srikant. Fast algorithms for mining association rules, in Proceeding of Int. Conf. Very Large Data Bases, Santiago, Chile, September 1994, p. 487-499.
S. Brin, R. Motwani, J.D. Ullman and S. Tsur. Dynamic itemset counting and implication rules for market basket data, SIGMOD Record, June 1997, 6(2), p. 255-264.
R. Agrawal and J.C. Shafer. Parallel mining of association rules. IEEE Transactions on Knowledge and Data Engineering, December 1996, Vol. 8, No. 6, p. 962-969.
N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules, Proc. 7th Int. Conf. Database Theory (ICDT’99), January 1999, p. 398-416.
J. Pei, J. Han, and R. Mao. CLOSET: an efficient algorithm for mining frequent closed itemsets, Proc. 2000 ACM-SIGMOD Int. Workshop Data Mining and Knowledge Discovery (DMKD’00), May 2000, p. 11-20.
J. Wang, J. Han and J. Pei. CLOSET+: searching for the best strategies for mining frequent closed itemsets, Proc. 2003 ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, August 2003, p. 236-245.
M.J. Zaki, C.J. Hsiao. Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Trans. on Knowledge and data Engineering, April 2005, Vol. 17, No. 4, p. 462-478.
Ahmed Shakil, Frans Coenen, Paul Leng. Tree based partitioning of data for association rule mining. Knowledge and Information Systems, Springer, October 2006, Vol. 10, No. 3, p. 315-331.
A. Veloso, M. Otey, S. Parthasarathy, and W. Meira. Parallel and distributed frequent itemset mining on dynamic datasets, in Proc. of the High Performance Computing Conference, HiPC, Hyderabad, India, December 2003.
Yew-kwong Woon, Wee-keong Ng, Ee-Peng Lim. A support-ordered tree for fast frequent itemset discovery. IEEE Transactions on Knowledge and Data Engineering, July 2004, Vol. 16, No. 7, p. 875-879.
H.D.K. Moonesinghe, Samah Fodeh, P.N. Tan. Frequent closed itemset mining using prefix graphs with an efficient flow-based pruning strategy. ICDM’06.
Guimei Liu. Supporting efficient and scalable frequent pattern mining. PhD Thesis, Hong Kong University, May 2005.
I. Rigoutsos and A. Floratos. Combinatorial pattern discovery in biological sequences: the teiresias algorithm. Bioinformatics, 1998, Vol. 14, p. 55-67.
Guizhen Yang. The complexity of mining maximal frequent itemsets and maximal frequent patterns, KDD’04, Seattle, Washington, August 2004.
Downloads
Published
How to Cite
Issue
Section
License
International Journal of Computing is an open access journal. Authors who publish with this journal agree to the following terms:• Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
• Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
• Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.