HETEROGENEOUS COMPUTING TO ACCELERATE THE SEARCH OF SUPER K-MERS BASED ON MINIMIZERS
Keywords:Heterogeneous computing, K-mers processing, OpenCL, Parallel programming, Search and distribution of super k-mers
The k-mers processing techniques based on partitioning of the data set on the disk using minimizer-type seeds have led to a significant reduction in memory requirements; however, it has added processes (search and distribution of super k-mers) that can be intensive given the large volume of data. This paper presents a massive parallel processing model in order to enable the efficient use of heterogeneous computation to accelerate the search of super k-mers based on seeds (minimizers or signatures). The model includes three main contributions: a new data structure called CISK for representing the super k-mers, their minimizers and two massive parallelization patterns in an indexed and compact way: one for obtaining the canonical m-mers of a set of reads and another for searching for super k-mers based on minimizers. The model was implemented through two OpenCL kernels. The evaluation of the kernels shows favorable results in terms of execution times and memory requirements to use the model for constructing heterogeneous solutions with simultaneous execution (workload distribution), which perform co-processing using the current search methods of super k -mers on the CPU and the methods presented herein on GPU. The model implementation code is available in the repository: https://github.com/BioinfUD/K-mersCL.
H. Li, A. Ramachandran, and D. Chen, “GPU acceleration of advanced k-mer counting for computational genomics,” Proceedings of the 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2018, pp. 183-186.
T. Richert, “Management of distributed dynamic data with algorithmic skeletons,” Parallel Computing, 2000, pp. 375-382. https://www.worldscientific.com/doi/abs/10.1142/9781848160170_0044
F. Wrede and S. Ernsting, “Simultaneous CPU–GPU execution of data parallel algorithmic skeletons,” International Journal of Parallel Programming, vol. 46, no. 1, pp. 42–61, Apr. 2017.
M. Roberts, W. Hayes, B. R. Hunt, S. M. Mount, and J. A. Yorke, “Reducing storage requirements for biological sequence comparison,” Bioinformatics, vol. 20, no. 18, pp. 3363–3369, 2004.
G. Marçais, D. Pellow, D. Bork, Y. Orenstein, R. Shamir, and C. Kingsford, “Improving the performance of minimizers and winnowing schemes,” Bioinformatics, vol. 33, no. 14, pp. i110–i117, Dec. 2017.
S. Altschul, “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs,” Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, Jan. 1997.
N. Pérez, M. Gutierrez, and N. Vera, “Computational performance assessment of k-mer counting algorithms,” Journal of Computational Biology, vol. 23, no. 4, pp. 248–255, 2016.
M. Kokot, M. Długosz, and S. Deorowicz, “KMC 3: counting and manipulating k-mer statistics,” Bioinformatics, vol. 33, no. 17, pp. 2759–2761, Apr. 2017.
Y. Li, et al. MSPKmerCounter: a fast and memory efficient approach for k-mer counting. arXiv preprint arXiv:1505.06550, 2015.
Y. Li, P. Kamousi, F. Han, S. Yang, X. Yan, and S. Suri, “Memory efficient minimum substring partitioning,” Proceedings of the VLDB Endowment, vol. 6, no. 3, pp. 169–180, Jan. 2013.
R. Chikhi, A. Limasset, and P. Medvedev, “Compacting de Bruijn graphs from sequencing data quickly and in low memory,” Bioinformatics, vol. 32, no. 12, pp. i201–i208, 2016.
C. Marchet, M. Kerbiriou, and A. Limasset, “Indexing De Bruijn graphs with minimizers,” Nov. 2019. https://www.biorxiv.org/content/10.1101/546309v2
J. Luo, J. Wang, W. Li, Z. Zhang, F. X. Wu, M. Li, & Y. Pan, “EPGA2: memory-efficient de novo assembler,” Bioinformatics, vol. 31, no. 24, pp. 3988-3990, 2015.
S. Deorowicz, “FQSqueezer: k-mer-based compression of sequencing data,” Scientific Reports, vol. 10, 578, 2020. https://doi.org/10.1038/s41598-020-57452-6
S. Deorowicz, A. Debudaj-Grabysz, and S. Grabowski, “Disk-based k-mer counting on a PC,” BMC Bioinformatics, vol. 14, no. 1, 2013.
M. Erbert, S. Rechner, and M. Müller-Hannemann, “Gerbil: a fast and memory-efficient k-mer counter with GPU-support,” Algorithms for Molecular Biology, vol. 12, no. 1, 2017.
Y. Ono, K. Asai, and M. Hamada, “PBSIM: PacBio reads simulator—toward accurate genome assembly,” Bioinformatics, vol. 29, no. 1, pp. 119–121, Apr. 2012.
A. Rhoads and K. F. Au, “PacBio Sequencing and Its Applications,” Genomics, Proteomics & Bioinformatics, vol. 13, no. 5, pp. 278–289, 2015.
V. Alessandrini, “Atomic types and operations,” Shared Memory Application Programming, pp. 167–190, 2016.
J. E. Stone, D. Gohara, and G. Shi, “OpenCL: A parallel programming standard for heterogeneous computing systems,” Computing in Science & Engineering, vol. 12, no. 3, pp. 66–73, 2010.
N. Vera, C. Rojas and J. Pérez, OpenCL Práctico, first ed., Editorial UD, Bogotá, 2019, 314 p.
A. Klöckner, N. Pinto, Y. Lee, B. Catanzaro, P. Ivanov, and A. Fasih, “PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation,” Parallel Computing, vol. 38, no. 3, pp. 157–174, 2012.
A. Klöckner, N. Pinto, B. Catanzaro, Y. Lee, P. Ivanov, and A. Fasih, “GPU scripting and code generation with PyCUDA,” GPU Computing Gems Jade Edition, pp. 373–385, 2012.
G. Marçais, C. Kingsford, “A fast, lock-free approach for efficient parallel counting of occurrences of k-mers,” Bioinformatics, vol. 27, no 6, pp. 764-770, 2011.
S. Deorowicz, A. Debudaj-Grabysz, S. Grabowsky, “Disk-based k-mer counting on a PC,” BMC Bioinformatics, vol. 14, no. 1, p. 160, 2013.
How to Cite
LicenseInternational Journal of Computing is an open access journal. Authors who publish with this journal agree to the following terms:
• Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
• Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
• Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.