Association Rule Mining in SQL to Improve Demand Forecasting using LSTM
Keywords:
Business Intelligence, Market Basket Analysis, Association Rule Mining, Apriori, Eclat, SQL, Forecasting, Demand Forecasting, Long Short-Term Memory, Multivariate
Abstract
Association Rule Mining (ARM) and demand forecasting are vital in business intelligence, especially in commerce. ARM algorithms can be computationally expensive and typically require migrating data to environments such as R or Python. Demand forecasting, meanwhile, often needs a separate model for each product and is sensitive to data quality. To address the ARM challenges, we implemented two well-known itemset generation algorithms, Apriori and Eclat, together with a rule generation algorithm, in PostgreSQL. We also developed multivariate Long Short-Term Memory (LSTM) models trained on data grouped according to the generated itemsets. Evaluated on the Online Retail II dataset (about one million rows), our SQL ARM implementation achieved a low overhead time (3.3 s) and acceptable processing times (Eclat: 3 s; rule generation: 0.1 s). These results are comparable to the state-of-the-art R implementations (overhead: 4.4 s; Eclat: 0.66 s; rule generation: 0.4 s). For demand forecasting, the multivariate LSTM with grouped data reduced training time from 280.1 s to 122.1 s, Mean Squared Error from 1.045 to 0.882, and Mean Absolute Error from 0.471 to 0.433.
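For readers unfamiliar with Eclat and rule generation, the two steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' PostgreSQL implementation; the function names (`eclat`, `generate_rules`), the thresholds, and the toy baskets are all hypothetical and chosen only for illustration. Eclat stores, for each item, the set of transaction ids that contain it (a tidset) and finds larger frequent itemsets by intersecting tidsets; rules are then derived from the itemset counts.

```python
from itertools import combinations


def eclat(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    Illustrative sketch only: min_support is an absolute transaction count.
    """
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs that are frequent together with prefix.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Intersect only with later items so each itemset is visited once.
            longer = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    longer.append((other, shared))
            if longer:
                extend(itemset, longer)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) triples from frequent itemsets."""
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so its count is always present in the dictionary.
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Toy baskets (hypothetical data, not taken from Online Retail II).
baskets = [
    {"tea", "cup"},
    {"tea", "cup", "spoon"},
    {"tea", "spoon"},
    {"cup"},
]
frequent_itemsets = eclat(baskets, min_support=2)
strong_rules = generate_rules(frequent_itemsets, min_confidence=0.9)
```

In a setup like the one the abstract describes, the frequent itemsets would also serve as product groups, with each group feeding one multivariate model instead of one model per product.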
License
International Journal of Computing is an open access journal. Authors who publish with this journal agree to the following terms:
• Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
• Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
• Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.