Reasoning Mechanism in Multimodal AI Models based on the TRIZ Principles
DOI: https://doi.org/10.47839/ijc.24.1.3876
Keywords: multimodal models, artificial intelligence, reasoning mechanism, cognitive frameworks, multimodal analytics, TRIZ principle
Abstract
This paper investigates the reasoning mechanisms of multimodal AI models through the lens of TRIZ (Theory of Inventive Problem Solving) principles. Multimodal AI, which integrates and processes information from multiple data types such as text, images, and audio, has advanced significantly. However, its reasoning capabilities remain a challenging frontier, particularly in harmonizing diverse modalities to produce coherent outputs. By applying TRIZ, a systematic methodology widely used in engineering and innovation, we explore how these models address the conflicts inherent in multimodal data fusion and reasoning. We identify key TRIZ principles, such as Contradiction Resolution, the System of Systems approach, and the Concept of Ideality, and map them to the challenges and mechanisms of current multimodal AI systems. Our analysis highlights how models employ inventive principles to resolve contradictions, such as balancing accuracy across modalities or reconciling disparate representations. We also propose a novel TRIZ-inspired framework for enhancing reasoning in multimodal AI that emphasizes adaptability, scalability, and resource efficiency. This study contributes to a deeper understanding of multimodal reasoning and offers actionable insights for designing more robust and efficient AI systems. By leveraging TRIZ principles, we aim to foster innovative approaches to complex problem-solving in AI, bridging the gap between theoretical understanding and practical application.
License
International Journal of Computing is an open access journal. Authors who publish with this journal agree to the following terms:
• Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
• Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
• Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.