Reasoning Mechanism in Multimodal AI Models based on the TRIZ Principles

Sergey D. Bushuyev; Natalia Bushuyeva; Andrii Pusiichuk; Denis Bushuiev; Yevgen Lobok

doi:10.47839/ijc.24.1.3876

Keywords:

multimodal models, artificial intelligence, reasoning mechanism, cognitive frameworks, multimodal analytics, TRIZ principle

Abstract

This paper investigates the reasoning mechanisms of multimodal AI models through the lens of TRIZ (Theory of Inventive Problem Solving) principles. Multimodal AI, which integrates and processes information from multiple data types such as text, images, and audio, has advanced significantly. However, its reasoning capabilities remain a challenging frontier, particularly in harmonizing diverse modalities to produce coherent outputs. By applying TRIZ, a systematic methodology widely used in engineering and innovation, we explore how these models address the conflicts inherent in multimodal data fusion and reasoning. We identify key TRIZ principles, such as Contradiction Resolution, the System of Systems approach, and the Concept of Ideality, and map them to the challenges and mechanisms of current multimodal AI systems. Our analysis highlights how models employ inventive principles to resolve contradictions, such as balancing accuracy across modalities or reconciling disparate representations. We also propose a novel TRIZ-inspired framework for enhancing reasoning in multimodal AI that emphasizes adaptability, scalability, and resource efficiency. This study contributes to a deeper understanding of multimodal reasoning and offers actionable insights for designing more robust and efficient AI systems. By leveraging TRIZ principles, we aim to foster innovative approaches to complex problem-solving in AI, bridging the gap between theoretical understanding and practical application.
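The abstract names the Concept of Ideality and Contradiction Resolution without giving formulas. As a minimal sketch, assuming the classical TRIZ ideality ratio (sum of benefits divided by the sum of costs and harms) and treating each design variant's properties as plain numbers, the Python below scores candidate multimodal fusion designs and lists technical contradictions, i.e. pairs where one parameter improves while another worsens. The `Design` class, the function names, the parameter names, and every numeric value are hypothetical illustrations, not the authors' framework or measured results.

```python
from dataclasses import dataclass, field


@dataclass
class Design:
    """A candidate multimodal system design (hypothetical illustrative structure)."""
    name: str
    benefits: dict[str, float] = field(default_factory=dict)  # useful functions delivered
    costs: dict[str, float] = field(default_factory=dict)     # resources consumed
    harms: dict[str, float] = field(default_factory=dict)     # harmful side effects


def ideality(design: Design) -> float:
    """Classical TRIZ ideality: sum(benefits) / (sum(costs) + sum(harms))."""
    denominator = sum(design.costs.values()) + sum(design.harms.values())
    if denominator <= 0:
        raise ValueError("costs + harms must be positive")
    return sum(design.benefits.values()) / denominator


def technical_contradictions(base: dict[str, float], variant: dict[str, float],
                             higher_is_better: dict[str, bool]) -> list[tuple[str, str]]:
    """Return (improved, worsened) parameter pairs when moving from base to variant."""
    improved, worsened = [], []
    for name, better_high in higher_is_better.items():
        delta = variant[name] - base[name]
        if delta == 0:
            continue  # unchanged parameters cannot take part in a contradiction
        gained = delta > 0 if better_high else delta < 0
        (improved if gained else worsened).append(name)
    # Every improved/worsened pairing is a candidate TRIZ technical contradiction.
    return [(i, w) for i in improved for w in worsened]


# Hypothetical scores on an arbitrary 0-10 scale, chosen only for illustration.
early = Design("early fusion",
               benefits={"accuracy": 8, "cross_modal_grounding": 7},
               costs={"compute": 4}, harms={"missing_modality_fragility": 1})
late = Design("late fusion",
              benefits={"accuracy": 7, "cross_modal_grounding": 5},
              costs={"compute": 2}, harms={"missing_modality_fragility": 1})
best = max([early, late], key=ideality)
print(best.name, ideality(early), ideality(late))  # late fusion 3.0 4.0

# Hypothetical before/after metrics for one model change.
base = {"accuracy": 0.80, "latency_ms": 40}
variant = {"accuracy": 0.85, "latency_ms": 65}
directions = {"accuracy": True, "latency_ms": False}
print(technical_contradictions(base, variant, directions))  # [('accuracy', 'latency_ms')]
```

The ratio form makes the ideality trade-off explicit: a design can raise its score either by delivering more useful function or by consuming fewer resources and producing fewer side effects, which is how TRIZ frames moving toward an ideal system.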

International Journal of Computing
