Evaluating the Quality of Class Diagrams Generated by GPT-4 Model

Keletso J. Letsholo

doi:10.47839/ijc.24.3.4189


Keywords:

Generative Pre-trained Transformer (GPT), Requirements Engineering (RE), Natural Language Processing (NLP), Class Diagram

Abstract

Automatically generating accurate and comprehensive class diagrams from natural language requirements can reduce human error and streamline requirements analysis. OpenAI's GPT-4 model has made notable progress in this domain, but for it to gain traction in requirements engineering, the class diagrams it produces must be of high quality. This study evaluates GPT-4's class diagrams by comparing them with diagrams created by experts and by existing tools, using precision, recall, and F1 score; these measures reveal considerable variability. GPT-4's precision ranges from 0.61 to 0.88, reflecting variation in how many of the instances it identifies are correct. Recall ranges from 0.63 to 1.00, indicating differences in how completely it captures all relevant instances. The F1 score, which balances precision and recall, ranges from 0.65 to 0.87, showing that its effectiveness varies across contexts. Notably, GPT-4 outperforms the existing tools in precision, recall, and F1 score, demonstrating a strong ability to generate accurate class diagrams from natural language. The paper evaluates GPT-4's diagrams against expert benchmarks, compares them with four tools, and presents insights into GPT-4's potential in requirements engineering.
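To make the three measures concrete, the following is a minimal sketch of how precision, recall, and F1 could be computed when a generated class diagram is compared with an expert benchmark. It assumes that diagram elements (here, class names) are matched by case-insensitive name only; the helper names `normalize` and `score_elements` and the library example are hypothetical and are not taken from the study, whose exact matching procedure is not described in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f1: float


def normalize(name: str) -> str:
    """Hypothetical matching rule: compare names case-insensitively, ignoring surrounding whitespace."""
    return name.strip().lower()


def score_elements(generated: set[str], benchmark: set[str]) -> Scores:
    """Score generated diagram elements against an expert benchmark.

    Assumption: an element counts as a true positive when its normalized
    name also appears in the benchmark.
    """
    gen = {normalize(e) for e in generated}
    ref = {normalize(e) for e in benchmark}
    true_positives = len(gen & ref)
    # Precision: share of generated elements that are correct.
    precision = true_positives / len(gen) if gen else 0.0
    # Recall: share of benchmark elements that were generated.
    recall = true_positives / len(ref) if ref else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return Scores(precision, recall, f1)


if __name__ == "__main__":
    # Hypothetical example, not data from the study.
    expert = {"Library", "Book", "Member", "Loan"}
    gpt4 = {"Library", "Book", "Member", "Loan", "Shelf"}
    s = score_elements(gpt4, expert)
    print(f"P={s.precision:.2f} R={s.recall:.2f} F1={s.f1:.2f}")  # P=0.80 R=1.00 F1=0.89
```

The example illustrates how recall can reach 1.00 while precision stays lower: every benchmark class is found, but the extra `Shelf` class is counted against precision only.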


International Journal of Computing
