Research on the use of AI for Selecting Abstractions for Natural Language Image Generation Tools

Volodymyr Yakymiv; Yosyf Piskozub

doi:10.47839/ijc.23.4.3763

Keywords:

artificial intelligence, computing, AI-generated images, text-to-image generation

Abstract

The article describes a method for generating images with the Artificial Intelligence services DALL-E, MidJourney and Stable Diffusion, which work with natural language, from text abstractions that are themselves retrieved using Artificial Intelligence services. The new approach gives a significant gain in image quality and in consistency with the analysed text. The methodology relies on a neural network API service, instead of the commonly used natural language algorithms, to extract keywords or sentences. A proposed evaluation procedure is applied to the generated images. The evaluation results are analysed by algorithm and by Artificial Intelligence service, taking into account the tested book, the length of the resulting abstraction and the number of errors of each type. The results show that the new approach can provide better-quality images that relate more closely to the text than those obtained with natural language algorithms. For example, for MidJourney-generated images the average score is 7.13 for abstractions produced by GPT3 and 7.3 for GPT4, compared with 5.43 for CO semantic, 4.98 for TextRank, 4.74 for TF-IDF keywords, 3.04 for WE spaCy and 4.34 for WordNet. Most of the best results were generated from abstractions 20-40 words long, whereas images generated from shorter or longer abstractions were much less consistent with the text.
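To make the reported comparison easier to read, the short Python sketch below recomputes summary figures from the average scores quoted in the abstract for MidJourney-generated images. The scores are taken from the abstract; the grouping into two dictionaries, the `summarize` helper and the choice of summary statistics are illustrative assumptions, not part of the authors' evaluation procedure.

```python
from statistics import mean

# Average scores for MidJourney-generated images, as quoted in the abstract.
LLM_SCORES = {"GPT3": 7.13, "GPT4": 7.3}
NLP_SCORES = {
    "CO semantic": 5.43,
    "TextRank": 4.98,
    "TF-IDF keywords": 4.74,
    "WE spaCy": 3.04,
    "WordNet": 4.34,
}


def summarize(llm, nlp):
    """Hypothetical helper: compare the two groups of average scores."""
    best_nlp = max(nlp, key=nlp.get)  # strongest natural language baseline
    return {
        "llm_mean": round(mean(llm.values()), 3),
        "nlp_mean": round(mean(nlp.values()), 3),
        "best_nlp": best_nlp,
        # Smallest margin between any LLM score and the best baseline score.
        "min_gap": round(min(llm.values()) - nlp[best_nlp], 2),
    }


summary = summarize(LLM_SCORES, NLP_SCORES)
print(summary)
```

Under these assumptions, even the lower of the two GPT scores stays above the strongest natural language baseline (CO semantic), which is the pattern the abstract describes.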

International Journal of Computing