Research on the use of AI for Selecting Abstractions for Natural Language Image Generation Tools
DOI: https://doi.org/10.47839/ijc.23.4.3763

Keywords: artificial intelligence, computing, AI-generated images, text-to-image generation

Abstract
The article describes a method of image generation with the Artificial Intelligence services Dall-e, MidJourney, and Stable Diffusion, using text abstractions retrieved by Artificial Intelligence services that work with natural language. The new approach gives a significant gain in image quality and in consistency with the analysed text. The methodology is based on using a neural network API service instead of the commonly used natural language algorithms for extracting keywords or sentences. A proposed evaluation is applied to the generated images, and the evaluation options are analysed depending on the algorithm and Artificial Intelligence service, the tested book, the length of the resulting abstraction, and the number of errors of each type. The evaluation results show that the new approach can provide better-quality images that relate more closely to the text than those produced with natural language algorithms. For example, for MidJourney-generated images the average score of images generated from abstractions is 7.13 for GPT-3 and 7.3 for GPT-4, compared with 5.43 for CO semantic, 4.98 for TextRank, 4.74 for TF-IDF keywords, 3.04 for WE spaCy, and 4.34 for WordNet. Most of the best results were obtained for abstractions of 20-40 words, while images generated from shorter or longer abstractions show much lower consistency with the text.
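To make the described pipeline concrete, the following is a minimal sketch of the approach, not the authors' implementation: it assumes the OpenAI Python SDK as a stand-in for the neural network API service, and the model names, prompt wording, word limit, and input file are all chosen for illustration. A passage of book text is condensed into a short, visually oriented abstraction, which is then submitted as the prompt to a text-to-image service.

```python
# Sketch of the abstraction-then-generate pipeline (assumes the OpenAI Python SDK;
# model names, prompt wording, and limits are illustrative, not the authors' settings).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def abstract_passage(passage: str, max_words: int = 40) -> str:
    """Ask a language model for a short, visually depictable abstraction of a passage."""
    response = client.chat.completions.create(
        model="gpt-4",  # the article compares abstractions from GPT-3 and GPT-4
        messages=[
            {"role": "system",
             "content": f"Summarize the scene in at most {max_words} words, "
                        "keeping only details that can be depicted visually."},
            {"role": "user", "content": passage},
        ],
    )
    return response.choices[0].message.content.strip()


def generate_image(prompt: str) -> str:
    """Send the abstraction as a prompt to a text-to-image service; return the image URL."""
    result = client.images.generate(model="dall-e-3", prompt=prompt,
                                    n=1, size="1024x1024")
    return result.data[0].url


if __name__ == "__main__":
    # "book_chapter.txt" is a hypothetical input file containing the analysed book text.
    passage = open("book_chapter.txt", encoding="utf-8").read()
    abstraction = abstract_passage(passage)
    print("Abstraction:", abstraction)
    print("Image URL:", generate_image(abstraction))
```

The same abstraction string can be passed unchanged to MidJourney or Stable Diffusion front-ends for the cross-service comparison reported in the evaluation.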
License
International Journal of Computing is an open access journal. Authors who publish with this journal agree to the following terms:
• Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
• Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
• Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.