MASC: A Dataset for the Development and Classification of Mobile Applications Screens
Keywords:
Mobile applications, Activities Classification, UI Screens Classification, MASC Dataset, Wireframes, Machine Learning Algorithms
Abstract
Mobile applications (apps) have become an integral part of daily life, offering a wide range of functionalities and services. Understanding the diversity of mobile app screens is crucial for optimizing user experience and delivering personalized content. This paper presents a novel dataset, MASC (Mobile App Screens Classification), consisting of 7065 images that represent various types of mobile app screens. The screens were collected from the well-known Rico dataset and manually classified into ten distinct classes to capture the diverse nature of app interfaces. Based on the MASC dataset, the paper proposes a framework for applying machine learning (ML) algorithms to the classification of mobile app screens. The framework includes a feature extraction algorithm that extracts, from each screenshot of an app screen, key characteristics related to visual elements, text, and keywords. Using this framework, the paper also presents a comprehensive study of mobile app screen classification with ML algorithms. Several classification algorithms, including XGBoost, Gradient Boosting, Random Forest, SVM, and Logistic Regression, among others, were trained and evaluated on MASC. The top models, such as Gradient Boosting, achieved accuracy above 93%, indicating that ML algorithms combined with the MASC dataset provide an effective approach to mobile app screen classification. This study contributes to the field of mobile app analysis and user interface understanding, and the proposed framework is a promising development that can enhance the accuracy and efficiency of mobile app screen classification. The complete code is available on GitHub to ensure reproducibility and foster further research: https://github.com/Ali-Aahmed/MASC-Dataset.
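The extract-features-then-classify workflow summarized in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation. The feature names, the sample values, and the two class labels are hypothetical, and a simple nearest-centroid classifier stands in for the models named in the abstract (XGBoost, Gradient Boosting, and so on). The authors' actual code is in the linked repository.

```python
from collections import defaultdict
from math import dist

# Hypothetical feature vectors: (button_count, text_field_count, keyword_hits).
# Values and class names are invented for illustration; they are not MASC data.
TRAIN = [
    ((1.0, 2.0, 3.0), "login"),
    ((2.0, 2.0, 4.0), "login"),
    ((8.0, 0.0, 0.0), "list"),
    ((9.0, 1.0, 0.0), "list"),
]
TEST = [
    ((1.0, 3.0, 3.0), "login"),
    ((7.0, 0.0, 1.0), "list"),
]


def fit_centroids(samples):
    """Average the feature vectors of each class into one centroid per class."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


def accuracy(centroids, samples):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict(centroids, f) == y for f, y in samples)
    return correct / len(samples)


centroids = fit_centroids(TRAIN)
print(accuracy(centroids, TEST))
```

In practice, the per-screen feature vectors would come from a feature extraction step over the screenshots, and the stand-in classifier would be replaced by a stronger model evaluated on a held-out split.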
License
International Journal of Computing is an open access journal. Authors who publish with this journal agree to the following terms:
• Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
• Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
• Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.