Image Pair Comparison for Near-duplicates Detection
Keywords: image, near-duplicates, non-near-duplicate, descriptors, similarity, accuracy, threshold, binary classification
The paper describes the search for a solution to the problem of image near-duplicate detection. We assume that there are only two images to compare, and the task is to decide whether they are near-duplicates. There are several traditional methods for matching a pair of images, and this research evaluates the best-known of them with respect to this problem. Effective thresholds for separating the near-duplicate and non-near-duplicate classes are found through experimental modeling on the INRIA Holidays dataset. A sequence of methods is proposed to improve the accuracy of the joint decision. It is also shown that the binary classification accuracy of the proposed approach, which combines histogram comparison with ORB descriptor matching, is about 85% for both near-duplicate and non-near-duplicate image pairs. The approach is compared with existing methods: more powerful methods based on deep learning achieve better accuracy, but the proposed method is faster.
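The abstract does not spell out the exact cascade or the tuned threshold values, so the following is only a minimal sketch of one plausible reading of "histogram comparison combined with ORB descriptor matching", written with NumPy alone. It assumes grayscale uint8 images and binary descriptors that were already extracted, for example with OpenCV's ORB (`kp, desc = cv2.ORB_create().detectAndCompute(gray, None)`, which yields one 32-byte row per keypoint, or `None` if no keypoints are found). The histogram stage uses the same correlation formula as OpenCV's `HISTCMP_CORREL` and serves as a cheap reject filter; pairs that survive are decided by the fraction of mutual nearest-neighbour descriptor matches under Hamming distance. All function names and the thresholds `HIST_REJECT`, `MATCH_THRESHOLD`, and `max_dist` are hypothetical placeholders, not the values the authors tuned on INRIA Holidays.

```python
import numpy as np

# Hypothetical placeholder thresholds (NOT the values tuned in the paper).
HIST_REJECT = 0.5      # histogram correlation below this -> not near-duplicates
MATCH_THRESHOLD = 0.3  # minimum fraction of mutual descriptor matches to accept

# Number of set bits in every possible byte value, used for Hamming distance.
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)


def gray_histogram(img, bins=64):
    """Normalized intensity histogram of a grayscale uint8 image."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    hist = hist.astype(np.float64)
    return hist / hist.sum()


def hist_correlation(h1, h2):
    """Pearson correlation of two histograms (the HISTCMP_CORREL formula)."""
    d1 = h1 - h1.mean()
    d2 = h2 - h2.mean()
    denom = np.sqrt((d1 ** 2).sum() * (d2 ** 2).sum())
    return float((d1 * d2).sum() / denom) if denom > 0 else 0.0


def hamming_matrix(desc1, desc2):
    """Pairwise Hamming distances between binary descriptors stored as uint8 rows."""
    xor = np.bitwise_xor(desc1[:, None, :], desc2[None, :, :])
    return POPCOUNT[xor].sum(axis=2, dtype=np.int64)


def good_match_ratio(desc1, desc2, max_dist=64):
    """Fraction of descriptors that are mutual nearest neighbours within max_dist bits."""
    if len(desc1) == 0 or len(desc2) == 0:
        return 0.0
    dist = hamming_matrix(desc1, desc2)
    forward = dist.argmin(axis=1)   # best match in desc2 for each row of desc1
    backward = dist.argmin(axis=0)  # best match in desc1 for each row of desc2
    good = [i for i, j in enumerate(forward)
            if backward[j] == i and dist[i, j] <= max_dist]
    return len(good) / min(len(desc1), len(desc2))


def is_near_duplicate(img1, img2, desc1, desc2):
    """Two-stage decision: cheap histogram reject, then descriptor matching."""
    corr = hist_correlation(gray_histogram(img1), gray_histogram(img2))
    if corr < HIST_REJECT:
        return False
    return good_match_ratio(desc1, desc2) >= MATCH_THRESHOLD
```

Placing the histogram test first means that clearly dissimilar pairs are rejected before any descriptors are compared, since a global histogram is much cheaper to compute than keypoint matching. The ordering and the reject-only role of the histogram stage are assumptions of this sketch; the sequence used in the paper may differ.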
License
International Journal of Computing is an open access journal. Authors who publish with this journal agree to the following terms:
• Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
• Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
• Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.