Image Pair Comparison for Near-duplicates Detection


  • Oleksii Gorokhovatskyi
  • Olena Peredrii



Keywords: image, near-duplicates, non-near-duplicate, descriptors, similarity, accuracy, threshold, binary classification


The paper describes the search for a solution to the image near-duplicate detection problem. We assume that only two images are available for comparison, and the task is to classify whether they are near-duplicates. Several traditional methods exist for matching a pair of images, and this research evaluates the most popular of them on this problem. Effective thresholds separating the near-duplicate and non-near-duplicate classes are found through experimental modeling on the INRIA Holidays dataset. A sequence of methods is proposed to improve the accuracy of the joint decision. It is also shown that the binary classification accuracy of the proposed approach, which combines histogram comparison with ORB descriptor matching, is about 85% for both near-duplicate and non-near-duplicate pairs of images. Compared with existing methods, more powerful deep-learning-based approaches achieve higher accuracy, but the proposed method is faster.
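The joint decision described above can be sketched as follows. This is a minimal illustration, assuming grayscale images given as NumPy arrays; the thresholds and the AND-style rule for fusing the two cues are illustrative placeholders, not the tuned values found in the paper, and the ORB good-match ratio is taken as a precomputed input rather than extracted here.

```python
import numpy as np

def hist_correlation(img_a, img_b, bins=64):
    """Correlation between grayscale intensity histograms
    (the same measure as OpenCV's cv2.HISTCMP_CORREL)."""
    ha = np.histogram(img_a, bins=bins, range=(0, 256))[0].astype(float)
    hb = np.histogram(img_b, bins=bins, range=(0, 256))[0].astype(float)
    da, db = ha - ha.mean(), hb - hb.mean()
    denom = np.sqrt((da ** 2).sum() * (db ** 2).sum())
    # Two perfectly flat histograms are treated as fully correlated.
    return float((da * db).sum() / denom) if denom > 0 else 1.0

def is_near_duplicate(img_a, img_b, good_match_ratio,
                      hist_thr=0.9, orb_thr=0.25):
    """Joint near-duplicate decision. `good_match_ratio` stands in for the
    fraction of good ORB matches between the two images, computed elsewhere
    (e.g. cv2.ORB_create plus a Hamming-distance matcher). The thresholds
    and the fusion rule are assumptions for illustration only."""
    return (hist_correlation(img_a, img_b) >= hist_thr
            and good_match_ratio >= orb_thr)
```

In practice the two thresholds would be chosen on a labeled dataset such as INRIA Holidays, as the paper does, rather than fixed a priori.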


References

L. Morra, F. Lamberti, “Benchmarking unsupervised near-duplicate image detection,” Expert Systems with Applications, vol. 135, pp. 313-326, 2019.

A. Jaimes, S. Chang and A. Loui, “Detection of non-identical duplicate consumer photographs,” Proceedings of the Fourth International Conference on Information, Communications and Signal Processing, Singapore, December 15-18, 2003, vol. 1, pp. 16-20.

O. Chum, J. Philbin and A. Zisserman, “Near duplicate image detection: min-Hash and tf-idf weighting,” Proceedings of the British Machine Vision Conference, Leeds, UK, September 1-4, 2008, pp. 1-10.

A. Jinda-Apiraksa, V. Vonikakis and S. Winkler, “California-ND: An annotated dataset for near-duplicate detection in personal photo collections,” Proceedings of the 5th International Workshop on Quality of Multimedia Experience, Klagenfurt, Austria, July 3-5, 2013.

Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, 2004.

OpenCV Compare Images, 2022, [Online]. Available at:

LSH for near-duplicate image detection, 2021, [Online]. Available at:

Fingerprinting Images for Near-Duplicate Detection, 2020, [Online]. Available at:

V. Gorokhovatsky, D. Pupchenko, K. Solodchenko, “Analysis of properties, characteristics and results of the use of advanced detectors to determine the specific points of the image,” Control, Navigation and Communication Systems, vol. 1, issue 47, pp. 93–98, 2018.

E. Rublee, V. Rabaud, K. Konolige and G. Bradski, “ORB: An efficient alternative to SIFT or SURF,” Proceedings of the International Conference on Computer Vision, Barcelona, Spain, November 6-13, 2011, pp. 2564-2571.

E. Rosten, T. Drummond, “Machine Learning for High-Speed Corner Detection,” Computer Vision – ECCV 2006. ECCV 2006. Lecture Notes in Computer Science, vol. 3951, pp. 430-443, 2006.

S. Leutenegger, M. Chli and R. Y. Siegwart, “BRISK: Binary Robust invariant scalable keypoints,” Proceedings of the International Conference on Computer Vision, Barcelona, Spain, November 6-13, 2011, pp. 2548-2555.

P. Alcantarilla, J. Nuevo and A. Bartoli, “Fast explicit diffusion for accelerated features in nonlinear scale spaces,” Proceedings of the British Machine Vision Conference, Bristol, UK, September 9-13, 2013, pp. 13.1-13.11.

H. Jegou, M. Douze, C. Schmid, “Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search,” Computer Vision – ECCV 2008. ECCV 2008. Lecture Notes in Computer Science, vol. 5302, pp. 304-317, 2008.

INRIA Holidays dataset, 2008, [Online]. Available at:

Scikit-image, 2022, [Online]. Available at:

cv::BRISK Class Reference, 2022, [Online]. Available at:

cv::AKAZE Class Reference, 2022, [Online]. Available at:

cv::ORB Class Reference, 2022, [Online]. Available at:

Sentence Transformers: Multilingual Sentence, Paragraph, and Image Embeddings using BERT & Co., 2022, [Online]. Available at:

clip-ViT-B-32, 2021, [Online]. Available at:

Image Similarity in Percentage, 2020, [Online]. Available at:

P. Kasnesis, R. Heartfield, X. Liang, L. Toumanidis, G. Sakellari, C. Patrikakis, G. Loukas, “Transformer-based identification of stochastic information cascades in social networks using text and image similarity,” Applied Soft Computing, vol. 108, 2021.

J. Wang, Y. Song, T. Leung, C. Rosenberg, J. Wang, J. Philbin, B. Chen and Y. Wu, “Learning fine-grained image similarity with deep ranking,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, June 23-28, 2014, pp. 1386-1393.

Image Similarity using Deep Ranking, 2018, [Online]. Available at:




How to Cite

Gorokhovatskyi, O., & Peredrii, O. (2023). Image Pair Comparison for Near-duplicates Detection. International Journal of Computing, 22(1), 51-57.