A ROBUST BINARIZATION AND TEXT LINE DETECTION IN HISTORICAL HANDWRITTEN DOCUMENTS ANALYSIS

Authors

  • Jakub Leszek Pach
  • Piotr Bilski

DOI:

https://doi.org/10.47839/ijc.15.3.848

Keywords:

Document analysis, unconstrained handwriting, Hough transform, text line detection.

Abstract

In this paper, we present a novel method for detecting text lines in handwritten documents, based on the block-based Hough transform. To maximize its efficiency, a robust binarization algorithm is applied; it is based on Gaussian filtering and handles non-uniform luminance. The proposed technique consists of three steps: preprocessing, detection of potential text lines, and elimination of the false ones. The first step covers image binarization, extraction of connected components, and selection of supporting connected components based on the local maxima in the vertical histogram stripes. Secondly, the appropriate subset of connected components, supplemented by one-point components, is selected. Finally, the block-based Hough transform is applied to detect potential text lines and to find the ones identified incorrectly. The proposed method is applied to the analysis of fifteenth-century Latin manuscripts. Our approach is more effective than the traditional ones, in the best cases by twenty percent.
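
As an illustration of why Gaussian filtering helps with non-uniform luminance, the sketch below estimates the local background by blurring the grayscale image and marks a pixel as ink when it is clearly darker than that estimate. This is a generic background-normalization scheme written for this summary, not the authors' algorithm: the function names, the `sigma` and `ratio` parameters, and the synthetic page are all assumptions.

```python
import math

def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel truncated at 3 * sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(k * row[min(max(x + j - radius, 0), width - 1)]
             for j, k in enumerate(kernel))
         for x in range(width)]
        for row in image
    ]

def transpose(image):
    return [list(column) for column in zip(*image)]

def gaussian_blur(image, sigma):
    """Separable 2-D Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    horizontal = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))

def binarize(image, sigma=2.0, ratio=0.8):
    """Mark a pixel as ink (1) when it is darker than ratio * local background.

    sigma and ratio are illustrative values, not taken from the paper.
    """
    background = gaussian_blur(image, sigma)
    return [
        [1 if pixel < ratio * bg else 0 for pixel, bg in zip(row, bg_row)]
        for row, bg_row in zip(image, background)
    ]

# Synthetic page: paper luminance falls from 200 (left) to 100 (right).
page = [[200 - 10 * x for x in range(11)] for _ in range(5)]
for row in page:
    row[2] = 120  # faint stroke on the bright side
    row[8] = 30   # dark stroke on the shaded side

mask = binarize(page)
```

On this synthetic page the faint stroke (120) is brighter than the shaded paper on the right (100), so no single global threshold can separate ink from paper, whereas the comparison against the blurred background marks only columns 2 and 8. A real implementation would normally use an array library rather than nested lists; plain Python is used here only to keep the sketch self-contained.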

Published

2016-09-30

How to Cite

Pach, J. L., & Bilski, P. (2016). A ROBUST BINARIZATION AND TEXT LINE DETECTION IN HISTORICAL HANDWRITTEN DOCUMENTS ANALYSIS. International Journal of Computing, 15(3), 154-161. https://doi.org/10.47839/ijc.15.3.848

Section

Articles