A ROBUST BINARIZATION AND TEXT LINE DETECTION IN HISTORICAL HANDWRITTEN DOCUMENTS ANALYSIS
DOI: https://doi.org/10.47839/ijc.15.3.848
Keywords:
Document analysis, unconstrained handwriting, Hough transform, text line detection.
Abstract
In this paper, we present a novel method for detecting text lines in handwritten documents based on the block-based Hough transform. To maximize its efficiency, a robust binarization algorithm is applied first; it is based on Gaussian filtering and compensates for non-uniform luminance. The proposed technique consists of three steps: preprocessing, detection of potential text lines, and elimination of false ones. The first step covers image binarization, extraction of connected components, and selection of supporting connected components based on the local maxima in vertical histogram stripes. Second, the appropriate subset of connected components, supplemented by one-point components, is selected. Finally, the block-based Hough transform is applied to detect potential text lines and to find those identified incorrectly. The proposed method is applied to the analysis of fifteenth-century Latin manuscripts. Our approach is more effective than traditional methods, by up to twenty percent in the best cases.
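The abstract does not spell out the binarization or voting details, so the following pure-Python sketch only illustrates the general shape of such a pipeline under stated assumptions; it is not the authors' implementation. It assumes (i) binarization by comparing each pixel with a large-sigma Gaussian estimate of the local background, (ii) candidate line points taken from local maxima of each vertical stripe's row-wise ink count, and (iii) a plain (θ, ρ) Hough vote restricted to near-horizontal angles. The voting points here are stripe peaks rather than the block gravity centres used by a block-based Hough transform, connected-component extraction and false-line elimination are omitted, and all function names and parameter values (`sigma=8.0`, `ratio=0.85`, `stripe_width`, `min_votes`) are hypothetical.

```python
import math

# Hypothetical sketch, NOT the authors' method: function names and default
# parameter values are illustrative assumptions.


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel, truncated at three standard deviations."""
    radius = max(1, int(3 * sigma))
    k = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def blur(img, sigma):
    """Separable Gaussian blur of a 2-D list; border pixels are replicated."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(img), len(img[0])
    rows = [[sum(k[j + r] * img[y][min(max(x + j, 0), w - 1)] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]
    return [[sum(k[j + r] * rows[min(max(y + j, 0), h - 1)][x] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]


def binarize(gray, sigma=8.0, ratio=0.85):
    """Return 1 (ink) where a pixel is darker than `ratio` times the
    Gaussian-smoothed local background, else 0. Comparing against a local
    estimate instead of one global threshold tolerates uneven lighting."""
    background = blur(gray, sigma)
    return [[1 if gray[y][x] < ratio * background[y][x] else 0
             for x in range(len(gray[0]))] for y in range(len(gray))]


def stripe_peaks(binary, stripe_width):
    """Split the page into vertical stripes and return (x_centre, y) for every
    local maximum of each stripe's row-wise ink count (its projection profile)."""
    h, w = len(binary), len(binary[0])
    points = []
    for x0 in range(0, w, stripe_width):
        x1 = min(x0 + stripe_width, w)
        hist = [sum(binary[y][x0:x1]) for y in range(h)]
        for y in range(h):
            above = hist[y - 1] if y > 0 else 0
            below = hist[y + 1] if y < h - 1 else 0
            # '>=' above and '>' below keeps exactly one point per flat-topped peak.
            if hist[y] > 0 and hist[y] >= above and hist[y] > below:
                points.append(((x0 + x1) // 2, y))
    return points


def hough_lines(points, angles_deg=range(85, 96), rho_step=1.0, min_votes=3):
    """Vote for rho = x*cos(theta) + y*sin(theta) over near-horizontal angles
    (theta = 90 degrees is a horizontal line). Cells are then visited in order
    of their initial vote count, and each point may support only one line."""
    acc = {}
    for x, y in points:
        for a in angles_deg:
            t = math.radians(a)
            rho = round((x * math.cos(t) + y * math.sin(t)) / rho_step)
            acc.setdefault((a, rho), set()).add((x, y))
    lines, used = [], set()
    for (a, rho), pts in sorted(acc.items(), key=lambda kv: -len(kv[1])):
        free = pts - used
        if len(free) >= min_votes:
            lines.append((a, rho * rho_step, len(free)))
            used |= free
    return lines
```

On a small synthetic page with a left-to-right brightness ramp and two dark rows, `binarize` keeps only the dark rows, `stripe_peaks` yields one point per stripe and row, and `hough_lines` returns two lines at θ = 90°. The greedy "each point votes for one line" rule is a simple stand-in for the more careful candidate validation the abstract describes.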
License
International Journal of Computing is an open access journal. Authors who publish with this journal agree to the following terms:
• Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
• Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
• Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.