Farsi document image recognition system using word layout signature


Ergün C., Norozpour S.

Turkish Journal of Electrical Engineering and Computer Sciences, vol.27, no.2, pp.1477-1488, 2019 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 27 Issue: 2
  • Publication Date: 2019
  • DOI Number: 10.3906/elk-1804-92
  • Journal Name: Turkish Journal of Electrical Engineering and Computer Sciences
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, TR DİZİN (ULAKBİM)
  • Page Numbers: pp.1477-1488
  • Keywords: Farsi document image retrieval, word spotting, word layout signature, optical character recognition
  • Affiliated with İstanbul Gelişim Üniversitesi: No

Abstract

In this paper, a new representation of Farsi words is proposed to address the keyword spotting problem in Farsi document image retrieval. We define a signature for each Farsi word based on the layout of its connected components. The signature is represented as a set of boxes, and then, by drawing vertical and horizontal lines, we construct a grid for each word to provide a new descriptor. One advantage of this method is that it can be used for both handwritten and machine-printed text. Finally, to evaluate the performance of the system in comparison with other methods, a database containing 19,582 printed Farsi words is examined; the approach obtains a recall rate of 98.1% and a precision rate of 94.3%.
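
The abstract does not give the exact construction of the signature, so the following Python sketch is only one possible reading of it, not the authors' implementation. It assumes a word image is a small binary matrix (1 for ink), takes the bounding box of each 8-connected component, and then cuts the word area along every box edge to obtain a grid whose cells are marked when a box covers them. The function names `connected_component_boxes` and `layout_grid`, the 8-connectivity, and the cell-marking rule are all illustrative assumptions.

```python
from collections import deque


def connected_component_boxes(image):
    """Return inclusive (top, left, bottom, right) boxes of 8-connected ink regions.

    Assumption: `image` is a non-empty list of equal-length rows of 0/1 values.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def layout_grid(boxes):
    """Cut the word area along every box edge; mark cells that a box covers.

    This cell-marking rule is a hypothetical descriptor, not the paper's.
    """
    ys = sorted({b[0] for b in boxes} | {b[2] + 1 for b in boxes})
    xs = sorted({b[1] for b in boxes} | {b[3] + 1 for b in boxes})
    grid = []
    for y0, y1 in zip(ys, ys[1:]):
        row = []
        for x0, x1 in zip(xs, xs[1:]):
            covered = any(
                t <= y0 and y1 - 1 <= b and l <= x0 and x1 - 1 <= r
                for t, l, b, r in boxes
            )
            row.append(1 if covered else 0)
        grid.append(row)
    return grid


# Toy "word": one dot above a main body stroke.
word = [
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [1, 0, 0, 0, 1, 0],
]
boxes = connected_component_boxes(word)
grid = layout_grid(boxes)
print(boxes)  # [(0, 2, 0, 2), (2, 0, 3, 4)]
print(grid)   # [[0, 1, 0], [0, 0, 0], [1, 1, 1]]
```

In this toy case the grid records that a small component sits above the middle of a wider one, which is the kind of layout information a connected-component signature could compare between a query keyword and candidate words.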