


6th International Workshop on Multilingual OCR

A satellite workshop to be held in conjunction with

ICDAR 2017

16th International Conference on Document Analysis and Recognition
Kyoto, Japan
Saturday, November 11, 2017

1100 - 1120 Deep Convolutional Recurrent Network for Segmentation-free Offline
Handwritten Japanese Text Recognition
Nam Tuan Ly, Cuong Tuan Nguyen, Cong Kha Nguyen and Masaki Nakagawa
1120 - 1140 DeepKHATT: A Deep Learning Benchmark on Arabic Script
Riaz Ahmad, Saeeda Naz, M. Zeshan Afzal, S. Faisal Rashid,
Marcus Liwicki and Andreas Dengel
1140 - 1200 Detection and Recognition of Arabic Text in Video Frames
Wataru Ohyama, Seiya Iwata, Tetsushi Wakabayashi and Fumitaka Kimura
1200 - 1220 The Impact of Visual Similarities of Arabic-like scripts in Terms of Learning
in an OCR System
Riaz Ahmad, Saeeda Naz, M. Zeshan Afzal, Sheikh Faisal Rashid,
Marcus Liwicki and Andreas Dengel
1230 - 1430 Lunch
Oral Session 2
1430 - 1450 Implicit Language Model in LSTM for OCR
Ekraam Sabir, Stephen Rawls and Prem Natarajan
1450 - 1510 An Empirical Study of Effectiveness of Post-processing in Indic Scripts
V S Vinitha, C V Jawahar and Minesh Mathew
1510 - 1530 Improving Classical OCRs for Brahmic Scripts using Script Grammar Learning
Dipankar Ganguly, Sumeet Agarwal and Santanu Chaudhury
1530 - 1550 Benchmarking Scene Text Recognition in Devanagari, Telugu and Malayalam
Minesh Mathew, Mohit Jain and C V Jawahar
1550 - 1620 Coffee Break
1620 - 1630 Closing Remarks

The workshop will provide a forum for highlighting current research on multilingual document analysis systems, with particular emphasis on OCR. The predecessors to this workshop were held in conjunction with ICDAR 1999 in Bangalore, India; ICDAR 2009 in Barcelona, Spain; ICDAR 2011 in Beijing, China; ICDAR 2013 in Washington DC, USA; and ICDAR 2015 in Nancy, France. A joint Workshop on Multilingual OCR and Analytics for Noisy Unstructured Text Data was held in conjunction with ICDAR 2011 in Beijing, China. The scope of 'Multilingual OCR' is defined to include systems that are capable of reading more than one language in the same document, as well as one-language-per-document systems that can be easily retargeted to new languages. The proposed workshop will provide a forum for technical discussions on three important themes: i) recent progress in the field and promising new techniques, ii) attempts to identify and address 'hard' open research problems, and iii) performance evaluation of multilingual OCR systems.

With its emphasis on multilingual documents and encouragement of "work in progress" reports, the workshop is intended to be complementary to the main conference. To ensure that there is no conflict between submissions to the main conference and to the workshop, the paper submission deadline for the workshop falls after the main conference decisions are announced on July 1, 2017. The main motivation for this workshop is to:

  • Encourage "work in progress" manuscripts and preliminary ideas as well as mature work
  • Encourage descriptions of large collaborative projects
  • Encourage papers discussing at least 2 languages/scripts
  • Emphasize public access to datasets when applicable
  • Encourage groups to demonstrate working systems by providing additional time after each talk
  • Group papers by script and language to contrast methods and compare results
  • Ensure full participation of attendees, with moderators encouraging plenty of Q&A
  • Hold discussion periods at the end of each session to summarize "take home messages" based on the papers presented and the Q&A

The workshop will also issue a call for multilingual OCR demo systems, inviting researchers to apply to give live demonstrations of their work. A subset of the program committee will screen the applications for appropriateness to this workshop and approve demos for presentation.

The topics that will be addressed by this Workshop are:

  • Proven Methodologies for OCR: Efficacy of existing Latin-script methodologies (HMMs, neural networks, etc.) when applied to other scripts
  • Mixed languages: Techniques applicable/retargetable to multiple languages/scripts; documents containing multiple languages/scripts
  • Newer languages/scripts: Techniques for dealing with problems of scripts for which OCR technology has not matured
  • Document Analysis: Language and script identification, machine print vs handwriting, layout analysis, reading order
  • Special domains: Scene text and video text, mathematical formulas; tables; abbreviations; annotations
  • Degraded and historical documents
  • Domain knowledge: Colloquialism, dialect, language models
  • Evaluation Methodologies: metrics, standards, ground truth; benchmark datasets
  • Demo systems


Important Dates

» Paper Submission
    August 1, 2017
» Reviews Completed
    August 17, 2017
» Author Notification
    August 22, 2017
» Final Papers Due
    September 1, 2017

General Co-Chairs

Venu Govindaraju
University at Buffalo, SUNY, USA

Prem Natarajan
ISI, University of Southern California, USA

Santanu Chaudhury
CEERI, India

Srirangaraj Setlur
University at Buffalo, SUNY, USA


Center for Unified Biometrics
113 Davis Hall
University at Buffalo
Amherst, NY 14260-2500
Ph: +1-716-645-6164
Fax: +1-716-645-6176
