IIT Madras researchers develop easy OCR system for Bharati Script
A team of researchers at IIT-Madras, headed by Professor V. Srinivasa Chakravarthy, has developed a method for reading documents in Bharati Script using a multi-lingual optical character recognition (OCR) scheme.
Bharati is a unified script for nine Indian languages that is being proposed as a common script for India. It integrates Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Telugu, Kannada, Malayalam and Tamil. Urdu and English have not been integrated so far because of their very different phonetic organisation.
Why was it needed?
Many European languages (English, French, German, Italian etc.) use the Roman script as a common script, which eases communication across the nations that speak and write those languages. Similarly, in a diverse nation like ours, a common script for the entire country can be hoped to bring down many of the communication barriers that exist in India.
Optical Character Recognition (OCR) Schemes
OCR first involves separating (segmenting) a document into text and non-text regions. The text is then further divided into paragraphs, sentences, words and letters. Each letter is recognised as a character in a standard encoding such as ASCII or Unicode. Each letter has different components, such as a basic consonant, consonant modifiers and vowels.
In Bharati Script characters, these components are separable by design. OCR therefore works quite accurately, giving almost 100% accuracy even with mild noise added.
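The idea of a letter being built from separate components can be illustrated with a small sketch. The article does not describe the team's implementation, so this is only an assumption-laden illustration: it uses Unicode code points of an existing script (Devanagari), not Bharati itself, and the function name split_components is hypothetical. It groups the code points of one written syllable into a base letter and its attached marks using Python's standard unicodedata module.

```python
import unicodedata


def split_components(syllable: str) -> dict:
    """Group the code points of one written syllable by rough role.

    Illustration only: this works on already-encoded Unicode text,
    not on scanned images, and is not the IIT Madras OCR method.
    """
    parts = {"base": [], "marks": []}
    for ch in syllable:
        # Unicode general categories starting with "M" are combining marks
        # (for example vowel signs); everything else is treated here as a
        # base letter.
        is_mark = unicodedata.category(ch).startswith("M")
        key = "marks" if is_mark else "base"
        parts[key].append(unicodedata.name(ch, "UNKNOWN"))
    return parts


# The Devanagari syllable "ki": consonant KA (U+0915) + vowel sign I (U+093F).
print(split_components("\u0915\u093f"))
```

In encoded text the components are already separate code points. The claim in the article is that Bharati characters keep these components visually separable on the page as well, which is what makes the recognition step easier.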
Other Ongoing Projects at IIT Madras
In collaboration with TCS Mumbai, the team created a universal finger-spelling language for the nine Indian languages. Using this finger-spelling technique, persons with hearing disabilities can generate signatures or a sign language. Other plans include developing a new Braille system with Bharati script.