Languages & OCR

Intro

Languages & OCR - In a nutshell

  • ABBYY OCR technology supports over 200 languages.
    • Every OCR language in the ABBYY products comes with at least a predefined internal definition of allowed and forbidden characters.
    • ABBYY also delivers morphology dictionaries for many languages.
    • The SDKs also
      • come with a dictionary API, so that custom word lists or dictionaries can be included in the recognition process
      • allow custom languages to be defined (based on existing ones)
    • Languages use different alphabets, such as Latin, Cyrillic, or Hebrew. Selecting the recognition language(s) activates the “experts” for the corresponding character types.
  • It is very important to specify the correct set of possible languages when documents are recognized with OCR. Why?
    • Situation: A document containing text in German, French, or another language with special characters is to be processed.
    • If only “English” is set as the recognition language, this setting prevents the OCR engine from reading certain characters such as “Ä”, “ö”, etc.
  • FineReader Engine 11, for example, supports
    • 6 alphabets: Latin, Greek, Cyrillic, Armenian, Hebrew, Thai; new in V11: Arabic
    • 47 languages with morphology, dictionary support & spell check
    • Chinese (traditional and simplified), Japanese, Korean (CJK)
    • Vietnamese, Thai, Hebrew, Arabic
    • 5 languages in FineReader XIX: Gothic and other fonts from the 17th to the 20th century
    • 6 programming languages (Basic, C/C++, COBOL, Java, etc.)
    • 4 artificial languages (Esperanto, Interlingua, etc.)
    • Simple chemical formulas
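
The effect of the language selection on readable characters, described in the list above, can be illustrated with a small toy model. This is only a sketch and does not use the ABBYY API: the `ALPHABETS` table, its abbreviated character sets, and the helper functions `allowed_characters` and `unreadable_characters` are hypothetical names invented for this illustration. It assumes that each language contributes a set of allowed characters and that selecting several languages permits the union of those sets.

```python
# Toy model only: this is NOT the ABBYY API.
# The alphabets below are abbreviated and purely illustrative.
import string

ALPHABETS = {
    "English": set(string.ascii_letters),
    "German": set(string.ascii_letters) | set("ÄÖÜäöüß"),
    "French": set(string.ascii_letters) | set("ÀÂÇÉÈÊËÎÏÔÙÛàâçéèêëîïôùû"),
}


def allowed_characters(languages):
    """Return the union of the alphabets of all selected languages."""
    allowed = set()
    for language in languages:
        allowed |= ALPHABETS[language]
    return allowed


def unreadable_characters(text, languages):
    """Return the letters in `text` that the selected languages do not permit."""
    allowed = allowed_characters(languages)
    return sorted({ch for ch in text if ch.isalpha() and ch not in allowed})


sample = "Über die Äpfel"
print(unreadable_characters(sample, ["English"]))            # ['Ä', 'Ü']
print(unreadable_characters(sample, ["English", "German"]))  # []
```

With only “English” selected, the umlauts fall outside the allowed set. Adding “German” to the selection makes them permissible. A real OCR engine applies such restrictions internally during recognition, so the model above only mirrors the principle.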
