Languages & OCR

Intro

Languages & OCR - In a nutshell

  • ABBYY OCR technology supports over 200 languages.
    • Every OCR language in the ABBYY products comes with at least a predefined internal definition of allowed and forbidden characters.
    • ABBYY also provides morphology dictionaries for many languages.
    • The SDKs also
      • come with a dictionary API, so that custom word lists or dictionaries can be included in the recognition process
      • allow you to define custom languages (based on existing ones)
    • Languages are written in different alphabets, such as Latin, Cyrillic or Hebrew. Selecting the recognition language(s) activates the corresponding “experts” for these character types.
  • It is very important to define all languages that can occur in a document before it is recognized with OCR. Why?
    • Situation: A document containing text in German, French or another language with special characters is to be processed.
    • If only “English” is set as the recognition language, this setting restricts the OCR engine and prevents it from reading characters such as “Ä”, “ö”, etc.
  • FineReader Engine 11, for example, supports
    • 6 alphabets (Latin, Greek, Cyrillic, Armenian, Hebrew, Thai), plus Arabic, new in version 11
    • 47 languages with morphology, dictionary support & spell check
    • Chinese (traditional and simplified), Japanese, Korean (CJK)
    • Vietnamese, Thai, Hebrew, Arabic
    • 5 languages in FineReader XIX for Gothic and other 17th- to 20th-century fonts
    • 6 programming languages (Basic, C/C++, COBOL, Java, etc.)
    • 4 artificial languages (Esperanto, Interlingua, etc.)
    • Simple chemical formulas
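
Why the language setting matters can be illustrated with a toy model. This is a hypothetical sketch, not the FineReader Engine API: the `RecognitionLanguage` class, the `derive`, `forbidden_in` and `pick_variant` helpers and all character sets are assumptions made for illustration. It treats each recognition language as a set of allowed characters plus an optional word list, derives a custom language from an existing one, and shows that with English alone the characters “Ä” and “ö” could not be produced.

```python
# Toy model of recognition languages. HYPOTHETICAL: the class, helpers and
# character sets are illustrative assumptions, not ABBYY's definitions or API.
import string
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognitionLanguage:
    """A language as a set of allowed characters plus an optional word list."""
    name: str
    alphabet: frozenset
    words: frozenset = frozenset()

    def derive(self, name, extra_chars="", extra_words=()):
        """Create a custom language based on this one, adding characters and words."""
        return RecognitionLanguage(
            name,
            self.alphabet | frozenset(extra_chars),
            self.words | frozenset(extra_words),
        )


BASE_CHARS = string.ascii_letters + string.digits + " .,;:!?'-"
ENGLISH = RecognitionLanguage("English", frozenset(BASE_CHARS))
GERMAN = ENGLISH.derive("German", extra_chars="ÄÖÜäöüß")
# Custom language: English plus a user-supplied word list (dictionary).
CUSTOM = ENGLISH.derive("English (custom)", extra_words={"ABBYY", "FineReader"})


def allowed_characters(languages):
    """Union of the alphabets of all selected recognition languages."""
    allowed = set()
    for lang in languages:
        allowed |= lang.alphabet
    return allowed


def forbidden_in(text, languages):
    """Characters in `text` that the selected languages do not allow."""
    allowed = allowed_characters(languages)
    return sorted({ch for ch in text if ch not in allowed})


def pick_variant(candidates, languages):
    """Prefer the first recognition candidate found in a selected word list."""
    words = set().union(*(lang.words for lang in languages))
    for candidate in candidates:
        if candidate in words:
            return candidate
    return candidates[0]


sample = "Äpfel und Brötchen"
print(forbidden_in(sample, [ENGLISH]))          # ['Ä', 'ö']
print(forbidden_in(sample, [ENGLISH, GERMAN]))  # []
print(pick_variant(["ABBVY", "ABBYY"], [CUSTOM]))  # ABBYY
```

In the real engine, dictionaries also drive morphology and spell checking; `pick_variant` only mimics the idea that a word list can decide between otherwise similar recognition candidates.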
