Chinese, Japanese & Korean (CJK) OCR

Language:
EN
Product-Line:
FlexiCapture Engine, FineReader Engine, Mobile OCR Engine, Cloud OCR SDK
Version:
10, 11
Type:
Technology & Features
Category:
Recognition, Languages & OCR

The Chinese, Japanese, and Korean languages are often grouped together under the abbreviation “CJK”. They have several features in common, such as the use of Chinese characters and support for both vertical and horizontal writing directions.

ABBYY FineReader Engine supports the following predefined recognition languages for CJK texts:

  • ChinesePRC
  • ChineseTaiwan
  • Japanese
  • Korean
  • KoreanHangul

To select one of these predefined languages, you can use the SetPredefinedTextLanguage method of the RecognizerParams object.

ABBYY FineReader Engine also supports recognition language combinations consisting of several of these languages, or of CJK and other languages. This is a key advantage when Asian languages have to be combined with European languages, e.g. French-Japanese, German-Korean, or Italian-Chinese.
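As a minimal sketch of how such a combination might be prepared, the Python helper below joins predefined language names into a single string. It assumes that SetPredefinedTextLanguage accepts several predefined language names separated by commas, which should be checked against the Engine documentation. The helper names `build_language_string` and `has_cjk` and the variable `recognizer_params` are hypothetical and not part of the Engine API.

```python
# Predefined CJK language names listed in this article.
CJK_LANGUAGES = ("ChinesePRC", "ChineseTaiwan", "Japanese", "Korean", "KoreanHangul")


def build_language_string(*names):
    """Join predefined language names into one comma-separated string.

    Assumption: SetPredefinedTextLanguage accepts several predefined
    language names separated by commas; verify this in the Engine docs.
    """
    cleaned = []
    for name in names:
        name = name.strip()
        if not name:
            raise ValueError("empty language name")
        if name not in cleaned:  # drop duplicates, keep first-seen order
            cleaned.append(name)
    if not cleaned:
        raise ValueError("at least one language is required")
    return ",".join(cleaned)


def has_cjk(names):
    """Return True if any name is one of the CJK languages listed above."""
    return any(name in CJK_LANGUAGES for name in names)


language_string = build_language_string("Japanese", "French")
print(language_string)

# Hypothetical usage with an already created RecognizerParams object
# (the variable name recognizer_params is an assumption):
# recognizer_params.SetPredefinedTextLanguage(language_string)
```

The `has_cjk` check can be used to decide whether the CJK-specific font and export settings described in the next section are relevant for a given combination.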

Fonts and PDF Export for CJK

  • To prevent garbling of Asian characters, you must specify a font for document synthesis that includes the necessary set of characters, e.g. Arial Unicode MS or SimSun. You can set the font with the ISynthesisParamsForDocument::FontSet property.
  • When you export recognised documents with CJK languages to PDF (except in image-only mode), the fonts are embedded automatically. You can export CJK languages to PDF/A in “text under the image” mode (IPDFExportParams::TextExportMode = PEM_ImageOnText) to ensure that the document looks the same as the original.
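Whether a CJK-capable font is needed depends on the characters in the recognised text. The Python sketch below, which is independent of the Engine API, reports which CJK scripts occur in a string based on a common subset of Unicode ranges. The function names `cjk_scripts` and `needs_cjk_font` are hypothetical, the ranges are not exhaustive, and the check does not verify that a particular font actually contains the required glyphs.

```python
# Code point ranges for the main CJK scripts (a common subset, not exhaustive).
SCRIPT_RANGES = {
    "Han": [(0x4E00, 0x9FFF), (0x3400, 0x4DBF)],
    "Hiragana": [(0x3040, 0x309F)],
    "Katakana": [(0x30A0, 0x30FF)],
    "Hangul": [(0xAC00, 0xD7A3), (0x1100, 0x11FF)],
}


def cjk_scripts(text):
    """Return the set of CJK script names that occur in the text."""
    found = set()
    for ch in text:
        cp = ord(ch)
        for script, ranges in SCRIPT_RANGES.items():
            if any(lo <= cp <= hi for lo, hi in ranges):
                found.add(script)
    return found


def needs_cjk_font(text):
    """Return True if the text contains at least one CJK character."""
    return bool(cjk_scripts(text))


print(sorted(cjk_scripts("日本語のテキスト")))
print(needs_cjk_font("Plain English text"))
```

If the check returns True, a font such as the ones named above would be a candidate for the FontSet property.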

History of CJK OCR in ABBYY technologies

FineReader Engine

  • Back in the ABBYY technology cycle V 8.0, CJK OCR was sublicensed from a third party.
  • However, ABBYY's goal is to develop core OCR technologies in-house, because only then can it be guaranteed that they fit perfectly into the existing recognition technology stack.
  • Core development of CJK OCR started in 2005 with adjusted layout analysis, new algorithms for character separation, and a new set of character classifiers.
  • In October 2008, ABBYY released its own Asian OCR in FineReader Engine 9.0.
  • Since then, the core technology has been improved continuously.
  • Note: CJK OCR is part of the core technology. It is included in the Developer Licenses, but can only be used/rolled out in projects when the language add-on for Asian languages is purchased.

Mobile OCR Engine

  • ABBYY's Mobile OCR Engine is also able to process Asian languages, but because of the CPU and memory restrictions of mobile devices the implementation is not exactly the same as on the FineReader Engine side.

Cloud OCR SDK

Improvements V11

Because CJK is perceived as very challenging for OCR technologies, ABBYY further improved its recognition in the Version 11 technology cycle. This article gives a short overview of the history and the improvements made. Processing speed in fast mode has been increased while maintaining the accuracy level.

  • Japanese up to 2.5 times faster
  • Chinese (Simplified) up to 2.5 times faster
  • Chinese (Traditional) up to 4.0 times faster
  • Korean up to 2.5 times faster
  • User dictionaries can be created for Japanese and Korean languages
  • All elements of UI and messages of FineReader Engine 11 are now available in Japanese.
