What image resolution is the best one?

  • For regular text (font size 8-10 points), a resolution of 300 dpi is recommended for OCR, because all ABBYY technologies are tuned for that resolution.
  • If scans have a lower resolution, for example 200 dpi, a 10-point font will be too small. To compensate for the “missing” pixels, the image is scaled up internally (up to 400 dpi). Low image quality (i.e. low resolution) can degrade not only recognition quality but also speed, because uncertainty in the character image produces more recognition variants to process.
  • For smaller font sizes (8 points or smaller), we recommend using a resolution of 400-600 dpi.
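
The link between font size, scan resolution and pixel size is plain arithmetic: one typographic point is 1/72 inch, so a font of p points scanned at d dpi has an em height of p / 72 × d pixels. The following is a minimal sketch of that calculation, not part of any ABBYY API. The function name is hypothetical, and using the em height as the measure is an assumption: actual glyphs are smaller than the em height (capital letters are often around 70% of it).

```python
POINTS_PER_INCH = 72  # 1 typographic point = 1/72 inch


def em_height_px(font_size_pt: float, dpi: float) -> float:
    """Return the em height, in pixels, of a font scanned at the given dpi.

    Hypothetical helper for illustration; real glyphs are smaller than
    the em height, so treat the result as an upper bound.
    """
    return font_size_pt / POINTS_PER_INCH * dpi


if __name__ == "__main__":
    # Compare a 10-point and an 8-point font at common scan resolutions.
    for dpi in (200, 300, 400, 600):
        print(dpi, round(em_height_px(10, dpi), 1), round(em_height_px(8, dpi), 1))
```

The same font loses a third of its pixels when the scan resolution drops from 300 dpi to 200 dpi, which is why lower-resolution scans need the internal upscaling described above.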

In addition, consider the following current recommendations and limits of the OCR technology for character size, in pixels:

  • 1-byte (simple script) languages like English, Russian
    • Recommended: 20
    • Minimum: 12
  • 2-byte (complex script) languages like Japanese, Chinese
    • Recommended: 25
    • Minimum: 22
    • Maximum: 60 (for body text; for headings there is no limit)
  • 1-byte (complex script) languages like Thai, Hebrew, Arabic
    • Recommended: 20
    • Minimum: 12
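
The values in the list above can be turned into a simple lookup that rates a measured character size. This is only a sketch: the dictionary keys, the function name and the rating labels are made up for illustration, and the maximum for 2-byte languages applies to body text only (headings have no limit, which this sketch does not model).

```python
# Character-size guidelines in pixels, copied from the list above.
# None means the list gives no value. Key names are hypothetical.
GUIDELINES = {
    "simple_1byte": {"recommended": 20, "minimum": 12, "maximum": None},   # English, Russian
    "complex_2byte": {"recommended": 25, "minimum": 22, "maximum": 60},    # Japanese, Chinese
    "complex_1byte": {"recommended": 20, "minimum": 12, "maximum": None},  # Thai, Hebrew, Arabic
}


def rate_char_size(char_px: float, script: str) -> str:
    """Rate a character size in pixels against the guideline for a script group."""
    limits = GUIDELINES[script]
    if char_px < limits["minimum"]:
        return "too small"
    if limits["maximum"] is not None and char_px > limits["maximum"]:
        return "too large"  # body-text limit; headings are not limited
    if char_px < limits["recommended"]:
        return "usable"
    return "recommended"


if __name__ == "__main__":
    print(rate_char_size(15, "simple_1byte"))
    print(rate_char_size(15, "complex_2byte"))
```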

Page layout analysis requires that the smallest characters in a text line be larger than 1 millimeter.
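
To relate the 1 millimeter limit to pixels, a physical size can be converted with the scan resolution (1 inch = 25.4 mm). The helper below is a hypothetical illustration, not an ABBYY function.

```python
MM_PER_INCH = 25.4


def mm_to_px(size_mm: float, dpi: float) -> float:
    """Convert a physical size in millimeters to pixels at the given dpi."""
    return size_mm / MM_PER_INCH * dpi


if __name__ == "__main__":
    # How many pixels does 1 mm cover at common scan resolutions?
    for dpi in (200, 300, 400, 600):
        print(dpi, round(mm_to_px(1.0, dpi), 1))
```

At 300 dpi, 1 mm corresponds to slightly less than 12 pixels.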

  • ABBYY technologies use colour information to detect areas and objects in the image.
  • Therefore, if complex layouts have to be processed, it is recommended to use colour or at least greyscale images.
  • Character recognition is always executed on a bi-tonal image that contains only black and white. To achieve a good OCR result, it is important to generate a suitable binary image. ABBYY technology does not use just “simple” binarization, but “adaptive binarization”.
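
To illustrate the general idea behind adaptive binarization, here is a minimal sketch of a local-mean threshold in plain Python. It is explicitly not ABBYY's algorithm, which is not described in this article. The function name, the `window` and `offset` parameters and their defaults are assumptions chosen for the example. Each pixel is compared with the average brightness of its own neighbourhood instead of one global threshold.

```python
def adaptive_binarize(gray, window=15, offset=10):
    """Binarize a grayscale image given as a list of rows of 0-255 ints.

    A pixel becomes black (0) if it is darker than the mean of its
    window x window neighbourhood minus `offset`, otherwise white (255).
    Generic local-mean thresholding; an illustration only.
    """
    h, w = len(gray), len(gray[0])
    # Integral image with a zero border:
    # integ[y][x] holds the sum of all pixels in rows < y and columns < x.
    integ = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            integ[y + 1][x + 1] = integ[y][x + 1] + row_sum

    r = window // 2
    out = []
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)  # window clipped to the image
        out_row = []
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            total = integ[y1][x1] - integ[y0][x1] - integ[y1][x0] + integ[y0][x0]
            mean = total / ((y1 - y0) * (x1 - x0))
            out_row.append(0 if gray[y][x] < mean - offset else 255)
        out.append(out_row)
    return out


if __name__ == "__main__":
    # Background fades from light (left) to dark (right), as under a shadow.
    page = [[200 - 5 * x for x in range(20)] for _ in range(5)]
    page[2][3] = 125   # stroke on the light side
    page[2][16] = 60   # stroke on the dark side
    for row in adaptive_binarize(page, window=5, offset=10):
        print("".join("#" if v == 0 else "." for v in row))
```

In this example the stroke on the light side (value 125) is brighter than the background on the dark side (down to 105), so no single global threshold can separate both strokes from the background, while the local comparison can.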

What is the largest supported font size for OCR?

  • The largest font ABBYY OCR can handle is 5 cm, or about 140 pt.

What is the largest image size in pixels?

  • Currently, ABBYY products can open images up to 32512 × 32512 pixels.
  • Larger images have to be cut into segments, and the segments have to be processed separately.
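
Cutting an oversized image into segments can be planned with a small helper that computes tile rectangles, each side no longer than the limit. This is a sketch under assumptions: the function name and the `overlap` parameter are hypothetical, and the article does not say how segments should be chosen. An overlap is offered so that a text line cut by a seam can appear whole in one of the neighbouring tiles. The boxes can then be passed to whatever image library performs the actual cropping.

```python
MAX_SIDE = 32512  # largest supported width/height in pixels, per the text above


def split_into_tiles(width, height, max_side=MAX_SIDE, overlap=0):
    """Return (left, top, right, bottom) boxes covering a width x height image.

    No box is wider or taller than max_side. Neighbouring boxes share
    `overlap` pixels. Hypothetical helper for illustration.
    """
    if not 0 <= overlap < max_side:
        raise ValueError("overlap must be >= 0 and smaller than max_side")
    step = max_side - overlap

    def spans(length):
        # Walk along one axis in steps, clipping the last span to the image edge.
        result = []
        start = 0
        while True:
            end = min(start + max_side, length)
            result.append((start, end))
            if end == length:
                return result
            start += step

    return [(left, top, right, bottom)
            for top, bottom in spans(height)
            for left, right in spans(width)]


if __name__ == "__main__":
    for box in split_into_tiles(40000, 70000, overlap=512):
        print(box)
```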

What is the maximum image file size?

  • ABBYY technology v8.x and older could open image files up to 2 GB.
  • Starting with v9.0, this limit no longer exists. :-)

Character Shapes at Different Image Resolutions

  • The image resolution has a strong influence on how a single character is “built up”: the lower the resolution, the fewer pixels are available to form each character.

  • The image resolution has a real impact on the OCR quality that can be achieved. Below is a sample of small text from a fax; the problems you can see here are:
    • Text is not on a straight line
    • Characters are “squeezed” and glued together
    • The resolution is by far too low for the classifiers
    • → You (as a human) might be able to read it when you know the language and the context.

  • But when zooming in further, to the pixel level, you will probably fail, and so do the algorithms :-(

