OCR & Processing Speed

Language:
EN
Product-Line:
FlexiCapture Engine, FineReader Engine
Version:
9.x, 10, 11
Type:
Technology & Features, Scenarios/Tasks
Category:
Integration, Recognition, OCR: Speed & Quality

Processing speed is important when high volumes of scanned paper documents have to be processed or converted using Optical Character Recognition (OCR). Below are some notes on this topic:

  • Optical Character Recognition is resource intensive
    • it is a multi-step process, and each processing step can be very CPU intensive (e.g. image pre-processing or layout analysis)
    • a high number of images and PDFs requires fast hard disk throughput
    • processing speed also depends on the material, for example the document type, image quality and languages
  • OCR quality and processing speed are linked; the general rule is
    • As a rule of thumb, higher recognition quality requires more processing time
    • Low-quality documents need more CPU time and are processed more slowly than high-quality document images

General

There is some general “wisdom”:

  • the better the image quality, the faster the images can be processed
    • From version 10 on, FineReader Engine offers a new fast mode that is especially tuned for good-quality images
    • If the image quality is not known in advance, it is recommended to use the “balanced mode”; here the technology makes the decision internally.
  • when the image quality is bad, a lot more (CPU) time has to be “invested” to get the best possible results
  • Complex layouts need more time for document analysis than simple book pages
  • Reading “low quality” characters takes more time than processing “clean” characters.

Time & Throughput

  • The total processing time is a sum of the different internal processing steps
    • different technologies will take different times
      • this is true for different ABBYY technology cycles
      • and of course also when different technologies from different vendors are compared.
    • nevertheless, getting good, usable recognition results on low-quality images takes considerably more effort
    • if a technology is only tuned for speed, then it will not be able to deliver acceptable results on low quality documents
  • The processing time / throughput of one CPU core is not the final achievable speed of a production installation, because the number of CPU cores is growing with every new CPU generation.
    • To scale up throughput, effective use of multiple cores has to be considered
    • To be able to use multiple CPU cores on one or multiple machines, the licensing scheme has to allow this
  • The type of OCR processing also influences the processing time/throughput
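
The two points above, that total time is the sum of the internal steps and that throughput grows with the number of cores used, can be sketched as simple arithmetic. The step names and timings below are made-up placeholders, not measurements of any ABBYY product, and the `efficiency` factor is an assumption that stands in for scaling losses such as disk throughput.

```python
def page_time(step_seconds):
    """Total time for one page on one core: the sum of all processing steps."""
    return sum(step_seconds.values())


def pages_per_hour(step_seconds, cores=1, efficiency=1.0):
    """Estimated throughput when `cores` pages are processed in parallel.

    `efficiency` (0 < efficiency <= 1) is a hypothetical factor for scaling losses.
    """
    if cores < 1 or not 0 < efficiency <= 1:
        raise ValueError("cores must be >= 1 and efficiency in (0, 1]")
    return 3600 / page_time(step_seconds) * cores * efficiency


# Hypothetical per-page timings in seconds (illustrative only).
steps = {"preprocessing": 0.5, "layout_analysis": 0.5, "recognition": 1.0}

single = pages_per_hour(steps)                          # one core
eight = pages_per_hour(steps, cores=8, efficiency=0.75)  # eight cores, 25% loss
print(f"1 core: {single:.0f} pages/h, 8 cores: {eight:.0f} pages/h")
```

With real measurements in place of the placeholder timings, the same calculation gives a rough first estimate of how many cores a target throughput would need.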

Scalability

The good news: OCR scalability is built into the ABBYY SDKs, and there are several approaches to speed up processing.
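
This article does not detail the SDKs' built-in mechanisms, so the following is only a generic sketch of one common approach: distributing pages across CPU cores with a process pool from the Python standard library. `recognize_page` is a hypothetical placeholder and not an ABBYY API; a real integration would call the engine there. The `executor_cls` parameter is likewise an assumption, added so that a different executor can be swapped in.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def recognize_page(image_path):
    """Placeholder for a single-page OCR call (hypothetical, not an ABBYY API)."""
    # A real implementation would load the image and run recognition here.
    return f"text of {image_path}"


def process_batch(image_paths, workers=None, executor_cls=ProcessPoolExecutor):
    """Recognize pages in parallel, one page per task, keeping the input order."""
    workers = workers or os.cpu_count() or 1
    with executor_cls(max_workers=workers) as pool:
        # map() yields the results in the same order as image_paths.
        return list(pool.map(recognize_page, image_paths))
```

A call such as `process_batch(["scan_0001.tif", "scan_0002.tif"])` returns one result per page. When the process pool is used from a script, the call should sit under an `if __name__ == "__main__":` guard, and the number of workers that may actually be used depends on the license, as described below.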

Licensing

  • The ABBYY Licensing scheme offers “scalability” - so it is possible to scale up processing.
    • A single-machine license allows an unlimited number of CPU cores when a page limit (renewable or total), a character limit (renewable or total) or a speed limit is set
    • A network license allows processing to be scaled up to a very high number of machines (virtual or physical), and all CPU cores can be used.
