OCR & Processing Speed

Language: EN
Product-Line: FlexiCapture Engine, FineReader Engine
Version: 9.x, 10, 11
Type: Technology & Features, Scenarios/Tasks
Category: Integration, Recognition, OCR: Speed & Quality

Processing speed is important when high volumes of scanned paper documents have to be processed or converted using Optical Character Recognition. Below are some notes on this topic:

  • Optical Character Recognition
    • is a multi-step process, and each processing step can be very CPU-intensive (e.g. image pre-processing or layout analysis)
    • requires fast hard disk throughput when a high number of images and PDFs is processed
    • runs at a speed that also depends on the material, for example the document type, image quality and languages
  • OCR quality and processing speed are linked; the general rule is:
    • the higher the required recognition quality, the more processing time is needed
    • low-quality documents need more CPU time and are processed more slowly than high-quality document images

General

Some general rules of thumb:

  • the better the image quality, the faster the images can be processed
    • Starting with version 10, FineReader Engine offers a new fast mode that is specially tuned for good-quality images.
    • If the image quality is not known in advance, the “balanced mode” is recommended; in this mode the technology makes the decision internally.
  • when the image quality is bad, a lot more (CPU) time has to be “invested” to get the best possible results
  • Complex layouts need more time for document analysis than simple book pages.
  • Reading “low quality” characters takes more time than processing “clean” characters.

Time & Throughput

  • The total processing time is a sum of the different internal processing steps
    • different technologies will take different times
      • this is true for different ABBYY technology cycles
      • and of course also when different technologies from different vendors are compared.
    • nevertheless, getting good, usable results on low-quality images always requires considerably more processing effort
    • if a technology is tuned only for speed, it will not be able to deliver acceptable results on low-quality documents
  • The processing time/throughput of a single CPU core is not the final achievable speed of a production installation, because the number of CPU cores grows with every new CPU generation.
    • To scale up throughput, multiple cores have to be used effectively.
    • To use multiple CPU cores on one or more machines, the licensing scheme has to allow this.
  • The type of OCR processing also influences the processing time/throughput
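
As a rough illustration of how these points combine, the sketch below adds up per-step times to get the single-core time per page and then scales by the number of cores. All step names and timings are hypothetical figures chosen for the example, not measurements of any ABBYY product, and the `efficiency` factor is an assumed stand-in for losses such as disk I/O contention that keep real installations from scaling perfectly linearly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepTiming:
    """Single-core time of one processing step (hypothetical figures)."""
    name: str
    seconds_per_page: float


def seconds_per_page(steps: list[StepTiming]) -> float:
    """Total single-core time per page: the sum of the individual step times."""
    return sum(step.seconds_per_page for step in steps)


def pages_per_hour(steps: list[StepTiming], cores: int, efficiency: float = 1.0) -> float:
    """Estimate throughput on `cores` cores.

    `efficiency` (0 < efficiency <= 1) is an assumed factor for scaling
    losses; 1.0 means perfectly linear scaling with the number of cores.
    """
    per_page = seconds_per_page(steps)
    if per_page <= 0:
        raise ValueError("total time per page must be positive")
    if cores < 1:
        raise ValueError("need at least one core")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return 3600.0 / per_page * cores * efficiency


# Hypothetical timings for a clean scan and a poor-quality scan of the same page.
clean = [
    StepTiming("pre-processing", 0.25),
    StepTiming("layout analysis", 0.5),
    StepTiming("recognition", 1.0),
    StepTiming("export", 0.25),
]
poor = [
    StepTiming("pre-processing", 0.75),  # more image cleanup needed
    StepTiming("layout analysis", 0.5),
    StepTiming("recognition", 1.5),      # degraded characters take longer to read
    StepTiming("export", 0.25),
]

print(pages_per_hour(clean, cores=1))                   # 1800.0
print(pages_per_hour(poor, cores=1))                    # 1200.0
print(pages_per_hour(clean, cores=8, efficiency=0.75))  # 10800.0
```

The model is deliberately simple: it shows why a single-core figure understates what a multi-core installation can reach, and why poor input lowers throughput even on the same hardware.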

Scalability

The good news: scalability is built into the ABBYY SDKs, and there are several approaches to speeding up processing.
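The SDKs' built-in scalability features are the first place to look. The sketch below does not show that mechanism; it only illustrates the generic idea behind one approach, distributing independent pages over a pool of worker processes (by default one per CPU core) with Python's standard `concurrent.futures` module. `recognize_page` and `process_batch` are hypothetical names, and `recognize_page` is a placeholder, not an ABBYY API call.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def recognize_page(path: str) -> str:
    """Hypothetical placeholder for one OCR call on one page image.

    A real integration would call the OCR SDK here; this stub only
    returns a string so the sketch runs on its own.
    """
    return f"text of {path}"


def process_batch(
    paths: Iterable[str],
    worker: Callable[[str], str] = recognize_page,
    max_workers: Optional[int] = None,
    use_processes: bool = True,
) -> List[str]:
    """Run `worker` on every page, spreading the pages over a pool of workers.

    - max_workers defaults to the number of CPU cores reported by the OS;
      the license must permit using that many cores.
    - With use_processes=True each worker is a separate process, so CPU-bound
      work can run on several cores at once. Threads only run in parallel
      when the worker releases Python's GIL (e.g. while waiting on disk I/O).
    - Executor.map returns the results in input order.
    """
    pages = list(paths)
    workers = max_workers or os.cpu_count() or 1
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with pool_cls(max_workers=workers) as pool:
        return list(pool.map(worker, pages))
```

Usage: call `process_batch(list_of_image_paths)` from inside an `if __name__ == "__main__":` guard; the guard is required on platforms that start worker processes with the spawn method (the default on Windows and macOS). Pages are treated as independent tasks, which is what allows the work to scale with the number of cores.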

Licensing

  • The ABBYY licensing scheme offers “scalability”, so processing can be scaled up.
    • A single-machine license allows an unlimited number of CPU cores when a page limit (renewable or total), a character limit (renewable or total) or a speed limit is set.
    • A network license allows processing to be scaled up to a very high number of machines (virtual or physical), and all CPU cores can be used.
