OCR & Processing Speed
Processing speed is important when high volumes of scanned paper documents have to be processed and converted using Optical Character Recognition (OCR). Below are some notes on this topic:
- Optical Character Recognition is a multi-step process, and each processing step can be very CPU-intensive (e.g. image pre-processing or layout analysis)
- processing a high number of images and PDFs also requires fast hard disk throughput
- processing speed also depends on the material, for example the document type, image quality and languages
- OCR quality and processing speed are linked; the general rules are:
- recognition quality is directly proportional to the required processing time
- low-quality documents need more CPU time and are processed more slowly than high-quality document images
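As a rough illustration of how the per-step CPU cost adds up to a total processing time and a per-core throughput, here is a small sketch. All step timings are invented example numbers for illustration, not ABBYY benchmarks:

```python
# Hypothetical per-page timings (seconds) for the OCR pipeline steps
# mentioned above. These numbers are invented for illustration only.
STEP_TIMES = {
    "image_preprocessing": 0.40,
    "layout_analysis": 0.55,
    "character_recognition": 0.80,
    "export": 0.25,
}

def time_per_page(step_times):
    """Total processing time per page is the sum of all step times."""
    return sum(step_times.values())

def pages_per_hour(step_times):
    """Throughput achievable by a single CPU core at these timings."""
    return 3600 / time_per_page(step_times)

total = time_per_page(STEP_TIMES)
print(f"{total:.2f} s/page, {pages_per_hour(STEP_TIMES):.0f} pages/hour")
# → 2.00 s/page, 1800 pages/hour
```

Lower image quality would increase the pre-processing and recognition entries, which directly lowers the pages-per-hour figure.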
There is some general “wisdom”
- the better the image quality, the faster the images can be processed
- From version 10 on, FineReader Engine offers a new fast mode that is especially tuned for good-quality images
- If the image quality is not known in advance, it is recommended to use the “balanced mode”; here the technology makes the decision internally.
- when the image quality is bad, a lot more (CPU) time has to be “invested” to get the best possible results
- cleaning images
- rotating images
- FineReader Engine for Windows contains a (pre-compiled) code sample that makes it easy to test the influence of image pre-processing on the overall processing speed
- Complex layouts need more time for document analysis than simple book pages
- Reading “low quality” characters takes more time than processing “clean” characters.
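The rules of thumb above (fast mode for good images, balanced mode when quality is unknown, quality mode for bad scans) can be expressed as a small helper. The mode names and quality categories below are illustrative stand-ins, not an ABBYY API:

```python
def choose_recognition_mode(image_quality=None):
    """Pick a recognition mode following the rules of thumb above.
    `image_quality` is a hypothetical label ("good"/"bad") or None
    when the quality is not known in advance."""
    if image_quality is None:
        return "Balanced"   # let the engine make the decision internally
    if image_quality == "good":
        return "Fast"       # fast mode is tuned for good-quality images
    return "Quality"        # invest more CPU time on low-quality scans

print(choose_recognition_mode("good"))   # Fast
print(choose_recognition_mode(None))     # Balanced
print(choose_recognition_mode("bad"))    # Quality
```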
Time & Throughput
- The total processing time is the sum of the different internal processing steps
- different technologies will take different amounts of time
- this is true for different ABBYY technology generations
- and of course also when technologies from different vendors are compared
- nevertheless, to get good recognition results on low-quality images, considerably more effort has to be invested to get usable results
- if a technology is tuned only for speed, it will not be able to deliver acceptable results on low-quality documents
- The processing time/throughput of one CPU core is not the final achievable speed of a production installation, because the number of CPU cores grows with every new CPU generation.
- To scale up throughput, effective use of multiple cores has to be considered
- To use multiple CPU cores on one or more machines, the licensing scheme has to allow this
- The type of OCR processing also influences the processing time/throughput
- Pure text extraction without document layout retention is faster than exporting to a format where the layout has to be reconstructed
- there is also a pre-compiled code sample for simplified testing
- Image-resolution changes and MRC compression of the generated PDFs will influence the processing speed
- PDF Export Profile Sample can be used for custom tests
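The multi-core scaling discussed above can be sketched in plain Python with a process pool. The `recognize_page` worker here is a dummy stand-in for a real OCR call, used only to show how independent pages can be distributed across CPU cores:

```python
import multiprocessing as mp

def recognize_page(page_id):
    """Stand-in for recognizing one page; a real pipeline would call
    the OCR engine here. Returns a (page_id, text) result."""
    return page_id, f"text of page {page_id}"

def process_batch(page_ids, workers=4):
    """Distribute pages across worker processes, one page per task,
    so independent pages are recognized on separate CPU cores."""
    with mp.Pool(processes=workers) as pool:
        return dict(pool.map(recognize_page, page_ids))

if __name__ == "__main__":
    results = process_batch(range(8))
    print(len(results))  # 8 pages processed
```

In a real installation the number of workers would be matched to the available CPU cores and to what the license permits.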
The good news: OCR scalability is built into the ABBYY SDKs, and there are several approaches to speed up processing
- The latest versions of FineReader Engine and FlexiCapture Engine support Multi Core CPUs
Details can be found in the code sample articles:
- The ABBYY licensing scheme offers “scalability”, so it is possible to scale up processing.
- A one-machine license allows unlimited CPU cores when a page limit (renewable or total), a character limit (renewable or total) or a speed limit is set
- A network license allows scaling processing up to a very high number of machines (virtual or physical), and all CPU cores can be used.
- OCR Recognition Modes - Quality or speed or balanced recognition mode
- Image Processing and Binarisation for Camera OCR - image pre-processing and recognition speed
- OCR SDK Scalability - boosted processing with multiple CPU cores
- Network Licensing - use multiple machines to scale up
- Code Sample articles on Multi-Core Support and Processing Profiles:
- Predefined OCR Processing Profiles Sample (FRE) - Quality/Speed Profiles for different processing scenarios
Back to Technology & Features