Batch OCR Processing Sample FineReader Engine 10/11

Language: EN
Product-Line: FineReader Engine
Version: 10, 11
Platform: Windows
KB-Type: Code Samples Collection
Category: Recognition, OCR: Speed & Quality

This sample shows the speed gain from processing a batch of documents in several asynchronous processes, compared with processing the same documents one by one in a single process. The sample uses the BatchProcessor object for processing, which means that:

  • Image files are taken from a custom image source, i.e. you can implement the image processing queue in your own way.
  • Image files are taken one by one from the image source and immediately passed to the available recognition processes. When a recognition process finishes recognizing an image, it receives the next image from the source.
  • Recognized pages are returned to the user in the order in which they were taken from the image source.
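
The behavior described above can be illustrated with a small, engine-independent sketch. It is not FineReader Engine code: `process_in_order`, `source`, and `recognize` are hypothetical names, and worker threads stand in for the engine's recognition processes to keep the example short. As a further simplification, the next image is pulled from the source when the oldest pending result is collected, rather than at the moment any worker becomes free.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")

_DONE = object()  # sentinel: the image source is exhausted


def process_in_order(source: Iterable[T],
                     recognize: Callable[[T], R],
                     workers: int) -> Iterator[R]:
    """Yield recognize(item) for every item of source, in source order.

    Items are pulled lazily: at most `workers` items are in flight at
    any time, so the source is never read far ahead of the consumer.
    """
    items = iter(source)
    pending = deque()  # futures, oldest first
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Fill the initial window: one item per worker.
        for _ in range(workers):
            item = next(items, _DONE)
            if item is _DONE:
                break
            pending.append(pool.submit(recognize, item))
        while pending:
            # Wait for the oldest item so results keep the source order.
            result = pending.popleft().result()
            # Refill the window with the next item, if there is one.
            item = next(items, _DONE)
            if item is not _DONE:
                pending.append(pool.submit(recognize, item))
            yield result
```

A caller would iterate over `process_in_order(image_names, recognize_one, workers=4)` and handle each result as it arrives. Later images may finish earlier, but they are held back until every image before them has been returned.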

Compare this sample with the Multi-Processing Recognition sample.

Screenshot made with a laptop (2012): quad-core i7-3720QM, 2.6 GHz, Windows 7 64-bit, 16 GB RAM;
FineReader Engine 10 R7, using the included standard sample files

Screenshot made with a laptop (2009): Core2 Duo T9800, 2.9 GHz, Windows 7 32-bit, 4 GB RAM;
FineReader Engine 10 R1, using the included standard sample files
Production machines are faster and have more cores =)

Description

The sample processes a batch of images from the specified folder and, if requested, saves them in PDF format. The sample compares the speed of batch processing in one thread and in parallel threads. The comparison result is shown in a chart.

To view how it works:

  1. Select the folder with images to process.
  2. Specify the recognition languages of the documents to be processed.
  3. Specify the number of CPU cores to test on. The default number of CPU cores equals the maximum available number. You can reduce it to compare the difference in speed. The test for one CPU core is performed automatically during processing.
  4. Specify whether to save the recognized text. If the results are saved, the duration of all processing operations (analysis, recognition, synthesis, and export) is measured. Otherwise, only the time of analysis and recognition is measured.
  5. To run the sample, click Start.
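
The kind of comparison the sample performs can be sketched as follows. This is only an illustration under stated assumptions: `fake_recognize`, `fake_export`, and `timed_run` are hypothetical stand-ins, the work is simulated with `time.sleep`, and threads are used instead of separate processes (a sleeping thread does not block the others, so the speedup is visible; real CPU-bound recognition would need separate processes, which is what the engine uses). The `save_results` flag mirrors step 4: when it is set, the export stage is included in the measured time.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def fake_recognize(image_name: str) -> str:
    """Stand-in for analysis and recognition: sleeps, then returns text."""
    time.sleep(0.05)
    return f"text of {image_name}"


def fake_export(text: str) -> None:
    """Stand-in for synthesis and export: sleeps to simulate saving."""
    time.sleep(0.01)


def process(image_name: str, save_results: bool) -> str:
    text = fake_recognize(image_name)
    if save_results:
        fake_export(text)  # included in the measured time only when saving
    return text


def timed_run(images: List[str], workers: int,
              save_results: bool = False) -> Tuple[float, List[str]]:
    """Process all images with the given number of workers; return
    the elapsed time in seconds and the results in input order."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = list(pool.map(lambda name: process(name, save_results), images))
    return time.perf_counter() - start, pages


images = [f"page_{i}.tif" for i in range(8)]
baseline, pages_1 = timed_run(images, workers=1)  # the one-core reference run
parallel, pages_n = timed_run(images, workers=4)
print(f"1 worker: {baseline:.2f} s, 4 workers: {parallel:.2f} s, "
      f"speedup: {baseline / parallel:.1f}x")
```

The one-worker run plays the role of the automatic single-core test. The printed timings depend on the machine and are not meant to reproduce the figures from the screenshots.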

The sample uses the following procedure for multiprocessing recognition:

  • Create the Engine object using the GetEngineObject function.
  • Implement the IImageSource and IFileAdapter interfaces, which provide access to the image source and files in it.
  • Call the CreateBatchProcessor method of the Engine object to receive the BatchProcessor object.
  • Call the Start method to initialize the processor, launch the asynchronous recognition processes, and specify the image source and processing settings.
  • Call the GetNextProcessedPage method in a loop until the method returns 0, which means that there are no more images in the source and all the processed images have been returned to the user.
  • The page returned by the GetNextProcessedPage method remains valid only until the next call to this method. Therefore, if you want to save this page, you must save it before calling GetNextProcessedPage again:
    • Call the Synthesize method of the FRPage object.
    • Call the Export method of the FRPage object to save the page into a file of the specified format.
  • Unload FineReader Engine by calling the DeinitializeEngine function.
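
The rule that a page must be saved before the next page is requested can be illustrated with a small mock. The classes below are hypothetical stand-ins written for this illustration and are not the FineReader Engine API: `StubBatchProcessor.get_next_processed_page` only mimics the documented contract of GetNextProcessedPage (it returns `None` where the real method returns 0), and `StubPage.export` plays the role of the Synthesize and Export calls.

```python
from typing import Iterable, List, Optional


class StubPage:
    """Mock page that becomes unusable once the processor moves on."""

    def __init__(self, text: str) -> None:
        self._text = text
        self._valid = True

    def release(self) -> None:
        self._valid = False

    def export(self) -> str:
        if not self._valid:
            raise RuntimeError("page was released by the next call")
        return self._text


class StubBatchProcessor:
    """Mock processor: each call invalidates the previously returned page."""

    def __init__(self, texts: Iterable[str]) -> None:
        self._texts = iter(texts)
        self._current: Optional[StubPage] = None

    def get_next_processed_page(self) -> Optional[StubPage]:
        if self._current is not None:
            self._current.release()  # the previous page is no longer valid
        text = next(self._texts, None)
        self._current = None if text is None else StubPage(text)
        return self._current


def drain(processor: StubBatchProcessor) -> List[str]:
    """Collect every page, saving each one before requesting the next."""
    saved = []
    while True:
        page = processor.get_next_processed_page()
        if page is None:  # no more images, all pages returned
            break
        saved.append(page.export())  # save before the next call
    return saved
```

Calling `drain(StubBatchProcessor(["a", "b"]))` collects both pages. Keeping a reference to a page and exporting it after the next call raises an error in this mock, which is the situation the procedure above is designed to avoid.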

