Document/Layout Analysis for OCR

FineReader Engine, Mobile OCR Engine, Cloud OCR SDK
7.x, 8.x, 9.x, 10, 11
Technology & Features

Before the actual character recognition takes place, the logical structure of the document has to be analyzed and defined. For example:

  • Where are text blocks, paragraphs, lines?
  • Is there a table that should be reconstructed?
  • Are there any “images” on the page(s)?
  • Are there any barcodes to read?

ABBYY technology contains several variants of Document Layout Analysis:

Automatic Document Analysis

The Document Analysis (DA) searches the document images and finds the zones for recognition. Here is how it works:

  • The Document Analysis algorithms detect different elementary objects on the image, e.g.
    • words or parts of words
    • separators
    • connected components
    • color gradients, inverted text areas
    • …etc.
  • Then, based on this information, hypotheses for these blocks are formed and checked:
    • What is the type of the block?
    • Where are the borders of the block?
    • What type of document layout could it be (magazine, newspaper, book page)?
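
The two steps above can be illustrated with a toy sketch. This is not ABBYY's algorithm: it only assumes a page given as a character grid where "#" marks ink, finds connected components as the elementary objects, and then forms block hypotheses by merging components that lie close together. The names connected_components, merge_into_blocks, max_gap and PAGE are made up for this illustration.

```python
from collections import deque

def connected_components(grid):
    """Return bounding boxes (top, left, bottom, right) of 4-connected ink regions."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != "#" or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == "#" and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def merge_into_blocks(boxes, max_gap=1):
    """Greedily merge boxes separated by at most max_gap empty rows/columns."""
    blocks = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(blocks)):
            for j in range(i + 1, len(blocks)):
                a, b = blocks[i], blocks[j]
                # Empty rows/columns between the boxes (negative means overlap).
                v_gap = max(a[0], b[0]) - min(a[2], b[2]) - 1
                h_gap = max(a[1], b[1]) - min(a[3], b[3]) - 1
                if v_gap <= max_gap and h_gap <= max_gap:
                    blocks[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                 max(a[2], b[2]), max(a[3], b[3]))
                    del blocks[j]
                    merged = True
                    break
            if merged:
                break
    return sorted(blocks)

# Hypothetical two-column page: narrow gaps inside a column, a wide gutter between columns.
PAGE = [
    "##.##....##",
    "##.##....##",
    "...........",
    "##.##....##",
]

components = connected_components(PAGE)
blocks = merge_into_blocks(components, max_gap=1)
```

On this sample the six small components collapse into two block hypotheses, one per column, because the gutter is wider than max_gap. A real analysis would additionally classify each block (text, image, table, barcode) and check the hypotheses against the overall page layout.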

The following screenshot of ABBYY FineReader shows the result of an analyzed layout (text, image and table blocks), as well as the reconstructed output.

Or on a multi-column magazine page, with intelligent layout analysis and reconstruction:

Without intelligent layout analysis, that is, if only one large text block were used, the results would be far less usable for a human reader. On a multi-column document, for example, the user would still get the text, but not …
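
One plausible effect of treating a multi-column page as a single block is a broken reading order. The following minimal sketch assumes word positions are already known; the WORDS data, the column_boundary value and both function names are hypothetical and are not taken from any ABBYY SDK.

```python
# (text, line_index, x_position): made-up word positions on a two-column page
WORDS = [
    ("Left-1", 0, 0), ("Right-1", 0, 50),
    ("Left-2", 1, 0), ("Right-2", 1, 50),
]

def read_as_single_block(words):
    """Read line by line across the full page width, ignoring columns."""
    ordered = sorted(words, key=lambda w: (w[1], w[2]))
    return [text for text, _line, _x in ordered]

def read_by_columns(words, column_boundary):
    """Read the left column top to bottom first, then the right column."""
    ordered = sorted(words, key=lambda w: (w[2] >= column_boundary, w[1], w[2]))
    return [text for text, _line, _x in ordered]

single = read_as_single_block(WORDS)
by_columns = read_by_columns(WORDS, column_boundary=25)
```

Here single interleaves the two columns line by line, while by_columns keeps each column together, which is what a reader of a magazine page would expect.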

ABBYY Document/Layout Analysis Modes

Automatic Document Analysis in the OCR SDKs can work in different modes:

  • Full layout analysis - text, images, tables and barcodes are detected (see the samples above)
  • Index mode - tries to find as much text as possible on the image, even if it is embedded in images
  • Mode for invoices and documents with complex tables
  • Barcode mode - ignores text and images and only looks for barcodes
  • Lines mode - only returns the text line by line, even in a multi-column document

Note: It is possible to use the ABBYY SDK without applying document layout analysis. In that case the developer has to define the blocks/recognition areas manually. This processing scenario is called Field-Level OCR or Zonal OCR.
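
The idea of developer-defined recognition areas can be sketched as follows. This is an illustration only and does not use any ABBYY API: the Zone type, crop, recognize_zones and the stand-in recognizer fake_recognize are hypothetical, and the "page" is a list of text rows rather than an image.

```python
from typing import NamedTuple

class Zone(NamedTuple):
    """A manually defined recognition area (bottom and right are exclusive)."""
    name: str
    top: int
    left: int
    bottom: int
    right: int

def crop(page, zone):
    """Cut the rectangular zone out of the page."""
    return [row[zone.left:zone.right] for row in page[zone.top:zone.bottom]]

def recognize_zones(page, zones, recognize):
    """Run the given recognizer on each zone; no layout analysis is involved."""
    return {zone.name: recognize(crop(page, zone)) for zone in zones}

def fake_recognize(rows):
    """Stand-in for a real OCR call: the sample page already consists of text."""
    return " ".join(row.strip() for row in rows)

# Made-up sample page and zone coordinates.
PAGE = [
    "INVOICE   No. 1234",
    "Date: 2020-01-31  ",
]
ZONES = [
    Zone("number", top=0, left=14, bottom=1, right=18),
    Zone("date", top=1, left=6, bottom=2, right=16),
]

FIELDS = recognize_zones(PAGE, ZONES, fake_recognize)
```

Because the zones are fixed in advance, this approach fits forms and other documents whose fields always appear at the same position.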

