Document/Layout Analysis for OCR

Language: EN
Product-Line: FineReader Engine, Mobile OCR Engine, Cloud OCR SDK
Version: 7.x, 8.x, 9.x, 10, 11
Type: Technology & Features
Category: Recognition

Before character recognition can take place, the logical structure of the document has to be analyzed and defined. For example:

  • Where are text blocks, paragraphs, lines?
  • Is there a table that should be reconstructed?
  • Are there any “images” on the page(s)?
  • Are there any barcodes to read?

ABBYY technology contains several variants of Document Layout Analysis:

Automatic Document Analysis

The Document Analysis (DA) searches for and finds zones for recognition on the document images. Here is how it works:

  • The Document Analysis algorithms detect different elementary objects on the image, e.g.
    • words or parts of words
    • separators
    • connected components
    • color gradients, inverted text areas
    • …etc.
  • Then, based on this information, hypotheses about the blocks on the page are formed and checked:
    • What is the type of the block?
    • Where are the borders of the block?
    • What type of document layout could it be (magazine, newspaper, book page)?
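As an illustration of one elementary object from the list above, connected components can be found with a simple flood fill over a binarized image. This is only a minimal sketch and not ABBYY's implementation: the function name `connected_components`, the 0/1 grid representation and the choice of 4-connectivity are assumptions made for this example.

```python
from collections import deque

def connected_components(grid):
    """Return bounding boxes (top, left, bottom, right) of 4-connected ink regions.

    `grid` is a list of rows; 1 marks an ink pixel, 0 marks background.
    Illustrative only: real layout analysis works on far richer features.
    """
    rows = len(grid)
    cols = len(grid[0]) if grid else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Start a breadth-first flood fill from this unvisited ink pixel.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and grid[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

# A tiny "page" with two separate ink regions.
page = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
boxes = connected_components(page)
```

The resulting bounding boxes are the kind of low-level evidence from which hypotheses about block type and block borders could then be built.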

The following screenshot of ABBYY FineReader shows the result of an analyzed layout (text, image and table blocks), as well as the reconstructed output.

Or on a multi-column magazine page with intelligent layout analysis and reconstruction:

Without intelligent layout analysis, i.e. if only one large text block were used, the results would be far less usable for a human reader. On a multi-column document, for example, the user would still get the text, but not in the order in which the columns are meant to be read.
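The effect on a two-column page can be sketched with a few lines of code. The example below is a simplified illustration, not part of any ABBYY SDK: the `(x, y, text)` tuples, the function names and the fixed `column_boundary` are assumptions, and a real analysis would detect the column borders itself.

```python
def naive_order(lines):
    """Treat the page as one large block: read top to bottom, left to right."""
    return [text for _, _, text in sorted(lines, key=lambda l: (l[1], l[0]))]

def column_order(lines, column_boundary):
    """Read the left column completely, then the right column."""
    left = sorted((l for l in lines if l[0] < column_boundary), key=lambda l: l[1])
    right = sorted((l for l in lines if l[0] >= column_boundary), key=lambda l: l[1])
    return [text for _, _, text in left + right]

# Each entry is (x, y, text) for one text line on a two-column page.
lines = [
    (0, 0, "A1"), (50, 0, "B1"),
    (0, 10, "A2"), (50, 10, "B2"),
]

single_block = naive_order(lines)        # lines of both columns interleaved
with_layout = column_order(lines, 25)    # column A first, then column B
```

With a single block, the lines of both columns are mixed row by row; with column information, each column is read as a unit.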

ABBYY Document/Layout Analysis Modes

Automatic Document Analysis in the OCR SDKs can work in different modes:

  • Full layout analysis - text, images, tables and barcodes are detected (see the samples above)
  • Index mode - tries to find as much text as possible on the image, even if it is embedded in images
  • Mode for invoices and documents with complex tables
  • Barcode mode - ignores text and images and only looks for barcodes
  • Lines mode - only returns the text in lines, even in a multi-column document

Note: It is possible to use the ABBYY SDKs without applying document layout analysis. In that case the developer has to define the blocks/recognition areas manually. This processing scenario is called Field-Level OCR or Zonal OCR.
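The idea of developer-defined recognition areas can be sketched as follows. This is a hypothetical illustration and does not use the ABBYY API: the `Zone` class, its field names and the `extract_zones` helper are invented for this example, and the image is modelled as a plain list of pixel rows.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Zone:
    """A rectangular recognition area defined by the developer (hypothetical)."""
    name: str
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive

def crop(image, zone):
    """Cut the zone's rectangle out of an image given as a list of pixel rows."""
    return [row[zone.left:zone.right] for row in image[zone.top:zone.bottom]]

def extract_zones(image, zones):
    """Return one cropped sub-image per zone, keyed by zone name."""
    return {zone.name: crop(image, zone) for zone in zones}

# A 3x4 dummy image and a single field zone covering its top-right area.
image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
]
fields = extract_zones(image, [Zone("total", left=1, top=0, right=3, bottom=2)])
```

Each cropped area would then be passed to recognition on its own, instead of letting the layout analysis decide where the blocks are.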

