Document/Layout Analysis for OCR
Before the actual character recognition takes place, the logical structure of the document has to be analyzed and defined. For example:
- Where are text blocks, paragraphs, lines?
- Is there a table that should be reconstructed?
- Are there any “images” on the page(s)?
- Are there any barcodes to read?
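To make these questions concrete, the result of layout analysis can be pictured as a small tree: a page holds typed blocks, and text blocks hold paragraphs and lines. The following Python sketch is purely illustrative. The class names, the block labels ("text", "table", "picture", "barcode") and the (left, top, right, bottom) coordinate convention are assumptions, not ABBYY's object model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Bounding box as (left, top, right, bottom) in pixels: an assumed convention.
BBox = Tuple[int, int, int, int]

@dataclass
class Line:
    text: str
    bbox: BBox

@dataclass
class Paragraph:
    lines: List[Line] = field(default_factory=list)

@dataclass
class Block:
    kind: str  # hypothetical labels: "text", "table", "picture", "barcode"
    bbox: BBox
    paragraphs: List[Paragraph] = field(default_factory=list)

@dataclass
class Page:
    blocks: List[Block] = field(default_factory=list)

    def blocks_of(self, kind: str) -> List[Block]:
        """Return all blocks of the given kind, in stored order."""
        return [b for b in self.blocks if b.kind == kind]

    def text(self) -> str:
        """Join lines with spaces and paragraphs with newlines (text blocks only)."""
        paras = []
        for block in self.blocks_of("text"):
            for p in block.paragraphs:
                paras.append(" ".join(line.text for line in p.lines))
        return "\n".join(paras)

page = Page(blocks=[
    Block("text", (50, 40, 550, 120), [Paragraph([
        Line("Quarterly", (50, 40, 300, 60)),
        Line("report", (50, 70, 200, 90)),
    ])]),
    Block("table", (50, 150, 550, 400)),
    Block("barcode", (450, 700, 550, 760)),
])
print(page.text())                   # Quarterly report
print(len(page.blocks_of("table")))  # 1
```

Keeping tables, pictures and barcodes as separate block kinds is what lets later stages treat them differently (table reconstruction, barcode decoding) instead of sending everything to text recognition.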
ABBYY technology contains several variants of Document Layout Analysis:
Automatic Document Analysis
Document Analysis (DA) searches the document images for zones to be recognized. Here is how it works:
- The Document Analysis algorithms detect different elementary objects on the image, e.g.:
  - words or parts of words
  - connected components
  - color gradients, inverted text areas
- Then, based on this information, hypotheses about blocks are formed and checked:
  - What is the type of the block?
  - Where are the borders of the block?
  - What type of document layout could it be (magazine, newspaper, book page)?
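One of the elementary objects listed above, connected components, can be illustrated with a textbook 8-connected labeling on a binarized image, returning each component's bounding box as a first hint for block borders. This Python sketch shows the general technique only; the 0/1 pixel representation and the function name are assumptions, and it is not ABBYY's implementation.

```python
from collections import deque
from typing import List, Tuple

def connected_components(img: List[List[int]]) -> List[Tuple[int, int, int, int]]:
    """Label 8-connected foreground pixels (value 1) with a breadth-first
    flood fill and return each component's bounding box as
    (left, top, right, bottom), inclusive, in scan order."""
    h = len(img)
    w = len(img[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Start a new component at this unvisited foreground pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            left, top, right, bottom = x, y, x, y
            while queue:
                cy, cx = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                # Visit all 8 neighbours (diagonals count as connected).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((left, top, right, bottom))
    return boxes

image = [
    [1, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
]
print(connected_components(image))  # [(0, 0, 1, 1), (4, 0, 5, 1), (1, 3, 3, 3)]
```

In a real analyzer such boxes would only be raw material: components are further grouped into words, lines and blocks before the block type and borders are decided.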
The following screenshot of ABBYY FineReader shows the result of an analyzed layout (text, image and table blocks), as well as the reconstructed output.
Another example: a multi-column magazine page with intelligent layout analysis and reconstruction.
Without intelligent layout analysis, i.e. if only one large text block were used, the results would be far less usable for a human reader. On a multi-column document, for example, the user would still get the text, but not in the original column order.
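The reading-order problem can be shown with a toy two-column page. Treating the page as one large block and reading strictly top-to-bottom interleaves the columns, while assigning lines to columns first preserves the order a human expects. In this Python sketch the column start positions are supplied by hand and the line format is invented; detecting the columns is exactly the job of layout analysis, which is not modeled here.

```python
from typing import Dict, List, Tuple

# A text line reduced to (text, left, top); coordinates are invented.
Line = Tuple[str, int, int]

def naive_order(lines: List[Line]) -> List[str]:
    """One large text block: read strictly top-to-bottom, then left-to-right."""
    return [text for text, _, _ in sorted(lines, key=lambda ln: (ln[2], ln[1]))]

def column_order(lines: List[Line], column_starts: List[int]) -> List[str]:
    """Assign each line to the column with the largest start x <= its left
    edge, then read each column top-to-bottom, columns left-to-right.
    column_starts must be sorted and begin at or left of every line."""
    columns: Dict[int, List[Line]] = {}
    for ln in lines:
        idx = max(i for i, start in enumerate(column_starts) if start <= ln[1])
        columns.setdefault(idx, []).append(ln)
    ordered: List[str] = []
    for idx in sorted(columns):
        ordered.extend(text for text, _, _ in sorted(columns[idx], key=lambda ln: ln[2]))
    return ordered

page = [
    ("Column A, line 1", 0, 0),
    ("Column B, line 1", 300, 0),
    ("Column A, line 2", 0, 20),
    ("Column B, line 2", 300, 20),
]
print(naive_order(page))             # columns interleaved: A1, B1, A2, B2
print(column_order(page, [0, 300]))  # reading order kept: A1, A2, B1, B2
```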
ABBYY Document/Layout Analysis Modes
Automatic Document Analysis in the ABBYY OCR SDKs can work in different modes:
- Full layout analysis: text, images, tables and barcodes are detected (see the samples above)
- Mode for invoices and documents with complex tables
- Barcode mode: ignores text and images and only looks for barcodes
- Lines mode: only returns the text in lines, even in a multi-column document
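The practical difference between these modes is what comes back to the application. The following Python sketch models that difference on a simplified list of detected blocks. Everything in it is hypothetical: the `Mode` names are not ABBYY SDK identifiers, the block record is invented, the tables mode is treated like full analysis because its table-specific behavior is not modeled, and lines mode simply joins text fragments that share exactly the same top coordinate.

```python
from enum import Enum
from itertools import groupby
from typing import List, Tuple

# (kind, content, top, left): a hypothetical, simplified block record.
Block = Tuple[str, str, int, int]

class Mode(Enum):
    FULL = "full"        # text, pictures, tables, barcodes
    TABLES = "tables"    # invoices / documents with complex tables
    BARCODE = "barcode"  # barcodes only
    LINES = "lines"      # text as page-wide lines, ignoring columns

def apply_mode(blocks: List[Block], mode: Mode) -> List[Block]:
    if mode is Mode.BARCODE:
        # Ignore text and images; keep only barcodes.
        return [b for b in blocks if b[0] == "barcode"]
    if mode is Mode.LINES:
        # Join text fragments on the same row across column gaps.
        text = sorted((b for b in blocks if b[0] == "text"), key=lambda b: (b[2], b[3]))
        lines = []
        for top, group in groupby(text, key=lambda b: b[2]):
            lines.append(("line", " ".join(b[1] for b in group), top, 0))
        return lines
    # FULL and TABLES: keep every detected block unchanged in this sketch.
    return list(blocks)

blocks = [
    ("text", "Left col", 0, 0),
    ("text", "right col", 0, 300),
    ("picture", "", 100, 0),
    ("barcode", "12345", 500, 400),
]
print(apply_mode(blocks, Mode.BARCODE))  # [('barcode', '12345', 500, 400)]
print(apply_mode(blocks, Mode.LINES))    # [('line', 'Left col right col', 0, 0)]
```

The lines-mode output shows why that mode suits simple documents: text from two columns ends up in one line.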
Note: It is possible to use an ABBYY SDK without applying document layout analysis. In that case, the developer has to create the blocks/recognition areas. This processing scenario is called field-level OCR or zonal OCR.
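In zonal OCR the application, not the layout analyzer, decides where to look. A minimal Python sketch, assuming the page is a 2D pixel array and recognition is an injected callback: the `Zone` type, `zonal_ocr` and the fake recognizer are hypothetical stand-ins, not ABBYY API calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Image = List[List[int]]

@dataclass(frozen=True)
class Zone:
    """A developer-defined recognition area (hypothetical type).
    Pixel coordinates; right and bottom are exclusive."""
    name: str
    left: int
    top: int
    right: int
    bottom: int
    kind: str = "text"

def crop(img: Image, z: Zone) -> Image:
    """Cut the zone's rectangle out of the page image."""
    return [row[z.left:z.right] for row in img[z.top:z.bottom]]

def zonal_ocr(img: Image, zones: List[Zone],
              recognize: Callable[[Image, str], str]) -> Dict[str, str]:
    """Skip layout analysis: run the recognizer only on the given zones."""
    return {z.name: recognize(crop(img, z), z.kind) for z in zones}

page = [
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
]
zones = [Zone("invoice_no", 0, 0, 3, 2), Zone("total", 3, 2, 6, 4, kind="digits")]

# Stand-in recognizer: reports the zone kind and its count of dark pixels.
def fake_recognize(region: Image, kind: str) -> str:
    return f"{kind}:{sum(map(sum, region))}"

print(zonal_ocr(page, zones, fake_recognize))  # {'invoice_no': 'text:3', 'total': 'digits:4'}
```

Passing the recognizer in as a function keeps the zone bookkeeping independent of whichever OCR engine actually reads the cropped regions.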