Adaptive Document Recognition Technology (ADRT)

Language: EN
Product-Line: FineReader Engine
Version: 9.x, 10, 11
Type: Technology & Features
Category: Recognition

ADRT is ABBYY’s unique recognition technology that uses a set of innovative document analysis algorithms. From the layout and formatting information produced during the OCR process, ADRT builds a logical model of the document structure. This model includes:

  • Elements such as headers and footers, footnotes, and page numbers
  • New in Version 10:
    Reconstruction of tables of contents (TOC)
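ABBYY does not document how ADRT reconstructs tables of contents. As a rough illustration of one small part of the task, the Python sketch below turns recognized text lines that look like TOC entries (optional section number, title, dot leaders or a wide gap, page number) into structured entries. The regular expression, the `TocEntry` type and `parse_toc` are hypothetical names for this example only; a real implementation would also need layout cues such as indentation and would link each entry to the heading it points to.

```python
import re
from dataclasses import dataclass

# Hypothetical TOC-line pattern: optional section number ("2", "2.1"),
# a title, then dot leaders or a wide gap, then a page number.
TOC_LINE = re.compile(
    r"^\s*(?P<num>\d+(?:\.\d+)*)?\s*(?P<title>.+?)\s*(?:\.{2,}|\s{2,})\s*(?P<page>\d+)\s*$"
)


@dataclass
class TocEntry:
    level: int   # nesting depth derived from the section number
    title: str
    page: int


def parse_toc(lines):
    """Turn TOC-like text lines into TocEntry objects; other lines are skipped."""
    entries = []
    for line in lines:
        m = TOC_LINE.match(line)
        if not m:
            continue  # ordinary body text, not a TOC entry
        num = m.group("num")
        level = num.count(".") + 1 if num else 1
        entries.append(TocEntry(level, m.group("title").strip(), int(m.group("page"))))
    return entries


toc = parse_toc([
    "1 Introduction ........ 3",
    "1.1 Scope ..... 4",
    "2 Installation    10",
    "Some body text",
])
for entry in toc:
    print(entry)
```

The section number is used to infer nesting depth, so "1.1 Scope" becomes a second-level entry under "1 Introduction", while the plain body-text line is ignored.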

Multi-page documents are processed as a unit, so the documents generated by ADRT have consistent formatting across all pages. The results can be automatically exported to DOC(X).
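ABBYY does not publish how ADRT detects headers, footers, or page numbers. The Python sketch below only illustrates why analysing all pages as a unit helps: a line that repeats at the top or bottom of most pages (after digits are masked, so that page numbers compare as equal) is likely a running header or footer. `find_running_elements` and its 0.6 threshold are assumptions made for this example, not part of FineReader Engine.

```python
import re
from collections import Counter


def normalize(line):
    # Mask digit runs so that "Page 3" and "Page 14" compare as equal.
    return re.sub(r"\d+", "#", line.strip().lower())


def find_running_elements(pages, min_share=0.6):
    """Return normalized first/last lines that repeat on at least
    `min_share` of the pages (the threshold is an assumption).

    `pages` is a list of pages, each a list of text lines in reading order.
    """
    tops = Counter(normalize(p[0]) for p in pages if p)
    bottoms = Counter(normalize(p[-1]) for p in pages if p)
    needed = min_share * len(pages)
    headers = {text for text, n in tops.items() if n >= needed}
    footers = {text for text, n in bottoms.items() if n >= needed}
    return headers, footers


pages = [
    ["Annual Report 2023", "Cover"],
    ["ACME Annual Report", "Intro text", "Page 1"],
    ["ACME Annual Report", "More text", "Page 2"],
    ["ACME Annual Report", "Even more", "Page 3"],
]
headers, footers = find_running_elements(pages)
print(headers, footers)
```

With only the cover page available, every first and last line would qualify, which is why a check of this kind only becomes meaningful when all pages of a document are analysed together.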

Adaptive Document Recognition Technology (ADRT) in a nutshell

  • ADRT has been part of ABBYY's core OCR technology since Version 9
  • ADRT analyses the layout information of all pages that belong to one document
    • The assumption in a document conversion scenario is that the pages of a document share a similar layout and structure
    • ADRT uses this assumption to capture the document as a whole.
  • The results of ADRT analysis are used by the new document synthesis algorithm:
    • “All words based on similar font and style characteristics are combined into certain groups/clusters.
    • One cluster may include words from different pages – for example, words from a document’s header or footer.”
  • ADRT-based document export:
    • While synthesising an output file, ADRT matches the most suitable font and style to each cluster (to the whole group of words simultaneously) using all the fonts available on the computer. The word clustering method allows the OCR program to “see” and reconstruct an entire document instead of individual pages one-by-one.
  • ADRT delivers unified formatting and intelligent office documents
  • ADRT works on both single-core and multi-core CPU machines
    Note: ADRT is not a mechanism for scaling up OCR processing
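The clustering and font-matching steps described in the list above can be illustrated with a small Python sketch. It is not ABBYY's algorithm: the `Word` record, the similarity rule (point size rounded to the nearest half point plus the bold/italic flags) and the majority-vote font choice are assumptions chosen only to show the idea that formatting decisions are made once per cluster rather than once per word or per page.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    page: int
    font: str     # font name guessed by OCR for this single word
    size: float   # point size estimated by OCR
    bold: bool
    italic: bool


def cluster_key(w):
    # Assumed similarity rule: size rounded to the nearest half point,
    # plus the bold/italic flags. Words with equal keys share a cluster.
    return (round(w.size * 2) / 2, w.bold, w.italic)


def cluster_words(words):
    clusters = defaultdict(list)
    for w in words:
        clusters[cluster_key(w)].append(w)  # a cluster may span many pages
    return dict(clusters)


def pick_font(cluster, installed, fallback="Arial"):
    # One decision for the whole cluster: the most frequent per-word guess
    # among fonts that are actually installed, else the fallback font.
    votes = Counter(w.font for w in cluster if w.font in installed)
    return votes.most_common(1)[0][0] if votes else fallback


words = [
    Word("Report", 1, "Arial", 9.8, False, False),
    Word("Report", 2, "Arial", 10.1, False, False),
    Word("Report", 3, "Helvetica", 10.0, False, False),
    Word("Introduction", 1, "Times New Roman", 16.0, True, False),
]
installed = {"Arial", "Times New Roman"}
for key, members in cluster_words(words).items():
    print(key, sorted({w.page for w in members}), pick_font(members, installed))
```

A per-word choice would give the page-3 word a different font from the same element on pages 1 and 2; voting once per cluster keeps that repeated element consistent across the whole document.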

ADRT for Developers

In FineReader Engine 10, developers can access the results of the Adaptive Document Recognition Technology and reuse this information in their own applications via the API of the “DocumentStructure” object.

  • This object provides access to the logical structure of a document.
  • Document structure is detected during document synthesis and is used to re-create the logical structure and formatting attributes of a document during export. The object exposes a set of methods and properties for working with the logical sections and styles of the document.

[Figure: Overview of the DocumentStructure object]
