Multilevel Document Analysis (MDA)
- The documents that need to be recognized are often more complex than a page of black text printed on a white background.
- Modern layouts often include elements such as tables, pictures, headers and footers, background images and so on. To recognize such documents and preserve their formatting, today's OCR programs first analyse the structure of the document before they start reading it.
- As a rule, this analysis singles out several logical levels.
At the top level of this hierarchy there is always exactly one object: the page itself. The other levels, in descending order, are:
- table, text block
- table cell
- paragraph, picture
- word, picture within a line
- letter (character).
- Any object in this hierarchy is composed of lower-level objects, e.g.
- letters make words,
- words make lines, etc.
- Therefore, the program always analyses the document from the top down:
- it first divides the page into larger objects,
- it then divides these into smaller objects, and so on, until it reaches the level of characters.
- Once the characters have been singled out and recognized, the reverse process begins: the program assembles them into larger objects which in the end will make up the entire page. This procedure is called Multilevel Document Analysis, or MDA.
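The two passes described above can be sketched in a few lines of Python. This is only a toy illustration, not ABBYY's implementation: the `Node` class, the `SPLITTERS` and `JOINERS` tables and both function names are hypothetical, the input is a plain string rather than a page image, only four of the levels are modelled, and the "recognition" step simply returns each character unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    level: str                      # "page", "paragraph", "word" or "character"
    raw: str                        # the slice of input this node covers
    children: list = field(default_factory=list)
    text: str = ""                  # filled in by the bottom-up pass


# Toy splitters standing in for real layout analysis (assumption).
SPLITTERS = {
    "page": ("paragraph", lambda s: [p for p in s.split("\n\n") if p.strip()]),
    "paragraph": ("word", lambda s: s.split()),
    "word": ("character", list),
}

# Separator used when joining the children of each level back together.
JOINERS = {"page": "\n\n", "paragraph": " ", "word": ""}


def analyse_top_down(node):
    """Split a node into lower-level nodes until characters are reached."""
    if node.level == "character":
        return node
    child_level, split = SPLITTERS[node.level]
    node.children = [
        analyse_top_down(Node(child_level, part)) for part in split(node.raw)
    ]
    return node


def assemble_bottom_up(node, recognise=lambda glyph: glyph):
    """Recognize characters, then rebuild each larger object from its parts."""
    if node.level == "character":
        node.text = recognise(node.raw)
    else:
        node.text = JOINERS[node.level].join(
            assemble_bottom_up(child, recognise) for child in node.children
        )
    return node.text


page = analyse_top_down(Node("page", "Hello  world\n\nSecond   paragraph"))
result = assemble_bottom_up(page)
```

Every node keeps both its raw input and its assembled text, so a mistake made at a high level (for example, a wrong split into paragraphs) is inherited by everything beneath it.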
- If an OCR program makes an error at one of the higher levels of analysis (e.g. if it mistakes a paragraph for a picture), there is very little chance that it will come up with the right result: the document will be recognized incorrectly.
- ABBYY OCR technology would run the same risk if it functioned like the majority of modern OCR applications, but it analyses documents slightly differently.
- To begin with, when recognizing objects of any level ABBYY is guided by the IPA principles.
- It starts by forming hypotheses about the nature of the objects and then purposefully verifies them.
- At the same time it takes into account the distinctive features it has detected in the document and saves the newly acquired information for future use (self-learning).
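As a rough illustration of the hypothesize, verify and remember cycle described above, the following Python sketch classifies a region as text or picture. Everything in it is an assumption made for the example: the region features (`ink_ratio`, `has_baselines`), the thresholds, the verifier functions and the `Classifier` class are hypothetical and do not describe how ABBYY's software actually works.

```python
from collections import Counter


# Hypothetical verifiers: each returns True if the region's features
# are consistent with the hypothesis it checks.
def looks_like_text(region):
    return region["ink_ratio"] < 0.3 and region["has_baselines"]


def looks_like_picture(region):
    return region["ink_ratio"] >= 0.3 and not region["has_baselines"]


VERIFIERS = {"text": looks_like_text, "picture": looks_like_picture}


class Classifier:
    def __init__(self):
        # How often each hypothesis has been confirmed on this document.
        self.confirmed = Counter()

    def hypothesis_order(self):
        """Return hypotheses, most frequently confirmed first (ties keep dict order)."""
        return sorted(VERIFIERS, key=lambda name: -self.confirmed[name])

    def classify(self, region):
        """Test each hypothesis in turn and remember which one was confirmed."""
        for name in self.hypothesis_order():
            if VERIFIERS[name](region):
                self.confirmed[name] += 1   # saved for later regions
                return name
        return "unknown"
```

After a few picture regions have been confirmed, `hypothesis_order()` puts `"picture"` first, so later regions are tested against that hypothesis before any other. This is the toy counterpart of saving newly acquired information for future use.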