FineReader Engine, Cloud OCR SDK
10, 11
Technology & Features
  • ABBYY FineReader Engine also offers a native XML export of document pages.
  • The XML export supports several options. As an example, these are the settings that control character information:
    • XCA_None
      No character attributes are to be written in files in XML format.
    • XCA_Ascii
      Character coordinates and character confidence are to be written in files in XML format.
    • XCA_Basic
      Character coordinates are to be written in files in XML format.
    • XCA_Extended
      Character coordinates, character confidence and extended character attributes are to be written in files in XML format. The following extended attributes are written:
      • whether the word was found in the dictionary,
      • whether the word was recognized with a standard or user-defined language,
      • whether the word is a number,
      • whether the word is an identifier,
      • probability that a character is written with a Serif font,
      • penalty for discordance of characters in a word,
      • the mean width of stroke in the RLE representation of a word image.
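
To illustrate how character attributes written by the export might be consumed, here is a minimal Python sketch. It assumes that each recognized character is stored in a `charParams` element, with the coordinates in the `l`, `t`, `r` and `b` attributes and the confidence in `charConfidence`. These names, the namespace URI and the sample document are assumptions written by hand for illustration, not engine output, so check them against a real export before relying on them.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration only; NOT actual engine output.
# Element names, attribute names and the namespace URI are assumptions.
SAMPLE = """<document xmlns="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml">
  <page width="200" height="100">
    <block blockType="Text">
      <text><par><line><formatting>
        <charParams l="10" t="20" r="18" b="40" charConfidence="98">O</charParams>
        <charParams l="20" t="20" r="28" b="40" charConfidence="41">K</charParams>
      </formatting></line></par></text>
    </block>
  </page>
</document>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def read_chars(xml_text):
    """Return one dict per charParams element: character, box, confidence."""
    root = ET.fromstring(xml_text)
    chars = []
    for elem in root.iter():
        if local_name(elem.tag) != "charParams":
            continue
        coords = [elem.get(key) for key in ("l", "t", "r", "b")]
        # Coordinates are missing when character attributes are not exported.
        box = tuple(int(c) for c in coords) if all(c is not None for c in coords) else None
        conf = elem.get("charConfidence")
        chars.append({
            "char": elem.text or "",
            "box": box,  # (left, top, right, bottom)
            "confidence": int(conf) if conf is not None else None,
        })
    return chars


def low_confidence(chars, threshold):
    """Keep only characters whose confidence is known and below the threshold."""
    return [c for c in chars if c["confidence"] is not None and c["confidence"] < threshold]


if __name__ == "__main__":
    for c in read_chars(SAMPLE):
        print(c["char"], c["box"], c["confidence"])
```

Matching on the local tag name keeps the sketch independent of the exact schema version in the namespace. Attributes that a given export setting does not write are returned as `None` rather than raising an error.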

ABBYY XML Tag Scheme

In FineReader Engine 11 the XML structure was extended and can now also store paragraph styles and roles in the XML file.

Simple ABBYY XML Sample

A sample image was processed with ABBYY Recognition Server using the different XML export settings:

  • The sample is very simple, but it is enough to show the basic structure of the native ABBYY XML export.
  • A ZIP archive with the original TIFF file and the 5 different XML results is available for download.

XML Simple

XML Character Attributes

Extended XML Character Attributes

Extended ABBYY XML Sample

The ZIP archive (1.3 MB) contains the processing results and the source image.


ZIP content

Processed with Recognition Server 3.0 Release 8 (as of July 2011)


