FineReader Engine 12 - What is new in R2 Overview

_This page describes the features that became available on the public announcement date in May 2018. Since then, a broad range of new features has been introduced._

ABBYY FineReader Engine 12 includes the latest version of ABBYY OCR technologies and improved classification, supports modern deployment scenarios such as the cloud, and offers a number of additional functional improvements.

Improved classification

Advanced classification algorithms in FineReader Engine 12 leverage machine learning and natural language processing to deliver better document classification quality together with more flexible fine-tuning options. Customers can choose between the new intelligent Image Classifier and the advanced Text Classifier, or use a combination of both:

  • Image Classifier - collects and processes visual information about document images and delivers fast classification results.
  • Text Classifier - extracts and processes information about the documents’ content, which increases classification accuracy.

FineReader Engine 12 also offers new classification modes for the Text Classifier that optimize classification for high precision, high recall, or a balance between the two:

  • High precision mode - recommended in scenarios where it is important to assign documents to the correct categories and to keep wrong class assignments to a minimum.
  • High recall mode - recommended in scenarios where it is important to detect all documents belonging to a certain category and to minimize the risk of missing any of them.
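The two modes trade off the standard precision and recall metrics. As a generic illustration (not FineReader Engine code), the sketch below computes both metrics for one class from lists of true and predicted labels; the function name and the label values are hypothetical.

```python
from typing import Sequence, Tuple

def precision_recall(true_labels: Sequence[str],
                     predicted: Sequence[str],
                     category: str) -> Tuple[float, float]:
    """Precision and recall of `category` over a labelled test set."""
    if len(true_labels) != len(predicted):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(true_labels, predicted))
    tp = sum(1 for t, p in pairs if t == category and p == category)
    fp = sum(1 for t, p in pairs if t != category and p == category)
    fn = sum(1 for t, p in pairs if t == category and p != category)
    # Precision: share of documents assigned to the class that really belong to it.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: share of the documents of the class that were actually found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

truth = ["invoice", "invoice", "contract", "invoice", "letter"]
guess = ["invoice", "contract", "contract", "invoice", "contract"]
print(precision_recall(truth, guess, "invoice"))
```

Here every document labelled "invoice" really is one (precision 1.0), but one invoice was missed (recall about 0.67). A high-precision setting favours the first number; a high-recall setting favours the second.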

The new version also features a significantly reworked classification API: the improved algorithms and built-in cross-validation make working with classification more convenient and easier.
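Cross-validation estimates classification quality by training on part of the labelled documents and testing on the rest, rotating the held-out part. The sketch below shows only the generic k-fold split; it is not the FineReader Engine classification API, and the function name is hypothetical.

```python
from typing import Iterator, List, Tuple

def k_fold_splits(n_items: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k-fold cross-validation.

    Items are split into k contiguous folds whose sizes differ by at most one;
    each fold is used once as the test set while the others form the training set.
    """
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and the number of items")
    base, extra = divmod(n_items, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size

for train, test in k_fold_splits(7, 3):
    print("train:", train, "test:", test)
```

In practice the documents would usually be shuffled (and ideally stratified by class) before splitting, so that every fold contains a representative mix of categories.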

New deployment methods

  • Cloud-ready licensing
    This new license type allows software vendors to integrate FineReader Engine into applications deployed in cloud environments (e.g. services such as Amazon EC2 and Microsoft Azure). It also allows using the SDK on virtual machines and in Docker containers.

New & improved OCR languages

  • IMPROVED: Japanese OCR accuracy
    A new OCR language was added: Modern Japanese. Thanks to alphabet corrections and dictionary enhancements, this OCR language offers higher recognition accuracy for Japanese and supports recognition of Japanese documents that contain English words and the Greek characters α, β, θ, π. We recommend using the new 'Modern Japanese' OCR language instead of the original 'Japanese' OCR language.
  • Farsi OCR - official support
    Previously available as a technical preview, Farsi OCR is now officially supported in FineReader Engine 12. Added dictionary support and improved recognition algorithms significantly increase OCR accuracy.
  • Burmese OCR - technical preview
    This new OCR language is added as a technical preview. Burmese OCR is suitable for documents set in the Myanmar3 font at 10-12 pt.
  • Japanese and Arabic documents: Enhanced recognition of dates, times, addresses and names when using new 'special predefined languages'
    To improve field recognition and data capture from Japanese and Arabic documents, the Japanese and Arabic predefined languages were extended with special algorithms. In addition, new special predefined languages based on a dedicated character set and regular expressions were added for recognizing dates. These enhancements improve recognition accuracy for such fields.
  • Improved recognition of right-to-left written languages
    Two new properties identify the leftmost and the rightmost characters of a word, so that the first and last letters of the word are designated correctly during recognition and parsing. This significantly increases recognition accuracy for right-to-left languages such as Arabic or Hebrew. Previously, the leftmost character was always treated as the first letter of the word, which reduced recognition accuracy for these languages.
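The right-to-left item above comes down to reading order: in a right-to-left word, the first letter is the rightmost one on the image. The sketch below assumes a hypothetical list of character boxes with horizontal pixel coordinates (not the FineReader Engine object model or its new properties) and picks the first and last letters according to the writing direction.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class CharBox:
    """Hypothetical recognized character with its horizontal extent in pixels."""
    char: str
    left: int
    right: int

def first_and_last(chars: List[CharBox], right_to_left: bool) -> Tuple[str, str]:
    """Return the (first, last) letters of a word in reading order."""
    if not chars:
        raise ValueError("empty word")
    ordered = sorted(chars, key=lambda c: c.left)  # leftmost to rightmost on the image
    if right_to_left:
        ordered.reverse()  # reading starts at the rightmost character
    return ordered[0].char, ordered[-1].char

# Hebrew "שלום" on the page: final mem (ם) is leftmost, shin (ש) is rightmost.
word = [CharBox("ם", 10, 20), CharBox("ו", 20, 26),
        CharBox("ל", 26, 38), CharBox("ש", 38, 50)]
print(first_and_last(word, right_to_left=True))   # correct reading order
print(first_and_last(word, right_to_left=False))  # naive "leftmost is first" rule
```

The naive rule reports the final form ם as the first letter, which cannot occur at the start of a Hebrew word; this is the kind of error that treating the leftmost character as the first letter introduces.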

Improved layout reconstruction

  • Tables and layout reconstruction
    Improved table detection and analysis together with enhanced layout reconstruction allow reconstructing tables exactly as they appear in the original document. This makes the output document easier to work with and to reuse.
  • Detection and recreation of balanced text columns
    Balanced columns - text columns with the same length - are detected and precisely recreated during the export step.
  • Recreation of dashed separators in tables during export to DOCX
    Table borders drawn as dashed separators are detected and reconstructed as dashed lines in the output document rather than as plain grid lines.
  • Recreation of cell border color during export to XLSX
    When processing documents with tables, FineReader Engine 12 detects the line color of cell borders, preserves this information and recreates the border color when exporting to XLSX.
  • Improved layout retention on TXT export
    A new export mode simulates the original layout by inserting spaces and has the following features:
    • emulation of paragraph indentation and centered alignment with spaces;
    • emulation of paragraph spacing with empty lines;
    • special processing of frames and footnotes;
    • translation of superscript and subscript characters into the corresponding special Unicode characters.
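The space-based emulation in the TXT export item above can be illustrated with plain string handling. The sketch below is a simplification under stated assumptions, not the FineReader Engine exporter: the page width, the alignment labels and the paragraph tuples are all hypothetical.

```python
from typing import List, Tuple

PAGE_WIDTH = 40  # assumed width of the plain-text page, in characters

def render_paragraph(lines: List[str], alignment: str, indent: int = 0) -> List[str]:
    """Emulate centering or first-line indentation of one paragraph with spaces."""
    out = []
    for i, line in enumerate(lines):
        if alignment == "center":
            pad = max(PAGE_WIDTH - len(line), 0) // 2
            out.append(" " * pad + line)
        else:
            # Indent only the first line, as in typical body paragraphs.
            out.append(" " * (indent if i == 0 else 0) + line)
    return out

def render_page(paragraphs: List[Tuple[List[str], str, int]]) -> str:
    blocks = ["\n".join(render_paragraph(lines, align, indent))
              for lines, align, indent in paragraphs]
    # An empty line between paragraphs stands in for paragraph spacing.
    return "\n\n".join(blocks)

page = render_page([
    (["Annual Report"], "center", 0),
    (["Revenue grew in all regions.", "Costs remained stable."], "left", 4),
])
print(page)
```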

New export formats

  • PDF 2.0 - the latest PDF standard includes the following updates:
    • encryption - an encrypted PDF document can be embedded within an unencrypted PDF document;
    • support for new types of digital signatures: signatures based on the CAdES standard, long-term validation (LTV) and certificates based on elliptic curves;
    • new types of annotations: projections, 3D and rich media;
    • accessibility - pronunciation hints.
  • PDF/UA - export to PDF in accordance with the PDF/UA standard is now available. Legislation in many countries requires government authorities to provide accessible versions of their websites and PDF documents. Conformance with PDF/UA ensures accessibility for people with disabilities who use assistive technology such as screen readers, screen magnifiers, joysticks and other technologies to navigate and read electronic content.
  • HTML5 - new export format that serves as an alternative to PDF for client-server, mobile and cross-platform applications. Developers can now let their users work with FineReader Engine 12-based solutions without installing a client program, which increases conversion and avoids both security limitations and the need for a specific environment on the user’s PC.
  • ALTO 3.1 - new export format. ALTO XML schema version 3.1 is used to store metadata, especially for complex documents, and includes the following main changes:
    • added support for different shapes on the String and TextLine elements, on all PageSpaceType elements and on all BlockType elements;
    • the ROTATION attribute now describes the rotation of the block's contents rather than of the block itself, and it is inherited by all sub-elements.

New PDF/A saving options

  • PDF/A-2b and PDF/A-3b - saving to PDF/A-2 and PDF/A-3 with level B conformance is now supported:
    • PDF/A-2 enables the creation of smaller PDF files using JPEG2000 compression. For long-term archiving, this can reduce the storage space used and speed up access over low-bandwidth networks.
    • PDF/A-3 allows embedding PDF/A files as well as files in a variety of other formats, such as XML or Office documents. Long-term archiving and readability of the PDF/A part are still guaranteed, and the attachments can deliver additional benefits that make this format attractive for new uses, for example, when a graphical representation of a document should be combined with its source data.

New XML saving options and improvements

  • Faster export to XML when saving character coordinates from before image deskewing
    FineReader Engine 12 exports documents to XML at the same speed regardless of whether character coordinates are saved before or after deskewing. (In FineReader Engine 11, export to XML slowed down whenever character coordinates on the image prior to deskewing were required.)
  • Direct export of list elements to XML
    FineReader Engine can now export information about detected list elements to XML and reconstruct the original lists in the XML document, with no custom code needed to obtain the list elements. This significantly simplifies working with the export results. (Previous versions did not export information about list elements to XML directly. To recreate individual list elements in an output XML, developers had to iterate over the recognized text, extract the list information programmatically and then generate their own XML file.)
  • Export of information about tab-space characters to XML
    The new version of FineReader Engine exports information about tab-space characters, including the number of dashes, lines or spaces, so that the original documents can be reconstructed more accurately. (Previous versions did not save the number of dashes, lines or spaces for tab-space characters, which made it difficult to reconstruct the original document from the output XML file.)
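Saving coordinates "prior to deskewing", as in the first XML item above, means mapping each character position on the straightened image back onto the original scan. Assuming deskewing is a pure rotation by a known angle about the image centre (a simplification; the engine's actual transform is not described here), the mapping is an inverse rotation. The function names are hypothetical.

```python
import math
from typing import Tuple

def rotate_point(x: float, y: float, angle_deg: float,
                 cx: float, cy: float) -> Tuple[float, float]:
    """Rotate (x, y) by angle_deg degrees about (cx, cy).

    Uses the standard rotation matrix; in image coordinates, where y grows
    downward, a positive angle therefore appears clockwise on screen.
    """
    a = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(a) - dy * math.sin(a),
            cy + dx * math.sin(a) + dy * math.cos(a))

def to_original(x: float, y: float, deskew_angle_deg: float,
                width: int, height: int) -> Tuple[float, float]:
    """Map a point on the deskewed image back to the original (skewed) scan."""
    # Deskewing rotated the page by deskew_angle_deg, so undo it with the opposite angle.
    return rotate_point(x, y, -deskew_angle_deg, width / 2, height / 2)

# Round trip: rotate a point as deskewing would, then map it back.
w, h = 2480, 3508
deskewed = rotate_point(500.0, 800.0, 1.5, w / 2, h / 2)
print(to_original(*deskewed, 1.5, w, h))
```

Applying this per character is cheap, which is consistent with the idea that requesting pre-deskew coordinates need not slow down the export.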

Other features and improvements

  • New scanning utility and options
    The new scanning utility is intended for customers who want to use the pre-processing tools of their scanner hardware instead of FineReader Engine tools. It provides access to four new scanning options that automate a set of tasks during the scanning step:
    • ICAP_AUTODISCARDBLANKPAGES - automated deletion of blank pages
    • ICAP_AUTOMATICBORDERDETECTION - automated page crop
    • ICAP_AUTOMATICDESKEW - automated skew correction
    • ICAP_AUTOMATICCOLORENABLED - automated detection of color information
  • Customer Project-ID
    The new Customer Project ID entity reduces the number of errors during Engine object initialization. For more information, see Licensing - Customer Project ID.
  • Official support of Azure App Services
    Azure App Services is a newly tested and officially supported cloud environment.
  • New HTML-based Help file
    The new HTML-based help offers simplified navigation and improved search, making it easier and faster to find information.
  • 256-bit AES encryption with the possibility to use Unicode characters during PDF encryption
    The 256-bit AES encryption level uses UTF-8 character encoding for PDF encryption, which makes customers independent of the language of their operating system and allows encrypting PDF files with passwords in any language. (Previously, the 128-bit AES encryption level was used, which caused problems when PDF files were protected with passwords in certain languages, e.g. Turkish: the character encoding depended on the language of the operating system, so even correctly entered passwords could fail to open password-protected PDF files.)
  • Access to information about individual recognition variants
    FineReader Engine 12 provides access to information about words and individual characters, including their coordinates, for each recognition hypothesis individually. With access to all possible recognition variants, developers can implement their own tools that iterate over the offered variants and select the most probable one based on their own internal rules. (In the previous version, this information was available only for the variant that ABBYY FineReader Engine selected as the most probable; information about less probable recognition variants was not available.)
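The password-encoding problem described under 256-bit AES encryption can be shown with plain Python: the same Turkish password produces different bytes under a Turkish code page, a Western code page and UTF-8, so any key derived from it depends on the system locale unless one fixed encoding is used. For the AES-256 security handler, the PDF 2.0 specification takes the password as UTF-8 (after SASLprep normalization, which this sketch omits) and uses at most its first 127 bytes. The helper name is hypothetical; this is not FineReader Engine code.

```python
def pdf_password_bytes(password: str) -> bytes:
    """Hypothetical helper: UTF-8 password bytes, truncated to 127 bytes.

    The PDF 2.0 AES-256 handler also requires SASLprep normalization before
    encoding; that step is omitted here for brevity.
    """
    return password.encode("utf-8")[:127]

password = "Güvenlik-ş"  # contains the Turkish letters ü and ş

# Code-page encodings depend on the system locale; cp1252 has no "ş" at all.
for codec in ("cp1254", "cp1252"):
    print(codec, password.encode(codec, errors="replace"))

# UTF-8 yields the same bytes on every system.
print("utf-8", pdf_password_bytes(password))
```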