FineReader Engine 12 - What is new

ABBYY FineReader Engine 12 includes the latest ABBYY OCR and classification technologies that leverage artificial intelligence algorithms. The SDK supports modern software deployment trends such as deployment in the Cloud or virtual environments, and offers a number of additional improvements.

In addition, since Release 3 of FineReader Engine for Windows, the SDK supports Office documents as new input formats and offers recognition of Machine-Readable Zones in ID documents.

New: Processing of Office documents

In addition to a broad set of image formats and all types of PDFs, FineReader Engine can now process input documents created in one of the following Office document formats:

  • Text documents: *.doc, *.docx, *.rtf, *.htm / *.html, *.txt, *.odt
  • Spreadsheets: *.xls, *.xlsx, *.ods
  • Presentations: *.ppt, *.pptx, *.odp

Available only in the Windows version

New: Capture of information from Machine Readable Zones (MRZ) in ID documents

The new feature recognizes and automatically extracts data from Machine Readable Zones (MRZ) in ID documents, which speeds up the entry and verification of personal data during customer onboarding or verification processes.

Available only in the Windows version
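
Data read from an MRZ is typically verified with the check digits that ICAO Doc 9303 defines for fields such as the document number and dates. The following is a minimal sketch of that public check digit scheme, not of the FineReader Engine API; the function names are hypothetical and the sample values come from the ICAO specimen passport.

```python
def mrz_char_value(ch: str) -> int:
    """Map one MRZ character to its numeric value as defined in ICAO Doc 9303."""
    if ch in "0123456789":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10 ... Z=35
    if ch == "<":
        return 0  # filler character
    raise ValueError(f"invalid MRZ character: {ch!r}")


def mrz_check_digit(field: str) -> int:
    """Weighted sum with repeating weights 7, 3, 1, taken modulo 10."""
    weights = (7, 3, 1)
    total = sum(mrz_char_value(ch) * weights[i % 3] for i, ch in enumerate(field))
    return total % 10


def field_is_valid(field: str, check_digit: str) -> bool:
    """Return True if the recognized check digit matches the computed one."""
    if len(check_digit) != 1 or check_digit not in "0123456789":
        return False
    return mrz_check_digit(field) == int(check_digit)


if __name__ == "__main__":
    # Document number and its check digit from the ICAO specimen passport.
    print(field_is_valid("L898902C3", "6"))
```

A failed check usually points to a misrecognized character, so such a field can be routed to manual verification.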

New: AI-based document classification

Advanced classification algorithms in FineReader Engine 12 leverage artificial intelligence technologies such as Machine Learning and Natural Language Processing to provide precise document classification - with flexible options, the possibility to learn from real documents and the ability to set up very granular classification systems. Two classifiers can be trained on available document samples and used later during the classification process (individually or in combination):

  • Image Classifier
    This classifier collects visual information about document images and delivers fast classification results based on image features.
  • Text Classifier
    This classifier extracts information about the documents’ textual content, which increases the classification accuracy. For the Text Classifier, FineReader Engine 12 also offers classification modes that help to optimize the classification for high precision or high recall:
    • High precision mode is recommended in scenarios where it is important to classify documents into the right categories and keep wrong class assignments to a minimum.
    • High recall mode is recommended in situations where it is important to detect all documents belonging to a certain category and limit the risk that they might be missed.
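
The difference between the two modes can be illustrated with the standard definitions of precision and recall. The sketch below is independent of the FineReader Engine API; the function name, the class labels and the sample predictions are made up for illustration.

```python
def precision_recall(true_labels, predicted_labels, target):
    """Compute precision and recall for one class from paired label lists.

    A prediction of None means the classifier declined to assign a class.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if p == target and t == target)
    fp = sum(1 for t, p in pairs if p == target and t != target)
    fn = sum(1 for t, p in pairs if p != target and t == target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    truth = ["invoice", "invoice", "invoice", "contract", "contract", "letter"]
    # A cautious classifier: assigns "invoice" only when it is very sure.
    cautious = ["invoice", None, None, "contract", "contract", "letter"]
    # An eager classifier: assigns "invoice" whenever it might fit.
    eager = ["invoice", "invoice", "invoice", "invoice", "contract", "invoice"]
    print(precision_recall(truth, cautious, "invoice"))
    print(precision_recall(truth, eager, "invoice"))
```

In this made-up data the cautious classifier never assigns "invoice" wrongly but misses two invoices, while the eager one finds every invoice at the cost of two wrong assignments.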

The new version of FineReader Engine also features a significantly improved and reworked classification API: new algorithms and built-in cross-validation techniques make working with the classification API convenient and easy, and allow the classification accuracy to be optimized during the training step.
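
For readers unfamiliar with cross-validation, the idea can be sketched as a k-fold split: the training samples are divided into k folds, and each fold is held out once for testing while the remaining folds are used for training. This is a generic illustration with a hypothetical function name; how FineReader Engine implements its built-in cross-validation is not described here.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    # Sample i goes to fold i % k, so fold sizes differ by at most one.
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = sorted(
            index
            for fold_number, fold in enumerate(folds)
            if fold_number != held_out
            for index in fold
        )
        yield train, test


if __name__ == "__main__":
    for train, test in k_fold_indices(10, 5):
        print("train:", train, "test:", test)
```

Averaging the accuracy measured on the held-out folds gives an estimate of how the trained classifier behaves on documents it has not seen.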

New: Deployment in the Cloud and virtual environments

  • New Online License for deployment in the Cloud, with Docker containers and in virtual environments
    This new type of license allows software vendors to integrate FineReader Engine into applications running in a Cloud environment (e.g. services such as Amazon EC2 and Microsoft Azure). In addition, this license type allows using the SDK on virtual machines and with Docker containers. In the Windows version, the Online License also works with proxy servers.

New & enhanced OCR languages

  • NEW: Farsi OCR - official support
    Available as a technical preview in the past, Farsi OCR is supported in FineReader Engine 12 as an official OCR language. Added dictionary support and improved recognition algorithms have significantly increased the OCR accuracy.
  • NEW: Georgian OCR - only in the Windows version
    The new OCR support for the Georgian language is available for the Sylfaen font, which is used in most Georgian documents.

Available only in the Windows version

  • NEW: OCR for simple mathematical formulas - only in the Windows version
    Text in scientific documents containing simple one-line mathematical formulas can now be recognized.

Available only in the Windows version

  • Burmese OCR - technical preview
    This new OCR language is added as a technical preview. The Burmese OCR is suitable for documents in the Myanmar3 font, 10pt-12pt size.
  • IMPROVED: Japanese OCR accuracy
    A new OCR language was added: Japanese (Modern). Due to alphabet corrections and dictionary enhancements, this OCR language offers higher recognition accuracy for Japanese and supports recognition of documents that also contain English words and the Greek characters α, β, θ, π. We recommend using the 'Japanese (Modern)' OCR language instead of the originally available 'Japanese' OCR language, which will not be further optimized in the future.
  • Japanese and Arabic documents: Enhanced recognition of dates, times, addresses and names when using new 'special predefined languages'
    For better field recognition and data capture from Japanese and Arabic documents, the Japanese and Arabic predefined languages can be extended by special algorithms. A special character set and regular expressions were added as new special predefined languages for the recognition of dates. These enhancements help to improve the recognition accuracy for such fields.
  • Improved recognition of right-to-left written languages
    Two new properties help to detect the leftmost and the rightmost character in the word and to correctly designate the first and the last letters of the word during recognition and parsing. This significantly increases recognition accuracy for right-to-left written languages such as Arabic or Hebrew. In the past, the leftmost character was defined as the first letter of the word, which negatively affected the recognition accuracy.
  • Improved recognition of Asian OCR languages - with the support of Artificial Intelligence algorithms:
    The newly trained Convolutional Neural Network for recognition of Asian languages provides the following improvements:
    • Significantly faster recognition of Korean
    • Faster recognition of Chinese
    • Increased speed & accuracy in recognition of Japanese (Modern)

Available only in the Windows version
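
The distinction between the leftmost character and the first letter, described above for right-to-left languages, can be sketched as follows. The example does not use the FineReader Engine properties (their names are not given here); it assumes a hypothetical input of (character, left x-coordinate) pairs and relies on the Unicode bidirectional classes from Python's standard `unicodedata` module.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}  # Hebrew-type and Arabic-type strong characters


def is_rtl_word(chars):
    """Return True if most strongly directional characters are right-to-left."""
    classes = [unicodedata.bidirectional(ch) for ch in chars]
    strong = [cls for cls in classes if cls in {"L", "R", "AL"}]
    rtl_count = sum(1 for cls in strong if cls in RTL_CLASSES)
    return rtl_count > len(strong) / 2


def first_and_last_letter(glyphs):
    """Return the (first, last) letters of a word in reading order.

    glyphs: list of (character, left_x) pairs in any order (hypothetical input).
    """
    if not glyphs:
        raise ValueError("a word needs at least one character")
    ordered = sorted(glyphs, key=lambda glyph: glyph[1])  # visual left to right
    if is_rtl_word([ch for ch, _ in ordered]):
        ordered.reverse()  # reading order starts at the rightmost character
    return ordered[0][0], ordered[-1][0]


if __name__ == "__main__":
    # Hebrew word "shalom": the first letter (U+05E9) is drawn at the far right.
    shalom = [("\u05dd", 0), ("\u05d5", 10), ("\u05dc", 20), ("\u05e9", 30)]
    print(first_and_last_letter(shalom))
    print(first_and_last_letter([("c", 0), ("a", 10), ("t", 20)]))
```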

Improved: Layout reconstruction

  • Tables and layout reconstruction
    Improved table detection and analysis together with enhanced layout reconstruction allow tables to be reconstructed exactly as in the original document. This simplifies work with the final document, which can be easily reused.
  • Detection and recreation of balanced text columns
    Balanced columns - text columns with the same length - are detected and precisely recreated during the export step.
  • Recreation of dashed separators in tables during export to DOCX
    Table borders that are represented as dashed separators will be detected and reconstructed the same way in the output document - not as simple grid lines.
  • Recreation of cell border color during export to XLSX
    When processing documents with tables, FineReader Engine 12 detects the line color of cell borders, preserves the information and recreates the line color during the export step to XLSX format.
  • Improved layout retention on TXT export
    A new export mode simulates the original layout by inserting spaces. It has the following features:
    • emulation of the paragraph indentation and central alignment with spaces.
    • emulation of spaces between paragraphs with empty lines.
    • special processing of frames and footnotes.
    • translation of characters in upper and lower cases into special Unicode characters.
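
The first two features of the TXT export mode (indentation and central alignment emulated with spaces, paragraph spacing emulated with empty lines) can be pictured with a small sketch. This is only an illustration of the idea, not the Engine's export code; the function name, the paragraph tuples and the fixed page width are assumptions.

```python
def emulate_layout(paragraphs, page_width=40):
    """Render paragraphs as plain text, emulating layout with spaces.

    paragraphs: list of (text, alignment, indent) tuples, where alignment is
    "left" or "center" and indent is the first-line indent in characters.
    """
    lines = []
    for text, alignment, indent in paragraphs:
        if alignment == "center":
            padding = max((page_width - len(text)) // 2, 0)
        else:
            padding = indent
        lines.append(" " * padding + text)
        lines.append("")  # an empty line emulates the space between paragraphs
    return "\n".join(lines[:-1])  # drop the trailing empty line


if __name__ == "__main__":
    page = emulate_layout(
        [("Annual Report", "center", 0), ("Revenue grew in all regions.", "left", 4)]
    )
    print(page)
```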

New: Additional PDF & PDF/A saving options

  • PDF 2.0
    The latest PDF standard includes the following updates:
    • Encryption - the producer may embed the encrypted PDF document within an unencrypted PDF document
    • Support for new types of digital signatures - based on the CAdES standard, LTV and certificates based on elliptic curves
    • New types of annotations: projections, 3D and rich media
    • Accessibility - pronunciation hints
  • PDF/UA export format
    Export to PDF in accordance with the PDF/UA standard is now available. Legislation in most countries requires state and federal authorities to provide accessible versions of their websites and PDF documents. Conformance with PDF/UA ensures accessibility for people with disabilities who use assistive technology such as screen readers, screen magnifiers, joysticks and other technologies to navigate and read electronic content.
  • PDF/A-2b and PDF/A-3b export formats
    The new options for PDF/A-2 and for PDF/A-3 are supported for the level B conformance:
    • PDF/A-2 enables the creation of smaller PDF files using JPEG2000 compression. For long-term archiving, this can help reduce used storage space and enable faster access when working on low bandwidth networks.
    • PDF/A-3 allows inclusion of PDF/A files or files in a variety of other binary formats such as XML or Office formats. Long-term archiving and readability of the PDF/A part are still guaranteed, and the binary attachments can deliver additional benefits making this format attractive in new areas, for example, when a graphical representation of a document should be combined with some source data.
  • Set of additional tags on export to tagged PDF
    The ability to add more tags to exported tagged PDFs enables creating PDFs that are compliant with the Web Content Accessibility Guidelines in the European Union and the Section 508 Amendment to the Rehabilitation Act of 1973 in the USA.

Available only in the Windows version

  • Ability to save information about creation and editing dates during the export to PDF
    This feature allows recording the date of creation, the date of modification, or both during the PDF export step.

Available only in the Windows version
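
In a PDF file, creation and modification dates are commonly stored in the document information dictionary under the keys CreationDate and ModDate, using the date string format of the PDF specification. The sketch below shows how such a string can be built for a UTC timestamp; it illustrates the file format only and is not FineReader Engine code, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_pdf_date(moment: datetime) -> str:
    """Format a timezone-aware datetime as a PDF date string in UTC."""
    if moment.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    utc = moment.astimezone(timezone.utc)
    # "D:" prefix, then YYYYMMDDHHmmSS, then "Z" to mark UTC.
    return utc.strftime("D:%Y%m%d%H%M%SZ")


if __name__ == "__main__":
    created = datetime(2020, 3, 15, 11, 30, tzinfo=timezone(timedelta(hours=2)))
    print(to_pdf_date(created))
```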

New: Additional export formats

  • HTML5 - New export format
    This format is an alternative to PDF for client-server, mobile and cross-platform applications. Developers can now let their users work with FineReader Engine 12 without installing a client program, which increases conversion and avoids security limitations and the need for a single environment to run the program on the user’s PC.
  • ALTO 3.1 - New export format
    The latest ALTO XML schema, version 3.1, is used for storing metadata, especially for complex documents, and includes the following main changes:
    • Added support for using different shapes for the elements String, TextLine, all PageSpaceType elements and on all BlockType elements.
    • The description of the attribute ROTATION is changed to the rotation of the contents of a block and not the block itself. The attribute is inherited by all sub-elements.
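
An ALTO file produced by such an export can be consumed with any XML parser. The sketch below reads text blocks, including the ROTATION attribute mentioned above, from a small hand-written sample. The sample document, the function name and the assumption that the file uses the ALTO version 3 namespace are illustrative; actual Engine output may contain more elements and attributes.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for ALTO version 3.x documents.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v3#"}

# Hand-written sample, not actual FineReader Engine output.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="p1" WIDTH="2480" HEIGHT="3508">
      <PrintSpace>
        <TextBlock ID="b1" HPOS="100" VPOS="200" WIDTH="800" HEIGHT="60" ROTATION="90">
          <TextLine HPOS="100" VPOS="200" WIDTH="800" HEIGHT="60">
            <String CONTENT="Hello" HPOS="100" VPOS="200" WIDTH="300" HEIGHT="60"/>
            <SP/>
            <String CONTENT="world" HPOS="420" VPOS="200" WIDTH="300" HEIGHT="60"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def read_blocks(alto_xml: str):
    """Return id, content rotation and text of every TextBlock in the document."""
    root = ET.fromstring(alto_xml)
    blocks = []
    for block in root.iterfind(".//alto:TextBlock", ALTO_NS):
        words = [s.get("CONTENT", "") for s in block.iterfind(".//alto:String", ALTO_NS)]
        blocks.append(
            {
                "id": block.get("ID"),
                # ROTATION describes the rotation of the block's contents.
                "rotation": float(block.get("ROTATION", "0")),
                "text": " ".join(words),
            }
        )
    return blocks


if __name__ == "__main__":
    print(read_blocks(SAMPLE))
```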

Improved: XML saving options

  • Faster export to XML, when saving information about character coordinates prior to image deskewing
    FineReader Engine 12 exports documents to XML at the same speed, regardless of whether information about character coordinates should be saved before or after deskewing. (In FineReader Engine 11, export to XML would slow down when character coordinates on the image prior to deskewing were required.)
  • Direct export of list elements to XML
    FineReader Engine is now able to export the information about detected list elements to XML and reconstruct the original lists in XML documents - without the need to acquire the information about list elements through custom code. This significantly simplifies work with the export results. (Previous versions did not allow exporting information about list elements to XML directly. To recreate individual list elements in an output XML, the information had to be acquired by iterating over the recognized text and programmatically extracting information about the list elements, followed by generating a custom XML file.)
  • Exporting information about tab-space characters to XML
    The new version of FineReader Engine makes it possible to get and use information about tab-space characters (including the number of dashes/lines/spaces) in order to better reconstruct the original documents. (The number of dashes/lines/spaces for the tab-space characters was not saved by previous versions, which caused difficulties when reconstructing the original document from the output XML file.)

Other features and improvements

  • New scanning utility and options
    The new scanning utility can be used by customers who want to use the pre-processing tools of their scanner hardware instead of FineReader Engine tools. It provides access to 4 new scanning options that perform a set of automated tasks during the scanning step:
    • ICAP_AUTODISCARDBLANKPAGES - automated deletion of blank pages
    • ICAP_AUTOMATICBORDERDETECTION - automated page crop
    • ICAP_AUTOMATICDESKEW - automated skew correction
    • ICAP_AUTOMATICCOLORENABLED - automated detection of color information
  • Customer Project-ID
    The new Customer Project ID entity decreases the number of errors during Engine object initialization. For more info: Licensing - Customer Project ID.
  • Official support of Azure App Services
    A new tested and officially supported Cloud environment.
  • 256-bit AES encryption with the possibility to use Unicode characters during PDF encryption
    The 256-bit AES encryption level uses UTF-8 character encoding for PDF file encryption, which makes customers independent of the language of their operating system and allows encrypting PDF files with passwords in any language. (In the past, the 128-bit AES encryption level was used, which caused problems when PDF files were encrypted with passwords in a specific language (e.g. Turkish): depending on the language of the operating system, a different character encoding was used. As a result, even correctly entered passwords would not open password-protected PDF files.)
  • Access to information about individual recognition variants
    FineReader Engine 12 provides access to information about words and individual characters as well as their coordinates - for each recognition hypothesis individually. Access to information for all possible recognition variants allows developers to implement their own tools that iterate over the offered recognition variants and select the most probable one - based on their own internal rules. (In the previous version, this information was only available for the variant that ABBYY FineReader Engine selected as the most probable. Information about the less probable recognition variants was not available.)
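
Selecting a recognition variant by custom rules, as described in the last item above, might look like the following sketch. It does not call the FineReader Engine API: the `Variant` class, its 0-100 confidence field and the date pattern are hypothetical stand-ins for whatever the application obtains from the Engine.

```python
import re
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical stand-in for one recognition variant of a word."""
    text: str
    confidence: int  # assumed scale: 0 (lowest) to 100 (highest)


# Example business rule: the field is expected to hold a DD.MM.YYYY date.
DATE_PATTERN = re.compile(r"\d{2}\.\d{2}\.\d{4}")


def pick_variant(variants, pattern=DATE_PATTERN):
    """Prefer variants that satisfy the rule; fall back to plain confidence."""
    if not variants:
        raise ValueError("at least one variant is required")
    matching = [v for v in variants if pattern.fullmatch(v.text)]
    pool = matching or variants
    return max(pool, key=lambda v: v.confidence)


if __name__ == "__main__":
    candidates = [
        Variant("O1.O2.2O2O", 90),  # letter O misread for zero
        Variant("01.02.2020", 85),
        Variant("01.02.202O", 70),
    ]
    print(pick_variant(candidates).text)
```

Here the rule overrides the raw confidence: the top-ranked variant contains letters where digits are expected, so the second variant is chosen.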

Enhanced documentation

  • New HTML-based Help file
    Offers simplified navigation and improved search capabilities that make finding information easier and faster.
  • New article about deployment of FineReader Engine in Docker containers
    Available in the Windows version