FineReader Engine 11 for Linux - New Features Overview

Latest Release

  • FineReader Engine 11 Linux - Release 8
    • Release date: 21.03.2017
    • Part #: 1161/28
    • Build #: 11.1.19.854867

Download the latest release: FineReader Engine 11 for Linux (login required)

A free download of the latest distribution is available to ABBYY customers with a valid Software Maintenance Agreement.
Please contact your Account Manager for login information.

New Features and Improvements

[pre-processing]

  • Ability to remove garbage from color images
    The extended image pre-processing increases the recognition accuracy of color images. Similar to the existing feature that removes small excess dots, which slow down processing, from black-and-white images, garbage can now also be removed from color images.

[text layer in PDF]

  • Ability to inject a text layer into selected pages of a PDF document
    This feature improves flexibility. The existing option to inject a text layer under the image has been extended: it is now possible to specify individually the pages into which the text layer should be injected.
  • Extended method of injecting text into PDF
    While injecting a text layer into scanned PDFs, the extended method can deskew the pages and correct their orientation. When processing a batch that contains both scanned and digital-born PDF documents, all scanned PDF images can automatically be given a text layer and turned into searchable PDF files, even documents that were scanned incorrectly.
  • Extension of the method for detecting a text layer in PDFs
    The method for detecting a text layer in PDFs has been extended. Previously, the method accepted only a string for the first parameter, 'FileName'. Now a byte array can be passed for 'FileName' as well. This is useful when working with PDFs from an InputStream: PDFs imported from a memory stream can be checked for a text layer without writing the stream to a temporary file, which increases the overall processing speed.
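
    The in-memory scenario described above can be prepared roughly as follows. This is a minimal Java sketch that only shows how a stream might be buffered into a byte array; the class and method names (PdfStreamBuffer, readFully, looksLikePdf) are illustrative assumptions, and the Engine's own detection method is deliberately not called because its exact signature is not given here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the FineReader Engine API.
class PdfStreamBuffer {

    // Reads the whole stream into memory so that no temporary file is needed.
    static byte[] readFully(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Cheap sanity check: a PDF file normally starts with the marker "%PDF-".
    static boolean looksLikePdf(byte[] data) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in for a PDF arriving from a network or database stream.
        InputStream in = new ByteArrayInputStream(
                "%PDF-1.7\n...".getBytes(StandardCharsets.US_ASCII));
        byte[] pdfBytes = readFully(in);
        System.out.println(looksLikePdf(pdfBytes)); // prints true
        // Assumption: pdfBytes would now be passed to the Engine's text layer
        // detection method in place of a file name.
    }
}
```

    The buffered array is what would be handed to the detection method instead of a path on disk.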

[PDF export]

  • Ability to rasterize FreeText annotations
    When PDF documents that contain Text Box annotations are processed and exported to PDF, all information from annotations of the FreeText type can now be retained in the output PDF.
  • Export of multi-page PDF documents with an undefined number of pages
    This feature increases efficiency when scanning large multi-page documents. The new export approach introduced in the previous release has been modified: a recognition session can now be created even if the number of pages in the document sent for processing is not known. When multi-page documents are scanned, the number of pages is typically known only after scanning is complete. The modified export API allows pages to be sent for recognition while the remaining pages of the document are still being scanned.
  • Ability to adjust a time zone for PDF export
    In previous releases, the creation and modification dates were written to the PDF file in UTC format. Now a time zone can be specified for the creation and modification dates of exported documents. Several PDF viewer applications display the creation/modification date of a document without taking the user’s time zone into account, and in some cases this missing information is important. The new options allow the creation and modification dates to be specified for each PDF file.
  • [Technical preview] Faster PDF printing when using MRC compression
    A new option in the set of MRC correction parameters allows Mixed Raster Content parameters to be tuned for PDF export, which increases PDF printing speed. (At the moment, the feature is implemented as a technical preview.)
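
    To illustrate the time zone option for PDF export described above: the PDF specification stores dates as strings of the form D:YYYYMMDDHHmmSS followed by either Z (UTC) or an offset such as +01'00'. The Java sketch below shows how the same instant looks with and without an offset. It is not the Engine's API; the class and method names (PdfDateFormat, toPdfDate) are assumptions made for this example.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper that formats a timestamp as a PDF date string.
class PdfDateFormat {

    private static final DateTimeFormatter LOCAL_PART =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss", Locale.ROOT);

    static String toPdfDate(OffsetDateTime t) {
        int totalSeconds = t.getOffset().getTotalSeconds();
        if (totalSeconds == 0) {
            // UTC is written with a trailing "Z".
            return "D:" + t.format(LOCAL_PART) + "Z";
        }
        char sign = totalSeconds < 0 ? '-' : '+';
        int abs = Math.abs(totalSeconds);
        int hours = abs / 3600;
        int minutes = (abs % 3600) / 60;
        // Local time followed by the offset from UTC, e.g. +01'00'.
        return String.format(Locale.ROOT, "D:%s%c%02d'%02d'",
                t.format(LOCAL_PART), sign, hours, minutes);
    }

    public static void main(String[] args) {
        OffsetDateTime local =
                OffsetDateTime.of(2017, 3, 21, 10, 15, 30, 0, ZoneOffset.ofHours(1));
        System.out.println(toPdfDate(local));
        // prints D:20170321101530+01'00'
        System.out.println(toPdfDate(local.withOffsetSameInstant(ZoneOffset.UTC)));
        // prints D:20170321091530Z
    }
}
```

    Both strings denote the same instant; only the first one tells a viewer which local time the document was created in.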

[XML export]

  • Improved readability of exported XML data for users
    Default paragraph style names are now generated automatically according to the paragraph’s role and the modifications that were applied to the style. This improves the readability of XML-based text and simplifies work for operators and system administrators. To increase flexibility, users can also set a paragraph style name manually.

[TXT export]

  • Ability to exclude BOM during export to TXT
    A new export option specifies whether the byte order mark (BOM) should appear at the start of the text stream when a document is exported to TXT format in UTF-8 encoding. This saves Java developers from programming workarounds to discard the BOM character at the beginning of the file.
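
    For context, the kind of workaround mentioned above typically looks like the following Java sketch. Java's UTF-8 decoder keeps a leading BOM as the character U+FEFF, so code reading such a file has to drop it by hand. The class and method names (BomWorkaround, stripBom) are illustrative and not part of the Engine.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing the manual BOM handling that the new option makes unnecessary.
class BomWorkaround {

    // Removes a leading U+FEFF character if one is present.
    static String stripBom(String text) {
        if (!text.isEmpty() && text.charAt(0) == '\uFEFF') {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        // UTF-8 BOM (EF BB BF) followed by the text "OCR".
        byte[] fileBytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'O', 'C', 'R'};
        String decoded = new String(fileBytes, StandardCharsets.UTF_8);
        System.out.println(decoded.length());           // prints 4: the BOM is still there
        System.out.println(stripBom(decoded).length()); // prints 3
    }
}
```

    When the file is exported without a BOM, the decoded text can be used directly.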

[license]

  • Simultaneous use of network and standalone licenses within one installation
    In some cases it is efficient to use different types of licenses, standalone and network, on one computer. To support this scenario, network and standalone licenses can be defined in one LicensingSettings.xml file.

[documentation]

  • Updated documentation for working with screenshots
    New recommendations for processing screenshots have been added to the documentation, giving developers useful tips for this increasingly popular scenario.

More details about individual features and the latest release distribution can be found on the download page.

Previous Releases

Release 6 Update

FineReader Engine 11 Linux - Release 6 Update

  • Release date: 14.03.2016
  • Part #: 1155/21
  • Build #: 11.1.14.707470

This release is a patch to the latest official maintenance release of the product (FineReader Engine 11 R6). It contains speed optimizations due to a new heap manager. Fine-tuning this distribution to the memory management specifics of Linux allowed us to overcome previous speed limitations.

The ‘heap’ is a very common memory management paradigm in programming. One good definition reads: “A very flexible storage allocation mechanism is heap allocation. Any number of data objects can be allocated and freed in a memory pool, called a heap. Heap allocation is enormously popular. Almost all non-trivial Java and C programs use new or malloc.”

We replaced the heap manager code previously in use with one based on Doug Lea's allocator. It turned out to be more efficient when many small memory chunks are allocated. A speed-up was gained for document analysis, recognition, and sometimes for the synthesis and PDF export steps. The speed-up is not consistent, since it depends heavily on image/document complexity, available memory, and recognition route.

Release 6

  • FineReader Engine 11 Linux - Release 6 (available since 25.12.2015)

    • New export approach for large multi-page document conversion to searchable PDFs.
      The new export mode is almost 4 times faster than the default export method. It requires less RAM, makes error handling more convenient, and prevents the loss of all exported data in case of an export error.
    • Smaller output PDF file size with pages of mixed colors.
      Combined with the existing API for selecting compression individually, the PDF export API now allows the file size of the exported PDF to be reduced.
    • The predefined BarcodeRecognition_Accuracy profile provides more accurate results.
      The recognition accuracy for some barcodes has increased by as much as 20%.
    • Ability to process single-page documents from memory in batch mode.
      The new API allows single-page documents stored in memory to be processed in batch mode. This helps to comply with strict security standards and increases the processing speed.
    • Correct processing of texts with different directionality.
      Updated properties for capturing the location of a word’s characters now provide accurate results for both right-to-left and left-to-right languages.
    • OCR improvements for Japanese, Arabic, Thai and Farsi.
      The improvements result in overall better recognition accuracy.
    • Possibility to implement a custom interface for managing parallel processing.
      Developers can now implement their own logic for managing parallel processing and for reporting errors that occur during processing.
    • Warnings are connected to particular pages.
      Warnings now contain the page index, which is useful for error handling.
    • Parallel multi-page image opening.
      ABBYY FineReader Engine can now distribute the opening of multi-page documents across CPU cores. This brings up to a 2x speed-up of the image opening step in 2- and 4-core configurations.
    • Exporting to ALTO up to version 3.0.
      Now it is possible to export OCR data into ALTO format according to the following ALTO standard versions: 2.0, 2.1, 3.0.
    • Possibility to save memory when coordinate information for the original image is not needed.
      If the coordinate information for the original image does not need to be kept, the new property KeepOriginalCoordinatesInfo can be used to turn off storing this data. This can save storage and memory space; for example, when processing b/w images (~100 KB), the transformation data can be up to 1 MB per image.
    • Crop function for greyscale and black & white images.
      Now the Crop function supports greyscale and black & white images. Previously this function could be used only for color images.
    • New Java wrapper functions for loading Engine.
      The Java wrapper contains new Engine loading functions, which can now throw exceptions instead of logging them.
    • New property for fine-tuning of paper size detection.
      FineReader Engine can detect paper boundaries on a scan automatically, which prevents garbage near the borders of a scan from being detected as text. In some cases this detection can fail, e.g. if a scanned document contains a large dark picture: it might be taken for scanning background and removed from the document area as a scanning shadow. To prevent such mistakes, a host application can now advise FineReader Engine to limit the scanning shadow detector's hypotheses by specifying in which part of the scan the source document is located.
    • Visual quality improvement of exported PDFs with MRC compression.
      All predefined MRC modes (MinSize, MaxQuality, Balanced, MaxSpeed) have been improved in regard to visual quality.
    • New property for obtaining recognized word region.
      Many customers work with recognized words instead of recognized characters. In most cases, a word of interest needs to be highlighted in a graphical user interface, which requires a word-bounding rectangle (region). Since words may include characters of different heights and can span several lines, the task is non-trivial. With the new property, a word-bounding region can be obtained easily.
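
      To show why the word region mentioned above is non-trivial without such a property, here is a minimal Java sketch of the manual approach: character boxes are merged into one bounding rectangle per text line, so a word that wraps onto a second line yields two rectangles. The data layout ({left, top, right, bottom} arrays plus a line index per character) and the names WordRegionSketch and wordRegion are assumptions for this example and do not reflect the Engine's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of computing a word region by hand.
class WordRegionSketch {

    // charBoxes[i] is {left, top, right, bottom}; lineOfChar[i] is the text line of character i.
    // Returns one bounding rectangle per line, ordered by line index.
    static List<int[]> wordRegion(int[][] charBoxes, int[] lineOfChar) {
        Map<Integer, int[]> perLine = new TreeMap<>();
        for (int i = 0; i < charBoxes.length; i++) {
            int[] c = charBoxes[i];
            int[] r = perLine.get(lineOfChar[i]);
            if (r == null) {
                // First character seen on this line starts the rectangle.
                perLine.put(lineOfChar[i], c.clone());
            } else {
                // Grow the rectangle to cover characters of different heights.
                r[0] = Math.min(r[0], c[0]);
                r[1] = Math.min(r[1], c[1]);
                r[2] = Math.max(r[2], c[2]);
                r[3] = Math.max(r[3], c[3]);
            }
        }
        return new ArrayList<>(perLine.values());
    }

    public static void main(String[] args) {
        // A four-character word split across two lines.
        int[][] boxes = {
            {100, 10, 110, 30}, {110, 15, 120, 35}, // end of line 0
            {5, 40, 15, 60}, {15, 45, 25, 58}       // start of line 1
        };
        int[] lines = {0, 0, 1, 1};
        for (int[] r : wordRegion(boxes, lines)) {
            System.out.println(java.util.Arrays.toString(r));
        }
        // prints [100, 10, 120, 35] and then [5, 40, 25, 60]
    }
}
```

      A highlight drawn from these rectangles covers the whole word even when it is hyphenated across lines.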

Release 4

  • FineReader Engine 11 Linux - Release 4 (available since 19.06.2015)
    • Back-up possibility for the network license server (network license redundancy)
    • Export to memory
    • Parallel export to PDF and to PPTX
    • New profile for faster barcode recognition (speed)
    • Technical preview of a new OCR language - Farsi
    • Possibility to enable and disable interpolation in PDF viewers
    • New property for shadows and highlights correction in photographs
    • Possibility to improve recognition quality by removing color objects during pre-processing stage
    • Support for opening corrupted TIFF files

Release 3

  • FineReader Engine 11 Linux - Release 3 (available since 27.11.2014)
    • Possibility to extract attachments from PDF and to add them
    • Possibility to see PDF layers in PDF viewers
    • New method for removing color objects in a document
    • New advanced language detection mode
    • Advanced method to check if a page is empty
    • WIBU-Systems dongle support
    • Improved CRM_ContentOnly mode

Release 2

  • FineReader Engine 11 Linux - Release 2 (available since 14.04.2014)
    • Full version of Java wrapper is now available
    • New method for injecting a text layer into an existing PDF
      • Parameter to keep the attachments of the original PDF file also in the output file
      • Support for PDF/A-3a and PDF/A-3u
    • Support for “IntelligentMail” / USPS 4-CB barcodes
    • Some API Changes
    • Other enhancements:
      • Detection of vertical text for all languages
      • Enhanced JBIG2 lossless compression
      • Improved memory allocation
      • Export with original layout to XLSX
      • Screenshot detection

Release 1

  • FineReader Engine 11 Release 1 (available since 24.10.2013)
    • Release Date: 17.12.2013 Update with Java Wrapper and 32-Bit support
    • Release Date: 24.10.2013 (EU) 64-Bit only

Code Samples

  • Once you have downloaded and installed FineReader Engine 11 for Linux you can find and start all pre-compiled code samples here: Code Sample Library