FineReader Engine 11 - What is new Overview
ABBYY FineReader Engine 11 offers a variety of new built-in features and improvements making it the ideal text recognition and document conversion SDK for your systems and applications. Highlights of the new key features include:
- Classification - Image and text-based document type detection
- Business Card Recognition - Single and multi-card processing, vCard export
- Extended PDF Capabilities - PDF/A-2 & PDF/A-3 support, enhanced PDF processing
- New and Improved OCR Technology - New OCR and ICR languages, new barcode types, improved image pre-processing
- Development Improvements - 64-bit support, asynchronous scanning, new Java Native Interface (JNI) support
Classification
- The classification API is used to create a classification database, which describes several types of documents to be classified, and to classify documents on the basis of this defined set.
- This content-based classification is intended for simple scenarios in which a document should be assigned to one of a few types, for example contracts, invoices, or receipts.
The two main steps in detail:
- Creating a classification database
You select several images of documents of each type. Representatives of each type have similar appearance (similar layout of elements). You can use these images to create the classification database. Scanned images or photos may need some pre-processing before database creation.
- Classifying documents
You can use the created database to classify documents in the document flow. You scan documents, or load photographed documents, and pass them to the pre-trained classification system, which uses the classification database to determine the type of each document. You may update the classification database each time you add new types of documents or change existing ones.
- The SDK also contains a code sample that shows how to train and work with the new classification API
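The two steps can be illustrated with a small, SDK-independent sketch. It does not use the FineReader Engine classification API: the functions `build_database` and `classify`, the word-count features, and the sample texts are hypothetical stand-ins, chosen only to show the flow of first building a database from labelled samples and then classifying new documents against it.

```python
from collections import Counter
import math

def features(text):
    """Toy feature extractor: a bag-of-words count vector (an assumption, not the SDK's features)."""
    return Counter(text.lower().split())

def build_database(samples):
    """Step 1: merge the feature vectors of all labelled samples of each document type."""
    database = {}
    for doc_type, text in samples:
        database.setdefault(doc_type, Counter()).update(features(text))
    return database

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify(database, text):
    """Step 2: return the document type whose profile is most similar to the new document."""
    vec = features(text)
    return max(database, key=lambda doc_type: cosine(database[doc_type], vec))

# Hypothetical training samples: (document type, recognized text)
samples = [
    ("invoice", "invoice number total amount due payment terms"),
    ("invoice", "invoice date amount vat total"),
    ("contract", "agreement between the parties terms and conditions signature"),
    ("receipt", "receipt cash paid change thank you"),
]
database = build_database(samples)
print(classify(database, "total amount due on this invoice"))  # prints "invoice"
```

Adding a new document type in this sketch means adding labelled samples and rebuilding the database, which mirrors the update step described above.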
Business Card Recognition
- New Business Card recognition API
- FineReader Engine 11 includes a special API section for data extraction from business cards.
- Extended automatic card splitting, when the scan/image contains several business cards
- You can use the predefined processing profile
- See details about the new Business Card object in the documentation.
- The SDK also contains a code sample that shows how to integrate business card reading
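To give an idea of what a vCard export produces, here is a minimal sketch that serializes already-extracted business card fields as a vCard 3.0 record. It is not the Engine's export code: the `to_vcard` function and the dictionary keys (`first_name`, `last_name`, `company`, `phone`, `email`) are assumptions made for illustration.

```python
def esc(value):
    """Escape characters that have special meaning in vCard text values."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))

def to_vcard(card):
    """Serialize a dict of extracted business card fields as a vCard 3.0 string."""
    first, last = esc(card["first_name"]), esc(card["last_name"])
    lines = [
        "BEGIN:VCARD",
        "VERSION:3.0",
        # N is structured: family name; given name; additional names; prefix; suffix
        f"N:{last};{first};;;",
        f"FN:{first} {last}",
    ]
    if card.get("company"):
        lines.append(f"ORG:{esc(card['company'])}")
    if card.get("phone"):
        lines.append(f"TEL;TYPE=WORK,VOICE:{card['phone']}")
    if card.get("email"):
        lines.append(f"EMAIL;TYPE=INTERNET:{card['email']}")
    lines.append("END:VCARD")
    # vCard content lines are terminated with CRLF
    return "\r\n".join(lines) + "\r\n"

# Hypothetical fields, as they might come out of business card recognition
card = {
    "first_name": "Jane",
    "last_name": "Doe",
    "company": "Example, Inc.",
    "phone": "+1 555 0100",
    "email": "jane.doe@example.com",
}
vcard = to_vcard(card)
print(vcard)
```

When a scan contains several cards, each split card would yield one such record.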
Extended PDF Capabilities
Multiple new API options give developers more control over PDF processing so that they can fine-tune their own applications and services.
- Opening (small) PDF files from memory
- Up to 12% faster export speed, compared to previous technology
- Ability to specify resolution for rasterization during PDF opening
- Keep Bookmarks
- Higher quality of highly compressed MRC PDFs
Higher background image compression in V11 can reduce the size of output MRC PDF files by up to 50% (compared to the Version 10 implementation, based on ABBYY internal tests).
- New in V11 Release 3: Parallel export to PDF and to PPTX
Multi-page documents are now exported to PDF and to PPTX in parallel mode. This increases the processing speed of multi-page documents in parallel processing scenarios that use Batch Processor or FRDocument. The table below shows the number of pages processed and exported to PDF per minute in this release compared to the previous release:
| Scenario | Release 3 with parallel export (pages/min) | Release 2 without parallel export (pages/min) |
| Processing with FRDocument | 111 | 79 |
| Processing with FRDocument (with PageFlushingPolicy = PFP_KeepInMemory) | | |
| Processing using Batch Processor | 117 | 82 |
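The relative gain follows directly from the figures in the table. The short calculation below uses only those numbers; the `speedup` helper is just a name chosen for this example.

```python
# Pages per minute, copied from the table above: (Release 3, Release 2)
throughput = {
    "FRDocument": (111, 79),
    "Batch Processor": (117, 82),
}

def speedup(new, old):
    """Ratio of the new throughput to the old throughput."""
    return new / old

for scenario, (release3, release2) in throughput.items():
    factor = speedup(release3, release2)
    print(f"{scenario}: {factor:.2f}x ({(factor - 1) * 100:.0f}% more pages per minute)")
# FRDocument: 1.41x (41% more pages per minute)
# Batch Processor: 1.43x (43% more pages per minute)
```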
In addition to the common PDF and PDF/A-1 formats, FineReader Engine 11 now exports to PDF/A-2. The new options of this ISO standard format are:
- Support of JPEG2000 compression to generate smaller files
- A-2a – tagged PDF/A-2 with Unicode text
- A-2u – untagged PDF/A-2 with the ability to extract text in Unicode
PDF/A-2 enables creation of smaller PDF files using JPEG2000 compression. For long-term archiving, this can help reduce used storage space and enable faster access when working on low bandwidth networks.
PDF/A-3 is an extension of the A-2 standard which allows inclusion of PDF/A files or files in a variety of other binary formats such as XML or Office formats. Long-term archiving and readability of the PDF/A part is still guaranteed, and the binary attachments can deliver additional benefits.
The PDF/A-3 extended container capabilities will make this format attractive in new areas, for example when a graphical representation of a document should be combined with some source data. The new e-invoice format defined by the Forum for Electronic Invoices Germany (FeRD) is based on PDF/A-3 and XML.
Read more about ZUGFeRD processing with ABBYY SDKs.
Since Release 3, the API has been extended so that embedded files (attachments) can be both extracted from and added to a PDF.
Development Improvements
- Native 64-bit Support
- FineReader Engine 11 provides C++ DLLs that can be linked directly into x64 applications without using a COM proxy. The platform-neutral .NET interops let .NET projects run on 32-bit or 64-bit machines without recompilation. The new 64-bit support makes it easier to integrate and roll out ABBYY OCR technology in applications that need more than 4 GB of RAM.
- Simplified Java Integration
FineReader Engine 11 can be used from Java on 64-bit systems either by loading into the current process (InprocLoader), or by loading into a separate process (OutprocLoader). The new ready-to-use Java classes for the Engine library cover the full API*.
- Extended Scanning Capabilities
- Asynchronous Scanning enables recognition of scanned pages before scanning of all pages is finished.
- Extended access to scan settings, including access to scan source capabilities.
- Ability to specify compression type of scanned images.
- The new code sample makes it easy to implement better and faster scanning for your application.
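The idea behind asynchronous scanning, recognizing pages while later pages are still being scanned, can be sketched as a producer/consumer pipeline. This is a conceptual illustration only and does not use the Engine's scanning API: `scan_pages` and `recognize_pages` are hypothetical placeholders for the scanner and the OCR step.

```python
import queue
import threading

def scan_pages(page_count, pages):
    """Producer: stands in for the scanner, handing over each page as soon as it is scanned."""
    for number in range(1, page_count + 1):
        pages.put(f"page-{number}")  # placeholder for a scanned image
    pages.put(None)                  # sentinel: scanning is finished

def recognize_pages(pages, results):
    """Consumer: stands in for OCR, processing pages while the scanner may still be running."""
    while True:
        image = pages.get()          # blocks until the next page is available
        if image is None:
            break
        results.append(f"text of {image}")  # placeholder for real recognition

pages = queue.Queue()
results = []
recognizer = threading.Thread(target=recognize_pages, args=(pages, results))
scanner = threading.Thread(target=scan_pages, args=(3, pages))
recognizer.start()
scanner.start()
scanner.join()
recognizer.join()
print(results)  # ['text of page-1', 'text of page-2', 'text of page-3']
```

Because the queue decouples the two threads, recognition of the first page can begin before the last page has been scanned, which is the benefit the feature describes.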
- New and updated Code Samples
- New: Classification, Business Card Reading, Scanning, Thread-Pool
- Updated: Image preprocessing, Camera OCR
New and Improved OCR Technology
- New ABBYY Arabic OCR technology
- Arabic is supported as a new OCR language from Version 11 onward and can be combined with other available OCR languages.
- Arabic now also comes with dictionary support.
- Compared to the technical preview in Version 10, the number of incorrectly recognized words for Arabic OCR has been halved, while recognition is up to 3 times faster (based on ABBYY tests and test set).
- More on: Arabic OCR
- New and improved language support
- New OCR languages: Turkmen (Latin) and Old Slavonic
- New ICR languages: Danish, Norwegian (Bokmal & Nynorsk), Old English, Serbian (Cyrillic), Tajik
- Latin language has full dictionary support
- Improved OCR for Chinese, Japanese & Korean
Processing speed in fast mode has been increased, while maintaining accuracy level.
- Japanese up to 2.5 times faster
- Chinese (Simplified) up to 2.5 times faster
- Chinese (Traditional) up to 4.0 times faster
- Korean up to 2.5 times faster
- User dictionaries can be created for Japanese and Korean languages
- All elements of UI and messages of FineReader Engine 11 are now available in Japanese.
- More on: Chinese, Japanese & Korean (CJK) OCR
- Improved Image Pre-processing
Input image quality is a key factor in achieving good OCR results. With better input, recognition works faster and delivers higher accuracy. Better image quality also enables higher compression rates for MRC PDFs.
- Extended geometrical distortions correction
- Auto-splitting of double-pages
- Background lightening
- Better ISO noise removal
- New pre-processing for documents with stamps and written notes*: the image is split into two layers, color and black-and-white.
- New Barcode Type: MaxiCode - created and used by United Parcel Service
- Synthesis & Export
- Extended ABBYY XML export with the ability to save information of paragraph styles and roles in XML file.
- Improved font management API and extended access to the fonts used during document synthesis (predefined font filters)
- Export of business cards to vCard format
- Recreation of the logical structure of a document is an option during export to RTF, DOCX, and HTML formats
- New color settings for embedded pictures in RTF, DOCX, PPTX, HTML, EPUB, and FB2 formats
- Export to XPS (XML Paper Specification)*
- Other improvements
- FineReader Engine collections can be iterated using the for each statement in .NET
- Ability to cancel processing operation and repeat processing of a page with Batch Processor
* Indicates functionality that is not available in the initial release but is planned for a maintenance release of FineReader Engine 11.