Classification Differences in FineReader Engine & FlexiCapture Engine

Language: EN
Product-Line: FlexiCapture Engine, FineReader Engine
Version: 9.x, 10, 11
Type: Technology & Features, Comparisons
Category: Document Classification

Introduction

This article compares the features and usage scenarios of the classification technologies available in FineReader Engine and FlexiCapture Engine.

  • FineReader Engine is an OCR toolkit, designed for converting scanned documents and PDFs into different formats such as plain text, Office formats, HTML, searchable PDFs or XML.
  • FlexiCapture Engine is an SDK for forms processing, document separation, classification and data extraction.

The sections below give a short overview of the classification approaches in the two toolkits.

Detailed Feature Comparison

General

Intended usage
  • FineReader Engine: Straightforward classification, integrated and used together with a full-text OCR engine, simply to identify which document type is currently being processed, e.g. a receipt, a business card or an invoice.
  • FlexiCapture Engine: Document detection and separation from a stream of images, document classification and advanced data extraction. Classification is traditionally a very important part of data extraction.

Classification Technologies used
  • FineReader Engine: Image and content classification
  • FlexiCapture Engine: Image, layout and content classification

Classification Technologies

Image Classifier
  • Both engines: Yes, based on the image pattern (location of black pixels) and OCR text analysis of large text elements such as titles

Content based Classifier
  • FineReader Engine: Yes, statistical classification of the textual information generated by the OCR process
  • FlexiCapture Engine: Yes, FlexiLayouts allow defining a custom rule set using keywords, database lookups and regular expressions

Custom Rule Support
  • FineReader Engine: Custom features from the document can be integrated, e.g. based on layout analysis or text
  • FlexiCapture Engine: Custom decision trees are possible. Templates are based on their own extraction rules and contain scripts to control the classification and data extraction process.

Template based Classification
  • FineReader Engine: No
  • FlexiCapture Engine: Yes, FlexiLayouts

Classification Range
  • FineReader Engine: Single-page (image) approach
  • FlexiCapture Engine: Single pages (images) with image classification, and a document approach for single and multiple images/PDFs via FlexiLayouts
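
To make the idea of a content-based rule set using keywords and regular expressions more concrete, here is a minimal sketch in plain Python. It is not FlexiLayout syntax and does not use the FlexiCapture Engine or FineReader Engine API; the names ClassRule, classify_by_rules and RULES, as well as the example keywords and patterns, are hypothetical and chosen for illustration only. The input is assumed to be text already produced by OCR.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClassRule:
    """Hypothetical rule set for one document class (not FlexiLayout syntax)."""
    name: str
    keywords: list = field(default_factory=list)   # case-insensitive substrings
    patterns: list = field(default_factory=list)   # regular expressions

    def score(self, text: str) -> int:
        """Count how many keywords and patterns occur in the OCR text."""
        lowered = text.lower()
        hits = sum(1 for kw in self.keywords if kw.lower() in lowered)
        hits += sum(1 for p in self.patterns if re.search(p, text, re.IGNORECASE))
        return hits


def classify_by_rules(text: str, rules: list) -> Optional[str]:
    """Return the name of the best-scoring class, or None if nothing matches."""
    best = max(rules, key=lambda rule: rule.score(text), default=None)
    if best is None or best.score(text) == 0:
        return None
    return best.name


# Illustrative rules only; real rule sets would be tuned per project.
RULES = [
    ClassRule("invoice", keywords=["invoice", "due date"],
              patterns=[r"\binvoice\s*(no\.?|number|#)\s*\w+"]),
    ClassRule("receipt", keywords=["receipt", "cash", "change"],
              patterns=[r"\btotal\s*[:=]?\s*\d+[.,]\d{2}"]),
]

print(classify_by_rules("INVOICE No. 4711\nDue date: 2024-01-31", RULES))  # invoice
print(classify_by_rules("Dear Sir or Madam", RULES))                       # None
```

A simple hit count is used here for brevity; a decision tree or weighted rules, as mentioned in the comparison above, would follow the same pattern with a different scoring function.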

Classification Training & Setup

Training
  • Both engines: Document classes are created by training on predefined document collections

Training GUI
  • FineReader Engine: No, but a code sample is available
  • FlexiCapture Engine: Yes, with FlexiLayout Studio as the development tool

Training API
  • FineReader Engine: Yes, a code sample is available
  • FlexiCapture Engine: No, for FlexiLayout creation

Template Creation
  • FineReader Engine: No
  • FlexiCapture Engine:
    - No, for image based page (pre-)classification
    - Yes, for document separation & classification

Training Set
  • FineReader Engine: 3 or more images of each different document type to automatically train a new class
  • FlexiCapture Engine: 3 or more images of each different document type to
    - automatically train a new class
    - manually set up the classification rules or a classification tree in FlexiLayout Studio (single-page and multi-page documents are supported)
  • Recommendation: use different layout representations of the same document class
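
As an illustration of training document classes from small sample collections, the following sketch trains a simple statistical text classifier (multinomial naive Bayes with Laplace smoothing) on OCR text. This is a generic stand-in and not the algorithm or API of either engine; TextClassifier, tokenize, SAMPLES and MIN_SAMPLES_PER_CLASS are hypothetical names, and the minimum of three samples per class simply mirrors the training set guideline above.

```python
import math
import re
from collections import Counter, defaultdict

MIN_SAMPLES_PER_CLASS = 3  # mirrors the "3 or more images" guideline


def tokenize(text: str) -> list:
    """Lower-case the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


class TextClassifier:
    """Multinomial naive Bayes over OCR text (illustrative stand-in only)."""

    def __init__(self) -> None:
        self.word_counts = defaultdict(Counter)  # class -> word frequencies
        self.doc_counts = Counter()              # class -> number of samples
        self.vocabulary = set()

    def train(self, samples: dict) -> None:
        # Validate all classes first so a failure leaves the model untouched.
        for label, texts in samples.items():
            if len(texts) < MIN_SAMPLES_PER_CLASS:
                raise ValueError(
                    f"class {label!r} needs at least {MIN_SAMPLES_PER_CLASS} samples")
        for label, texts in samples.items():
            for text in texts:
                tokens = tokenize(text)
                self.word_counts[label].update(tokens)
                self.vocabulary.update(tokens)
                self.doc_counts[label] += 1

    def classify(self, text: str) -> str:
        if not self.doc_counts:
            raise RuntimeError("classifier has not been trained")
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)  # class prior
            for token in tokens:
                # Laplace smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up OCR snippets standing in for recognized sample documents.
SAMPLES = {
    "invoice": [
        "Invoice number 1001 payment due in 30 days",
        "Invoice for consulting services, amount due",
        "Tax invoice, please remit payment by due date",
    ],
    "business_card": [
        "John Smith Senior Engineer phone email",
        "Jane Doe Sales Manager mobile email",
        "Alex Lee Director phone fax email",
    ],
}

classifier = TextClassifier()
classifier.train(SAMPLES)
print(classifier.classify("Please pay this invoice, payment due"))
```

Using samples with different wording and layouts for the same class, as recommended above, widens the vocabulary each class is trained on and makes this kind of classifier less dependent on a single document variant.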

Other

Document Separation
  • FineReader Engine: No
  • FlexiCapture Engine: Yes, based on document definitions, FlexiLayouts or fixed form templates

Annex pages support
  • FineReader Engine: No
  • FlexiCapture Engine: Yes, annex pages are not used for classification but are still part of the document

Classification Availability
  • FineReader Engine: Add-on module in FineReader Engine 11
  • FlexiCapture Engine: Default feature in FlexiCapture Engine. Additional applications shipped with the SDK: FlexiCapture Standalone and FlexiLayout Studio, used to develop the document definition, classification and data extraction logic.

Note: The FineReader Engine API can also be licensed in FlexiCapture Engine, so it is technically possible to use the different classification and text recognition approaches in one FlexiCapture Engine based application.
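
The document separation and annex page behavior described in this section can be sketched in a few lines of Python. This is a simplified model, not the FlexiCapture Engine API: it assumes that only the first page of each document is recognizable, and that any unrecognized page is an annex page belonging to the preceding document. The names separate_documents and first_page_class and the sample page texts are hypothetical.

```python
from typing import Callable, Iterable, Optional


def separate_documents(
    pages: Iterable[str],
    classify_page: Callable[[str], Optional[str]],
) -> list:
    """Group a stream of page texts into documents.

    A page that classify_page recognizes starts a new document. A page for
    which it returns None is treated as an annex page: it is appended to the
    current document and does not influence that document's class.
    """
    documents = []
    for page in pages:
        doc_class = classify_page(page)
        if doc_class is not None or not documents:
            # A recognized page (or the very first page) opens a new document.
            documents.append({"class": doc_class, "pages": [page]})
        else:
            documents[-1]["pages"].append(page)
    return documents


def first_page_class(page_text: str) -> Optional[str]:
    """Hypothetical first-page classifier based on simple keywords."""
    lowered = page_text.lower()
    if "purchase order" in lowered:
        return "order"
    if "invoice" in lowered:
        return "invoice"
    return None


stream = [
    "INVOICE 1001",
    "General terms and conditions",  # annex page of the first invoice
    "PURCHASE ORDER 77",
    "INVOICE 1002",
]

for doc in separate_documents(stream, first_page_class):
    print(doc["class"], len(doc["pages"]))
# invoice 2
# order 1
# invoice 1
```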