Document Classification

Language: EN
Product-Line: FlexiCapture Engine, FineReader Engine, Smart Classifier
Version: 2.5, 10, 11
Type: Scenarios/Tasks
Category: Document Classification

Introduction

Document classification, also called document sorting or categorisation, is a challenge that many companies and organisations face. Categorising documents correctly can be the first step towards document archiving, and it can also act as a 'trigger' for automated workflows and document routing.

A document classified as 'invoice', for example, can be automatically routed to the finance department, while a document classified as 'purchase order' will be automatically delivered to the sales department.

ABBYY's developers have deep knowledge of document analysis and processing, which is why our SDKs offer several ways to implement document classification.

Classification Features

  • ABBYY SDKs offer different types of classification technologies:
    • Image-based classification
    • Rules-based classification
    • Content-based classification (statistical)
    • Semantic classification

How to start

  • To start a classification project, collect a set of documents that are typical representatives of each document class.
  • The qualified document collection for every class is split into two sets:
    • one for training the classification engine/backend
    • one for quality control, testing and tuning the classification results
  • Once the setup is done, production can start: the documents to be classified are analyzed and matched against the trained document classes.
  • The classification results are then provided as a set of hypotheses. This information can then be used to:
    • detect the document type so that each document can be processed accordingly, for example, detect whether a scanned document is a business card, a receipt or a medical prescription
    • route documents into different business processes so that they are sent directly to the right experts
    • tag documents, e.g. in a DMS or a SharePoint library
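
The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not ABBYY SDK code: the function names split_per_class and pick_route, the ROUTES table, the 70/30 split and the 0.6 confidence threshold are all assumptions made for this example, and the hypotheses are modelled simply as (class name, confidence) pairs.

```python
import random

# Hypothetical routing table, based on the invoice / purchase order example.
ROUTES = {
    "invoice": "finance department",
    "purchase order": "sales department",
}


def split_per_class(docs_by_class, train_fraction=0.7, seed=0):
    """Split each class's documents into a training set and a test set.

    docs_by_class maps a class name to a list of document identifiers.
    The 70/30 default is an assumption, not a vendor recommendation.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = {}, {}
    for label, docs in docs_by_class.items():
        shuffled = list(docs)
        rng.shuffle(shuffled)
        # Keep at least one training document per class.
        cut = max(1, int(len(shuffled) * train_fraction))
        train[label] = shuffled[:cut]
        test[label] = shuffled[cut:]
    return train, test


def pick_route(hypotheses, routes, min_confidence=0.6, fallback="manual review"):
    """Choose a destination from a list of (class name, confidence) hypotheses.

    The best hypothesis wins only if it is confident enough and has a known
    route; otherwise the document goes to the fallback destination.
    """
    if not hypotheses:
        return fallback
    label, confidence = max(hypotheses, key=lambda h: h[1])
    if confidence < min_confidence:
        return fallback
    return routes.get(label, fallback)


if __name__ == "__main__":
    docs = {
        "invoice": [f"invoice_{i}.pdf" for i in range(10)],
        "purchase order": [f"po_{i}.pdf" for i in range(6)],
    }
    train, test = split_per_class(docs)
    print({label: (len(train[label]), len(test[label])) for label in docs})
    print(pick_route([("invoice", 0.91), ("receipt", 0.05)], ROUTES))
    print(pick_route([("invoice", 0.40), ("receipt", 0.35)], ROUTES))
```

Sending low-confidence or unknown classes to a fallback such as manual review is a design choice of this sketch. Whatever threshold is used would normally be tuned on the quality-control set rather than on the training set.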

Classification process scheme