Document classification, also called document sorting or categorisation, is a challenge that many companies and organisations face. Correct categorisation of documents can be the first step towards document archiving, and it can also act as a 'trigger' for automated workflows and document routing.
A document classified as 'invoice', for example, can be automatically routed to the finance department, while a document classified as 'purchase order' will be automatically delivered to the sales department.
ABBYY's developers have deep knowledge of document analysis and processing. This is why our SDKs offer several ways to implement document classification.
ABBYY SDKs offer different types of classification technologies:
- Image-based classification
- Rules-based classification
- Content-based classification (statistical)
- Semantic classification
How to start
- To start a classification project, collect a set of documents that are typical representatives of each document class.
- The qualified document collection for every class is split into two sets:
  - one for training the classification engine/backend
  - one for quality control, testing and tuning the classification results
- Once the setup is done, production can start. The documents to be classified are analysed and matched against the trained document classes.
- The classification results are then provided as a set of hypotheses. This information can then be used to:
  - detect the document type so that each document can be processed accordingly, for example, detect whether a scanned document is a business card, a receipt or a medical prescription
  - route documents into different business processes so that they are sent directly to the right experts
  - tag documents, e.g. in a DMS or a SharePoint library
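
The steps above can be sketched in a few lines of Python. This is only an illustration of the workflow and does not use the ABBYY SDK API: the function names `split_collection` and `route`, the 70/30 split, the confidence threshold, the `ROUTES` table and the representation of a hypothesis as a (class name, confidence) pair are all assumptions made for this example.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical routing table: document class -> department.
ROUTES: Dict[str, str] = {"invoice": "finance", "purchase order": "sales"}


def split_collection(
    docs_by_class: Dict[str, List[str]],
    train_percent: int = 70,  # assumed ratio, not prescribed by any SDK
    seed: int = 0,
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Split each class's documents into a training set and a test set."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train: Dict[str, List[str]] = {}
    test: Dict[str, List[str]] = {}
    for label, docs in docs_by_class.items():
        shuffled = list(docs)
        rng.shuffle(shuffled)
        cut = len(shuffled) * train_percent // 100
        train[label] = shuffled[:cut]
        test[label] = shuffled[cut:]
    return train, test


def route(
    hypotheses: List[Tuple[str, float]],
    min_confidence: float = 0.5,  # assumed threshold
    fallback: str = "manual review",
) -> str:
    """Pick the most confident hypothesis and map it to a destination."""
    if not hypotheses:
        return fallback
    label, confidence = max(hypotheses, key=lambda h: h[1])
    if confidence < min_confidence:
        return fallback
    # Unknown classes also fall back to manual review.
    return ROUTES.get(label, fallback)
```

In this sketch, a document whose best hypothesis is ("invoice", 0.9) would be routed to "finance", while a document with no sufficiently confident hypothesis goes to manual review. Keeping the test set separate from the training set is what makes it usable for quality control and tuning.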