Smart Classifier Model Editor

  • ABBYY Smart Classifier is based on a scalable processing back-end that uses machine learning and linguistic technologies to classify unstructured content.
  • The intuitive, web-based Model Editor of Smart Classifier allows the content experts within the organization to set up, train, and maintain the classification models.
  • Smart Classifier can easily be connected to and integrated with existing IT systems via a REST API 1).
  • Important Notes:
    • Setting up a proper classification model with other classification tools often requires scientific know-how and expertise to pick the best classification algorithms, which then have to be tuned by selecting the best-working parameters.
    • ABBYY made this step very intuitive: Smart Classifier uses built-in artificial intelligence to do this selection and tuning automatically.

Introduction

  • If you have installed ABBYY Smart Classifier, you can start the Model Editor via its icon in the Windows Start menu.

Project Homepage

Smart Classifier Model Editor homepage (empty, no projects yet)

Create a new Classification Project

To set up a new classification model, just run through the assistant:

  • Give the model a meaningful name
  • Select the language of the documents that should be classified
  • Select what type of classification or results should be returned:
    • All candidate categories
    • Top candidate category = assign only the category with the highest score
    • Single candidate category = assign a category only if no other possible candidates are found
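
The three result types above can be illustrated with a minimal sketch. It is not Smart Classifier's implementation: the function name `select_categories`, the mode names, and the score threshold that decides what counts as a "candidate" are all assumptions made for illustration only.

```python
from typing import Dict, List


def select_categories(scores: Dict[str, float], mode: str,
                      threshold: float = 0.5) -> List[str]:
    """Return the categories assigned to one document for a given result type.

    `scores` maps category name -> score. The threshold is hypothetical:
    this page does not say how Smart Classifier decides which categories
    are candidates.
    """
    # Candidates: categories whose score reaches the assumed threshold,
    # ordered from highest to lowest score.
    candidates = sorted(
        (name for name, score in scores.items() if score >= threshold),
        key=lambda name: scores[name],
        reverse=True,
    )
    if mode == "all":      # All candidate categories
        return candidates
    if mode == "top":      # Top candidate category (highest score only)
        return candidates[:1]
    if mode == "single":   # Single candidate category (only if unambiguous)
        return candidates if len(candidates) == 1 else []
    raise ValueError(f"unknown mode: {mode!r}")
```

With scores of 0.9 for "Invoice", 0.6 for "Contract" and 0.1 for "Letter", the "all" mode would return both Invoice and Contract, "top" would return only Invoice, and "single" would return nothing, because two candidates compete.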

Define Name & Language

Smart Classifier project setup screen - Model Name & Language
Compatibility Note: the selection option between textual/linguistic or semantic features that was available up to version 2.5 was removed from the interface, because starting from version 2.6 the technology internally selects the best matching option.

Smart Classifier project setup screen - Classification type: Text or Semantic

Definition of the Classification Behavior

Create a new Training set for Classification

Classification of unstructured documents is based on machine learning. Since the content is not structured, a rule-based approach will not work. To let the machine “learn your documents and classes”, a training set of documents is needed.

The Smart Classifier code sample also contains a test set of documents that makes it easy to create a working Classification Model so that you can evaluate the Smart Classifier Model Editor.

  • If you have already installed Smart Classifier on your machine, you can find the sample documents locally under: C:\Users\Public\ABBYY\Compreno Products\2.5\Code Samples\SmartClassifierSampleApplication, or just download the test collection.
  • The training set requirements are:
    • Create a folder structure in which every folder represents a class.
    • Put at least 10 documents that represent a specific class in each folder. Smart Classifier is very flexible: it can process Office files, text, HTML, PDFs, images and others…
    • Create a .zip archive of the folder structure.
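
The training set requirements above can be checked and packaged with a short script. This is a minimal sketch using only the Python standard library; the function name `build_training_zip` is hypothetical, and it assumes that documents sit directly inside each class folder and that the class folders belong at the top level of the archive (this page does not specify the expected archive layout).

```python
import zipfile
from pathlib import Path

MIN_DOCS_PER_CLASS = 10  # minimum per class stated in the requirements


def build_training_zip(root: Path, zip_path: Path) -> dict:
    """Validate the class folders under `root` and pack them into `zip_path`.

    Returns a mapping of class name -> number of documents.
    """
    # Every sub-folder of `root` is treated as one class.
    class_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    files = {d.name: sorted(f for f in d.iterdir() if f.is_file())
             for d in class_dirs}
    counts = {name: len(docs) for name, docs in files.items()}

    if not counts:
        raise ValueError(f"no class folders found in {root}")
    too_small = {n: c for n, c in counts.items() if c < MIN_DOCS_PER_CLASS}
    if too_small:
        raise ValueError(
            f"classes with fewer than {MIN_DOCS_PER_CLASS} documents: {too_small}"
        )

    # Assumed layout inside the archive: <class name>/<document>.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, docs in files.items():
            for doc in docs:
                archive.write(doc, f"{name}/{doc.name}")
    return counts
```

For example, `build_training_zip(Path("training"), Path("training.zip"))` would fail early with a list of under-filled classes instead of producing an archive that trains poorly.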

Upload your Training Set

  • Select and upload the .zip

  • Learning of the document-specific features will start automatically

Result: You created a first Classification Model based on your classes and your documents.

Evaluate your Classification Model

Smart Classifier Integration

Production API

  • Once an initial classification model is available, you can start the Smart Classifier integration via the REST API. The default web port is 83 2)
  • Details can be found in the Integration Guide
  • A code sample showing how to integrate Smart Classifier is available in the installation folder
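
Because the default port differs between versions (see footnote 2), client code may want to derive the service address from the installed version. The helper below is only a sketch: the name `service_base_url` and the plain `http` scheme are assumptions, and the actual endpoint paths are not covered here and have to be taken from the Integration Guide.

```python
from typing import Optional

# Default web ports as stated on this page: 83 for v2.6, 81 for v2.5.
DEFAULT_PORTS = {"2.6": 83, "2.5": 81}


def service_base_url(host: str, version: str = "2.6",
                     port: Optional[int] = None) -> str:
    """Build the base URL of a Smart Classifier installation.

    An explicit `port` overrides the version default. The "http" scheme is
    an assumption; adjust it to match your deployment.
    """
    if port is None:
        try:
            port = DEFAULT_PORTS[version]
        except KeyError:
            raise ValueError(
                f"no known default port for version {version!r}"
            ) from None
    return f"http://{host}:{port}"
```

Calling `service_base_url("localhost")` yields `http://localhost:83`, while `service_base_url("localhost", version="2.5")` yields `http://localhost:81`.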

Model Training & Management API

Version 2.6 of ABBYY Smart Classifier introduced an API for setting up and managing classification models. With this new feature set, developers can automate the creation, training, and tuning of models via the API.

The API provides access to the same features and options that are available in the Classification Model Editor (screenshots above). This includes:

  • Create new models
  • Define/Adjust the model settings
  • Upload and delete training/control documents
  • Start re-training
  • Get the classification statistics
  • Control the “stop-word-list”
  • Deploy a model
1) Classification model creation & production
2) As of v2.6; v2.5 used port 81