Document Classification in FineReader Engine 12 - Code Sample (Linux)

Language: EN
Product-Line: FineReader Engine
Version: 12
Platform: Linux
Type: Knowledge Base & Support
KB-Type: Code Samples Collection
Category: Document Classification
Coding: C++

This sample demonstrates how ABBYY FineReader Engine can be used for document classification. It provides a ready-to-use algorithm for training your own classification models and using them to classify documents. The sample works with data located in two folders: one for training and one for classification.

Description

The sample uses the following procedure:

  1. Create the Engine object using the InitializeEngine function.
  2. Create the ClassificationEngine object, which provides access to the Classification API of ABBYY FineReader Engine.
  3. Create a training data set:
    1. Create the TrainingData object using the CreateTrainingData method of the ClassificationEngine object.
    2. Copy the names of the folders with image files for training to a new StringsCollection object. These names will be used as category names for classification.
    3. Create a new Categories object. Use the AddNew method of the Categories object to create a Category object for each folder name.
    4. Associate a new ClassificationObjects object with each Category object and add documents to it, doing the following for each document:
      1. Call the CreateFRDocument method of the Engine object to create the FRDocument object.
      2. Add pages from the image file to the document. Use the AddImageFile method of the FRDocument object.
        Note: This sample uses the image classifier, so the Analyze and Recognize methods of the FRDocument object are not used.
      3. Use the CreateObjectFromDocument method of the ClassificationEngine object to create an object for classification from the document object.
      4. Add the new classification object to the ClassificationObjects object using the Add method.
    5. Save the training data set with the SaveToFile method of the TrainingData object.
  4. Train the classification model:
    1. Load the training data set to the TrainingData object using the LoadFromFile method.
    2. Create the new Trainer object.
    3. Tune the trainer settings: create the TrainingParams object and set its ClassifierType property to CT_Image and its TrainingMode property to TM_Balanced.
    4. Train the model using the TrainModel method of the Trainer object and save the results to a new TrainingResult object.
    5. Get the trained model as a new Model object and save it to a file using the SaveToFile method.
  5. Classify the documents:
    1. Load the names of the files for classification to a new StringsCollection object.
    2. Load the trained model with the CreateModelFromFile method of the ClassificationEngine object.
    3. Access the Languages property of the Model object. For an image classifier, this property is empty.
    4. To classify the files do the following for each document:
      1. Load the file from the folder to a new FRDocument object.
      2. Add pages from the image file to the document. Use the AddImageFile method of the FRDocument object.
        Note: This sample uses the image classifier, so the Analyze and Recognize methods of the FRDocument object are not used.
      3. Use the CreateObjectFromDocument method of the ClassificationEngine object to create an object for classification from the document object.
      4. To get the classification results, create the ClassificationResults object. Pass it to the Classify method of the Model object. Then retrieve the result as a new ClassificationResult object through the Element property of the ClassificationResults object.
      5. Access the result category with the CategoryLabel property and the probability with the Probability property of the ClassificationResult object.
      6. Display the results for the current document.
  6. Unload FineReader Engine using the DeinitializeEngine function.
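
Step 3 of the procedure derives category names from the names of the training folders. The following is a minimal sketch of that bookkeeping only, written in standard C++17 and assuming a layout of one subfolder per category directly under the training folder. It does not call the FineReader Engine API: the names TrainingLayout, listTrainingFiles and groupByCategory are hypothetical helpers introduced for illustration and are not part of the sample or of the Engine.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <map>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper type: category name -> image files that belong to it.
using TrainingLayout = std::map<std::string, std::vector<fs::path>>;

// Lists the regular files found directly inside the immediate subfolders
// of trainingRoot. Files that lie in trainingRoot itself are ignored.
std::vector<fs::path> listTrainingFiles(const fs::path& trainingRoot)
{
    assert(!trainingRoot.empty()); // the caller must supply the training folder
    std::vector<fs::path> files;
    for (const fs::directory_entry& categoryDir : fs::directory_iterator(trainingRoot)) {
        if (!categoryDir.is_directory()) {
            continue; // loose files in the root do not define a category
        }
        for (const fs::directory_entry& entry : fs::directory_iterator(categoryDir.path())) {
            if (entry.is_regular_file()) {
                files.push_back(entry.path());
            }
        }
    }
    std::sort(files.begin(), files.end()); // directory iteration order is unspecified
    return files;
}

// Groups files by the name of the folder that directly contains them.
// That folder name plays the role of the category name.
TrainingLayout groupByCategory(const std::vector<fs::path>& imageFiles)
{
    TrainingLayout layout;
    for (const fs::path& file : imageFiles) {
        const std::string category = file.parent_path().filename().string();
        if (category.empty()) {
            continue; // no parent folder name, so no category can be derived
        }
        layout[category].push_back(file);
    }
    return layout;
}
```

Calling groupByCategory(listTrainingFiles(trainingFolder)) would give one map entry per non-empty category folder. Each key could then be passed on as a category name and each file list as the documents to add for that category. Note that in this sketch an empty subfolder produces no category at all.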

For more details, see the Document Classification section of the Developer's Help.
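
The last classification steps read a category label and a probability for each document and display them. As a rough illustration of that reporting step, the sketch below picks the most probable candidate from a list and formats one output line per document. It assumes the label and probability values have already been copied out of the Engine objects into plain data. LabeledProbability, mostProbable and formatResultLine are hypothetical names introduced here, not Engine types or functions, and the output format is an arbitrary choice.

```cpp
#include <algorithm>
#include <cassert>
#include <iomanip>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical plain-data copy of the CategoryLabel and Probability values
// read from a classification result. This is not an Engine type.
struct LabeledProbability {
    std::string categoryLabel;
    double probability;
};

// Returns the candidate with the highest probability, or nothing if the
// list is empty. On a tie, the earliest such candidate is returned.
std::optional<LabeledProbability> mostProbable(const std::vector<LabeledProbability>& candidates)
{
    if (candidates.empty()) {
        return std::nullopt;
    }
    return *std::max_element(candidates.begin(), candidates.end(),
        [](const LabeledProbability& a, const LabeledProbability& b) {
            return a.probability < b.probability;
        });
}

// Builds one report line for a document, for example "scan1.png: Invoice (0.50)".
std::string formatResultLine(const std::string& fileName,
                             const std::optional<LabeledProbability>& best)
{
    assert(!fileName.empty()); // every report line must name its document
    std::ostringstream line;
    line << fileName << ": ";
    if (best) {
        line << best->categoryLabel << " ("
             << std::fixed << std::setprecision(2) << best->probability << ")";
    } else {
        line << "no category";
    }
    return line.str();
}
```

Returning std::optional keeps the case of a document with no result explicit, so the caller can report it instead of reading from an empty collection.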

