Document Classification in FineReader Engine 12 - Code Sample (Linux)
- Language: EN
- Product-Line: FineReader Engine
- Version: 12
- Platform: Linux
- Type: Knowledge Base & Support
- KB-Type: Code Samples Collection
- Category: Document Classification
- Coding: C++
The sample demonstrates how ABBYY FineReader Engine can be used for document classification. It provides a ready-to-use algorithm for training your own classification models and using them to classify documents. The sample works with the data located in two folders: a folder for training and a folder for classification.
Description
The sample uses the following procedure:
- Create the Engine object using the InitializeEngine function.
- Create the ClassificationEngine object, which provides access to the Classification API of ABBYY FineReader Engine.
- Create a training data set:
  - Create the TrainingData object using the CreateTrainingData method of the ClassificationEngine object.
  - Copy the names of the folders with image files for training to a new StringsCollection object. These names will be used as category names for classification.
  - Create a new Categories object. Use the AddNew method of the Categories object to create a Category object for each folder name.
  - Associate a new ClassificationObjects object with each Category object and add documents to it, doing the following for each document:
    - Call the CreateFRDocument method of the Engine object to create the FRDocument object.
    - Add pages from the image file to the document using the AddImageFile method of the FRDocument object.
      Note: This sample works with an image classifier, so the Analyze and Recognize methods of the FRDocument object are not used.
    - Use the CreateObjectFromDocument method of the ClassificationEngine object to create an object for classification from the document object.
    - Add the new classification object to the ClassificationObjects object using the Add method.
  - Save the training data set with the SaveToFile method of the TrainingData object.
- Train the classification model:
  - Load the training data set to the TrainingData object using the LoadFromFile method.
  - Create a new Trainer object.
  - Tune the settings of the trainer: create the TrainingParams object, then set the ClassifierType property to CT_Image and the TrainingMode property to TM_Balanced.
  - Train the model using the TrainModel method of the Trainer object and save the results to a new TrainingResult object.
  - Store the trained model in a new Model object and save it using the SaveToFile method.
- Classify the documents:
  - Load the names of the files for classification to a new StringsCollection object.
  - Load the trained model with the CreateModelFromFile method of the ClassificationEngine object.
  - Access the Languages property of the Model object. For an image classifier this property is empty.
  - To classify the files, do the following for each document:
    - Load the file from the folder to a new FRDocument object.
    - Add pages from the image file to the document using the AddImageFile method of the FRDocument object.
      Note: This sample works with an image classifier, so the Analyze and Recognize methods of the FRDocument object are not used.
    - Use the CreateObjectFromDocument method of the ClassificationEngine object to create an object for classification from the document object.
    - To get the classification results, create the ClassificationResults object and pass it to the Classify method of the Model object. Get a new ClassificationResult object from the Element property of the ClassificationResults object.
    - Access the result category with the CategoryLabel property and the probability with the Probability property of the ClassificationResult object.
    - Display the results for the current document.
- Unload FineReader Engine using the DeinitializeEngine function.
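The final classification steps read a category label and a probability for each document and display them. The sketch below illustrates one way such a report line could be built. It is not the sample's code and makes no Engine calls: `CategoryScore`, `BestResult` and `FormatResult` are hypothetical names, and the results are assumed to have already been copied out of the Engine objects into plain label/probability pairs. Picking the highest-probability entry is likewise an assumption about how one might choose a result to show.

```cpp
#include <algorithm>
#include <iomanip>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Plain stand-in for one classification result. It mirrors the CategoryLabel
// and Probability properties named in the procedure but is not an Engine type.
struct CategoryScore {
    std::string categoryLabel;
    double probability;
};

// Returns the entry with the highest probability, or std::nullopt if the list is empty.
std::optional<CategoryScore> BestResult(const std::vector<CategoryScore>& results)
{
    auto it = std::max_element(
        results.begin(), results.end(),
        [](const CategoryScore& a, const CategoryScore& b) {
            return a.probability < b.probability;
        });
    if (it == results.end()) {
        return std::nullopt;
    }
    return *it;
}

// Builds one report line for a document: "<file>: <category> (<probability>)".
std::string FormatResult(const std::string& fileName,
                         const std::vector<CategoryScore>& results)
{
    std::ostringstream out;
    out << fileName << ": ";
    if (auto best = BestResult(results)) {
        out << best->categoryLabel << " ("
            << std::fixed << std::setprecision(2) << best->probability << ")";
    } else {
        out << "not classified";
    }
    return out.str();
}
```

Keeping the formatting separate from the Engine calls means the report logic can be checked without a FineReader Engine license or image files.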
For more details, see the Document Classification section of the Developer's Help.