Creating Document Definitions by Training

Language:
EN
Product-Line:
FlexiCapture Engine
Platform:
Windows
Type:
Knowledge Base & Support
KB-Type:
Code Samples Collection
KB-Topic:
General
Category:
Recognition

This sample allows you to create simple Document Definitions for one-page documents without using FlexiLayout Studio. The Document Definitions are created from a training set of images. Once training is finished, the definitions can be used for data extraction with ABBYY FlexiCapture or ABBYY FlexiCapture Engine.

Note: this technology is also available in FlexiLayout Studio, but the new API lets the users of your application train new document types themselves. The generated templates can then be used within your application.

Description

While creating Document Definitions, the sample works with a set of training images. On each image you mark the data fields whose values you want to extract and several reference elements that allow FlexiCapture Engine to locate the data fields. This is done on several pages. Based on this information, FlexiCapture Engine generates Document Definitions, which can then be used for data extraction.

If the set of training images contains several types of documents, each of which needs its own Document Definition, FlexiCapture Engine uses a special classifier that helps to select the correct Document Definition for a page. This classifier is trained together with the Document Definitions. It can be saved and then used to speed up the selection of Document Definitions during processing.

Note that this sample is intended for creating simple Document Definitions. If you need comprehensive Document Definitions with a large number of fields and reference elements, we recommend that you use the tools of ABBYY FlexiCapture.

The sample uses a processing procedure similar to the one described in the "Creating a Document Definition by training on a set of images" code snippet.

Creating Document Definitions by Training

Suppose you have a group of images from which you need to extract some data fields. These can be a set of similar images that contain the same fields, or a set of images of several document types. Select several examples of such images (3 to 5 images for each document type) and place them in one folder. Give preference to the images with the best quality. These will be the template images used to create the Document Definitions.
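Before starting the sample, it can help to confirm that the training folder holds a sensible number of images. The following is a minimal sketch in plain Python that does not use the FlexiCapture Engine API. The function names and the list of image extensions are assumptions made for illustration; only the "3 to 5 images for each document type" guideline comes from the text above.

```python
from pathlib import Path

# Assumed set of image extensions; adjust it to the formats you actually use.
IMAGE_EXTENSIONS = {".tif", ".tiff", ".jpg", ".jpeg", ".png", ".bmp", ".pdf"}


def list_training_images(folder):
    """Return the image files found directly in `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def check_training_set(folder, document_types=1):
    """Return (image count, True if the count is within 3 to 5 per document type)."""
    count = len(list_training_images(folder))
    lowest, highest = 3 * document_types, 5 * document_types
    return count, lowest <= count <= highest
```

For example, `check_training_set("C:/Training", document_types=2)` (a hypothetical path) reports whether the folder holds between 6 and 10 images. The check only counts files; it cannot tell which document type an image belongs to or judge image quality.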

To create Document Definitions, do the following:

1) Specify the path to the template images in the Path to training images field.

2) Select Begin a new training project and click Next.

3) The first image from the specified folder will be displayed. Specify the name of the Document Definition to be created and select the language of your documents. By default, English is selected. Click Next.

4) Select data fields on the image. These are the fields that contain the data you want to extract. Do the following:

  • ABBYY FlexiCapture Engine performs pre-recognition of the image before you select data fields on it. Recognized characters are highlighted in blue on the image. To zoom the image, select View > Zoom in the main menu and choose the desired scale from the list.
  • Use the mouse to draw one or several boxes surrounding data fields. The field name and the recognized text of the field will be displayed below the image.
  • You can modify the created fields:
    • Change the name of a field (click Rename and specify the new name)
    • Configure the data type of a field (click Configure and select the type from the drop-down list). The fields can be of the following data types: text, date and time, currency, number, regular expression.
    • Delete unnecessary fields (click Delete for the field you want to delete)
  • Click Next.

5) Next, select reference elements on the image. Reference elements help FlexiCapture Engine to find data fields on the image. They are elements that can be found on most images of this type, for example field captions or a header. Do the following:

  • The data fields you have already selected are displayed on the image surrounded by green boxes. Use the mouse to draw one or several boxes for reference elements. They appear in red boxes.
  • You can change the name of a reference element (click Rename and specify the new name) or delete unnecessary elements (click Delete for the element you want to delete).
  • Click Next.

6) After that FlexiCapture Engine tries to apply the created Document Definition to the next image in the training folder and displays the result. Check the layout on the image and change it, if necessary. Click Next.

7) If the next image is of a different type and the Document Definition cannot be applied correctly, the sample will suggest creating another Document Definition for this type of document.

8) In this case, select Add new from the drop-down list and describe another Document Definition as described in steps 3-5 of this guide.

9) All images in the training folder are processed in this way: you check that the correct Document Definition is selected, correct the layout, or add a new Document Definition if necessary. You can skip images that you do not want to use for training by selecting Document Type > Skip Image in the main menu.

10) When all images from the training folder have been used for training, the sample will offer to start testing the created Document Definitions or to save them. We recommend that you test the Document Definitions on more images before saving. To start testing, select Start testing with the built-in tool using the images from the folder below and specify the folder with testing images in the field below. Click Next.

11) For each testing image, check that the data fields have been found correctly. If so, click Succeeded. If some data fields are not found or are recognized incorrectly, make sure that the image has acceptable quality. If the quality of the image is poor, click Failed. If the quality of the image is fine, you can add the image for additional training by clicking Add current image to the training batch (the button is only available if this testing image has not been used for creating a Document Definition) and proceed with training after you have checked all testing images.

12) After the test is finished, the sample displays test statistics and suggests exporting the results or proceeding with training if necessary. If you are satisfied with the results of testing, save the resulting Document Definitions: select Export the definitions for the trained document types and start using them in your project, and specify the folder where the files should be saved in the field below.

13) The sample saves FlexiCapture Document Definition files (*.fcdot) and, if more than one Document Definition has been created, a classification tree file (*.cfl). You can also save a Document Definition as a FlexiLayout file (*.afl). To do this, select File > Export FlexiLayout in the main menu and specify the path to the file. After the Document Definition has been saved, you can use it for data extraction with ABBYY FlexiCapture Engine or ABBYY FlexiCapture.

14) If you want to modify the created Document Definitions, run the sample again, select the folder that has already been used for creating the Document Definitions, and select Continue training the existing project. Click Next and proceed with training. You can modify Document Definition settings (name, language), add more template images, add new Document Definitions, and so on. To change settings, select Document Type in the main menu and then a suitable item.
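After exporting (step 13), you may want to verify that the export folder contains the expected files before wiring them into your project. The sketch below uses only the Python standard library and does not call the FlexiCapture Engine API. The function name is hypothetical, and it assumes that one *.fcdot file is written per Document Definition, which the text above does not state explicitly.

```python
from pathlib import Path


def check_export_folder(folder, expected_definitions):
    """List exported files and report anything that looks missing.

    Assumption: one .fcdot file per Document Definition. The .cfl
    classification tree file is only expected when more than one
    Document Definition was created.
    """
    folder = Path(folder)
    fcdot = sorted(p.name for p in folder.glob("*.fcdot"))
    cfl = sorted(p.name for p in folder.glob("*.cfl"))
    problems = []
    if len(fcdot) != expected_definitions:
        problems.append(
            f"expected {expected_definitions} .fcdot file(s), found {len(fcdot)}"
        )
    if expected_definitions > 1 and not cfl:
        problems.append("no .cfl file found although several definitions were trained")
    return fcdot, cfl, problems
```

An empty `problems` list means the folder matches these assumptions. It does not validate the contents of the files, which only FlexiCapture or FlexiCapture Engine can load.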

Video on Receipts Training

The video introduces auto-learning training for receipts.

Receipt processing and data extraction is an area that ABBYY currently focuses on with dedicated business development activities.
