Receipt Capture SDK - Usage Recommendations
This article provides guidelines and recommendations on the challenges and limitations that you should be aware of when you start capturing receipts to extract data.
Bad image quality
If the picture is of poor quality (out of focus, too dark, very low resolution, etc.), OCR will not be able to recognize the text well.
Here you can find our guidelines for optimal image resolution, colors, font size, and image size for the best results.
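As an illustration of the three problems named above, an application could run a rough pre-check before sending an image for recognition. The sketch below is not part of the SDK: the function names and all thresholds are hypothetical, and it assumes the image has already been decoded into a grayscale grid (a list of rows with values from 0 to 255). Focus is estimated with the variance of a simple Laplacian, a common blur heuristic.

```python
from statistics import mean, pvariance

# Hypothetical thresholds for illustration only; tune them on your own images.
MIN_WIDTH, MIN_HEIGHT = 600, 800   # assumed minimum image size in pixels
MIN_BRIGHTNESS = 60                # assumed minimum mean gray level (0-255)
MIN_SHARPNESS = 50.0               # assumed minimum variance of the Laplacian


def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian; low values suggest a blurry image."""
    height, width = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    # Images smaller than 3x3 have no interior pixels to measure.
    return pvariance(responses) if responses else 0.0


def quality_problems(gray):
    """Return a list of problems found in a grayscale image.

    `gray` is a list of rows, each a list of integers in the range 0-255.
    An empty list means none of the three checks failed.
    """
    problems = []
    height, width = len(gray), len(gray[0])
    if width < MIN_WIDTH or height < MIN_HEIGHT:
        problems.append("resolution too low")
    if mean(value for row in gray for value in row) < MIN_BRIGHTNESS:
        problems.append("too dark")
    if laplacian_variance(gray) < MIN_SHARPNESS:
        problems.append("out of focus")
    return problems
```

If the returned list is not empty, the application can ask the user to retake the photo instead of submitting it. Pure Python is slow on full-size photos, so a real application would likely run the same checks with an image library.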
Only part of image is captured
It is quite common to capture only the most important part of a receipt and leave the header or footer out of the image. Our technology will still try to extract everything it can, but the result will be worse for several reasons:
- Some fields (such as Total, Address, Phone, etc.) will simply be missing. We can't extract what is not in the image.
- Logo detection may not work if there is no logo or the logo is only partially captured. Without logo detection, the accuracy of Vendor classification is much lower.
- Predefined templates will most likely not work on partial images, so the overall quality of extraction will suffer.
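Because fields that are outside the image cannot be extracted, an application may want to check the result for missing fields and ask for a new photo. The sketch below assumes the extraction result has been converted into a flat dictionary keyed by field name. That shape, the `EXPECTED_FIELDS` tuple, and both function names are assumptions made for illustration, not the SDK's actual result format.

```python
# Field names follow the examples in the list above. The flat dict shape is an
# assumption for illustration, not the SDK's actual result format.
EXPECTED_FIELDS = ("Vendor", "Total", "Address", "Phone")


def missing_fields(extracted, expected=EXPECTED_FIELDS):
    """Return the expected fields that are absent or empty in `extracted`."""
    missing = []
    for name in expected:
        value = extracted.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


def should_recapture(extracted, required=("Total",)):
    """Suggest a new photo when a field the caller treats as required is missing."""
    return bool(missing_fields(extracted, required))
```

For example, a result that contains a vendor but no total would make `should_recapture` return `True`, which the application can use to prompt the user to include the whole receipt in the shot.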
Very long receipts
There is still no practical solution for very long receipts that do not fit completely into one camera shot at an acceptable resolution. Such receipts are not rare, but in our experience so far they are not the majority.
Unsupported receipt types
Currently the technology supports only receipts printed on paper tape by dedicated receipt printers. However, other kinds of receipts exist, such as:
- A4 page receipts (from hotels, from Uber, etc.)
- Small paper cards (from vending machines for public transportation, public parking, etc.)
Twisted receipts
OCR Technology Templates do not work with twisted receipts. Our line-straightening technology is tuned for natural 3D deformations, but it will fail if a receipt is intentionally twisted. Reclamation 817855.
Torn receipts or multiple receipts in one image
OCR Technology Templates do not work with receipts that are torn into 2 or 3 parts and placed beside each other. Our cropping technology assumes there is only one receipt in the image and will fail. Our recognition technology also assumes there is only one receipt, so the result will not be good. Reclamation 817855.
Sample of one torn receipt in one photo:
Several separate photos of one receipt in one image
OCR Technology Templates do not work when a customer takes 2-3 photos of one receipt (top, middle, and bottom parts) and then pastes them together into one image. Reclamation 817855. Sample of such receipts:
Limited support for negative amounts
The technology extracts negative amounts correctly only when it uses a pre-trained layout description. If it has no description of the receipt layout, it returns only positive amounts, in order to avoid the many mistakes that OCR misrecognition would otherwise cause on dirty images.
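One consequence is that, without a layout description, a discount or refund line printed as a negative amount may come back as a positive one. A client-side consistency check can flag such cases. The sketch below is a heuristic under stated assumptions, not SDK behavior: the function name is hypothetical, amounts are assumed to be available as decimal strings, and only a single wrongly signed item is considered.

```python
from decimal import Decimal


def sign_mismatch_candidates(item_amounts, total):
    """Return indexes of items whose sign flip makes the items sum to the total.

    Amounts may be strings or Decimals. An empty result means either that the
    receipt is already consistent or that no single flip explains the gap.
    """
    amounts = [Decimal(str(amount)) for amount in item_amounts]
    difference = sum(amounts) - Decimal(str(total))
    if difference == 0:
        return []
    # Flipping an item from +a to -a lowers the sum by exactly 2a.
    return [i for i, amount in enumerate(amounts) if 2 * amount == difference]
```

For items `["10.00", "2.00", "5.00"]` and a total of `"13.00"`, the function points at the second item, which suggests that the 2.00 line was a discount. Taxes, rounding, or missed items can also break the sum, so a match is a hint for review rather than proof.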
Limited support for multi-line line items
If the technology has a pre-trained description of the receipt layout, it extracts line items accurately even when a line item spans several text lines. Without a pre-trained layout description, the technology makes mistakes more often, because it is tuned to prefer extracting every item over merging text lines into a smaller number of items.