OCR Recognition Features in a Nutshell

  • Recognizing the characters printed on an image is probably the most obvious task that OCR performs. ;-)
  • Many different aspects have to be considered:
    • Which languages occur in the documents?
    • How was the text printed: on a laser printer, on a typewriter, on a dot matrix printer, etc.?
    • Which areas should be analyzed or recognized?
      • Only plain text, without reading order?
      • Should tables be detected?
      • Should barcodes be recognized?
    • Does the document consist of separate pages or are they linked?
    • Should the logical structure of the document be analyzed to reconstruct the table of contents?
    • How many CPU cores can be used for recognition?
  • As shown below, many different factors have a strong influence on this processing step.
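One way to keep these aspects together is to treat them as a single settings object that is handed to the recognition step. The sketch below is a minimal, hypothetical illustration in Python: `RecognitionSettings`, `PrintType`, and every field name are invented for this example and do not belong to any particular OCR engine's API. The rule that table-of-contents reconstruction requires linked pages is likewise an assumption, included only to show that the options can depend on one another.

```python
import os
from dataclasses import dataclass, field
from enum import Enum


class PrintType(Enum):
    """How the text was put on paper (hypothetical categories taken from the list above)."""
    LASER = "laser"
    TYPEWRITER = "typewriter"
    DOT_MATRIX = "dot_matrix"


@dataclass
class RecognitionSettings:
    """Hypothetical bundle of the recognition options discussed above."""
    languages: list[str] = field(default_factory=lambda: ["en"])
    print_type: PrintType = PrintType.LASER
    text_only: bool = True            # plain text, no reading order
    detect_tables: bool = False
    recognize_barcodes: bool = False
    linked_pages: bool = False        # pages belong to one logical document
    analyze_logic: bool = False       # reconstruct the table of contents
    cpu_cores: int = 1

    def __post_init__(self) -> None:
        if not self.languages:
            raise ValueError("at least one language is required")
        # Clamp the requested core count to the range 1..available cores.
        available = os.cpu_count() or 1
        self.cpu_cores = max(1, min(self.cpu_cores, available))
        # Assumption for illustration: a table of contents is only
        # reconstructed when the pages form one linked document.
        if self.analyze_logic and not self.linked_pages:
            raise ValueError("document logic analysis requires linked pages")


# Usage: a German/English typewritten document with tables, on up to 4 cores.
settings = RecognitionSettings(
    languages=["de", "en"],
    print_type=PrintType.TYPEWRITER,
    text_only=False,
    detect_tables=True,
    cpu_cores=4,
)
print(settings)
```

Validating the combination once, when the settings are created, means an inconsistent configuration is rejected before any page is processed rather than partway through a batch.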

Different Recognition Scenarios

