What is OCR?

Language: EN
Product-Line: FineReader Engine, Mobile OCR Engine, Cloud OCR SDK
Version: 7.x, 8.x, 9.x, 10, 11
Type: Knowledge Base & Support
Category: General Features
KB-Type: Tips & How to Information

General

  • OCR is the abbreviation for “Optical Character Recognition”
    In simple words: text that exists only as pixels within an image is found and “read” by a computer. The result is “real”, editable text. The image can be converted to a large variety of formats such as DOCX, ODT, RTF, TXT, HTML, XML and PDF(/A). This way, humans and IT systems can process and work with the information hidden in the documents.
  • The alternative to OCR would be manual keying / re-typing of the required information.

But OCR does not mean “Only Character Recognition” - because modern OCR technology and products do much more:

OCR steps - more than character recognition

Steps from Image to Text

  • Opening images and PDFs
    • Enabling scanning, e.g. via TWAIN
    • Opening a large variety of different image formats and PDFs
  • Preparing the images
    • Split multi-page files into single pages to increase speed and scalability on multi-core machines
    • Rotate images so that the technology can read the text
    • Clean the images, e.g. remove scanning dust or ISO noise from digital cameras
  • Analyze the layout: detect text, image, barcode and table areas
    • Detect the reading order of the texts
    • Analyze the text blocks, detect the lines and find/identify the individual characters
  • Read the “individual” characters = Apply optical character recognition
    • Vote between different hypotheses for a single character, e.g. is it
      • “0” or “O” or “o” or a “Q” or “Ö”
      • “I” or “1” or “!” or “|”
    • more on: OCR Voting API
  • Rebuild the text on a word level by using language information
    • Which characters are used and allowed in the language
    • Which recognition settings are configured internally
    • Are defined word lists available, or can some details be looked up in a database?
    • Use linguistic and morphology dictionaries
      more on: Dictionaries and OCR
  • Export the recognized text:
    • Text in proper Unicode encoding
    • All the details “found”, provided in XML, e.g.
      • Character positions (original and after de-skewing of the page)
      • Fonts, formats (normal/bold), color
      • Hypotheses
      • Whether a word was found in the dictionaries
  • Reconstruct and synthesize the original layout for the different output formats
  • Export/save the different formats to RAM or disk so that they comply with the format standards
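
To illustrate how character voting and language information can work together, here is a minimal sketch in Python. It is not the API of FineReader Engine or any other product: the function name rebuild_word, the confidence values, the dictionary_bonus parameter and the sample word list are all assumptions made up for this example. It tries every combination of the per-character hypotheses and prefers a combination that forms a known word.

```python
from itertools import product


def rebuild_word(hypotheses, dictionary, dictionary_bonus=0.5):
    """Choose the most plausible word from per-character hypotheses.

    hypotheses: one dict per character position, mapping a candidate
                character to a confidence between 0 and 1 (hypothetical
                values, as a recognizer might produce them).
    dictionary: set of lower-case words allowed in the language.
    dictionary_bonus: assumed score boost for words found in the dictionary.
    """
    if not hypotheses:
        return ""

    best_word, best_score = "", float("-inf")
    # Try every combination of candidate characters across all positions.
    for combo in product(*(h.items() for h in hypotheses)):
        word = "".join(char for char, _ in combo)
        # Average confidence of the chosen characters.
        score = sum(conf for _, conf in combo) / len(combo)
        # Language information: reward words that exist in the dictionary.
        if word.lower() in dictionary:
            score += dictionary_bonus
        if score > best_score:
            best_word, best_score = word, score
    return best_word


# Invented hypotheses for a three-character word image.
sample_hypotheses = [
    {"0": 0.55, "O": 0.45},            # zero or capital letter O?
    {"i": 0.90},
    {"1": 0.50, "l": 0.40, "I": 0.10},  # digit one, lower-case L or capital i?
]

print(rebuild_word(sample_hypotheses, {"oil"}))  # with a dictionary
print(rebuild_word(sample_hypotheses, set()))    # characters only
```

With the word list, the sketch selects “Oil” although “0” and “1” have the higher individual confidences; without it, the purely character-based result is “0i1”. Real engines use far more elaborate scoring, and trying all combinations as done here is only practical for short words.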

Summary

  • OCR is a set of very different, computationally intensive processes that involve a lot of mathematics, statistics and linguistics.
  • OCR is a fuzzy process, and its development and improvement need a lot of know-how and testing.
  • Each processing step for text recognition is complex in itself, but in the end “the whole is greater than the sum of its parts”.
