Advanced PDF Processing

  • PDF is the most popular file format and the de facto standard for exchanging documents among users. Based on the ISO-standardized PDF format, the PDF/A format was defined for long-term document archiving (PDF/A-1, PDF/A-2) as well as for data exchange (PDF/A-3).
  • ABBYY products and technologies offer many modern features to open, process and create PDFs.
  • Extracting text from PDFs is not as simple as it seems. There are several reasons why it may fail:
    • Pages stored as images (for example scans) with no text layer
    • Text drawn as vector outlines rather than as characters
    • Font & text encoding issues
    • Internal structure of the PDF files
  • The talk below explains why OCR technology can be the only way to reliably access textual information in PDFs.
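
The failure reasons listed above can be turned into a rough check on whatever text a PDF library returns for a page. The sketch below is only an illustration under stated assumptions: the function name `needs_ocr` and both thresholds are hypothetical, it is not ABBYY's method, and it assumes the page text has already been extracted into a Python string by some other tool.

```python
import unicodedata


def needs_ocr(extracted_text: str,
              min_chars: int = 20,
              max_suspect_ratio: float = 0.3) -> bool:
    """Guess whether text extracted from a PDF page is unusable.

    Hypothetical heuristic: both thresholds are illustrative assumptions.
    """
    visible = [ch for ch in extracted_text if not ch.isspace()]

    # Image-only pages and text drawn as vector outlines usually
    # yield little or no extractable text.
    if len(visible) < min_chars:
        return True

    # Font and encoding problems often surface as U+FFFD, private-use,
    # control, or unassigned code points instead of readable characters.
    suspect = sum(
        1 for ch in visible
        if ch == "\ufffd" or unicodedata.category(ch) in ("Co", "Cc", "Cn")
    )
    return suspect / len(visible) > max_suspect_ratio
```

A page that returns an empty string, or mostly private-use characters, would be flagged for OCR, while a page with ordinary readable text would not. A check like this cannot confirm that extracted text is correct, only that it is obviously broken.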