What is PDF/A

  • PDF/A is a file format and an ISO Standard for the long-term archiving of electronic documents.
  • PDF/A is in fact a subset of PDF, obtained by leaving out PDF features not suited to long-term archiving.
  • PDF/A comes in different parts and conformance levels:
    • PDF/A-1b - Level B compliance in Part 1
      PDF/A-1b has the objective of ensuring reliable reproduction of the visual appearance of the document.
    • PDF/A-1a - Level A compliance in Part 1
      PDF/A-1a includes all the requirements of PDF/A-1b and additionally requires that document structure be included (also known as being “tagged”/“Tagged PDF”), with the objective of ensuring that document content can be searched and repurposed. PDF/A-1a also requires Unicode character maps.
    • PDF/A-2 - Part 2 of the standard
      PDF/A-2 is based on PDF 1.7 (ISO 32000-1) and is defined by ISO 19005-2:2011, published on June 20, 2011 under the formal name Document management – Electronic document file format for long-term preservation – Part 2: Use of ISO 32000-1 (PDF/A-2).
    • PDF/A-3
      • The standard was published in October 2012 and differs from PDF/A-2 in that it allows files of any format to be embedded, for example XML, Office formats, or raw binary data.
      • Important: long-term compatibility is guaranteed only for the PDF part of the collection. An organization that embeds other file formats gains the benefit of having access to those files, but accepts the risk that they may no longer be usable in 100 years.
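
A PDF/A file declares its part and conformance level in its XMP metadata through the pdfaid:part and pdfaid:conformance properties. The following is a minimal sketch of reading that declaration, assuming the XMP packet has already been extracted from the PDF as a string. The function name read_pdfa_claim and the sample packet are illustrative only, and the code reports what a file claims; it does not validate compliance.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"


def read_pdfa_claim(xmp_packet):
    """Return the claimed PDF/A flavour, e.g. 'PDF/A-1b', or None if absent."""
    root = ET.fromstring(xmp_packet)
    part_tag = "{%s}part" % PDFAID_NS
    level_tag = "{%s}conformance" % PDFAID_NS
    for desc in root.iter("{%s}Description" % RDF_NS):
        # The properties may be written as attributes or as child elements.
        part = desc.get(part_tag) or desc.findtext(part_tag)
        level = desc.get(level_tag) or desc.findtext(level_tag)
        if part:
            return "PDF/A-%s%s" % (part.strip(), (level or "").strip().lower())
    return None


# Hypothetical sample packet used only for this illustration.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>2</pdfaid:part>
      <pdfaid:conformance>U</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

print(read_pdfa_claim(SAMPLE))  # prints "PDF/A-2u"
```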

PDF/A Minimum Requirements

  • Requirements that must be fulfilled for a file to be PDF/A compliant:
    • Audio and video content are forbidden.
    • JavaScript and executable file launches are forbidden.
    • All fonts must be embedded and also must be legally embeddable for unlimited, universal rendering. This also applies to the so-called PostScript standard fonts such as Times or Helvetica.
    • Color spaces must be specified in a device-independent manner.
    • Encryption is forbidden.
    • Use of standards-based metadata is mandated.
    • External content references are forbidden.
    • LZW and JPEG2000 image compression are forbidden in PDF/A-1, but JPEG2000 compression is allowed in PDF/A-2.
    • Transparent objects and layers (Optional Content Groups) are forbidden in PDF/A-1, but they are supported in PDF/A-2.
    • Provisions for digital signatures in accordance with the PAdES (PDF Advanced Electronic Signatures) standard are supported in PDF/A-2.
    • Embedded files are forbidden in PDF/A-1, but PDF/A-2 offers the possibility to embed PDF/A files, allowing archiving of sets of documents in a single file.
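
The part-dependent rules in the list above can be summarized as a small lookup table. The sketch below is a simplification for illustration, not a validator: the feature names and the function violations are hypothetical, and the table covers only the features named in this article.

```python
# First PDF/A part that permits each feature, following the list above.
# None means the feature is forbidden in every part covered here.
FIRST_PART_ALLOWING = {
    "audio_video": None,
    "javascript": None,
    "encryption": None,
    "external_content": None,
    "lzw_compression": None,
    "jpeg2000_compression": 2,
    "transparency": 2,
    "layers": 2,
    "embedded_pdfa_files": 2,
    "embedded_arbitrary_files": 3,
}


def violations(features, part):
    """Return, sorted, the features that are not permitted in PDF/A-<part>."""
    bad = []
    for name in features:
        first = FIRST_PART_ALLOWING[name]  # KeyError for unknown feature names
        if first is None or part < first:
            bad.append(name)
    return sorted(bad)


used = {"jpeg2000_compression", "transparency", "javascript"}
print(violations(used, 1))  # all three features are rejected for PDF/A-1
print(violations(used, 2))  # only "javascript" remains for PDF/A-2
```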

PDF/A Support in ABBYY Technology Products

PDF/A export (PDF/A-1b and PDF/A-1a) is available in the following ABBYY technology products:

FineReader Engines - OCR & Document Conversion

FlexiCapture Engine - Separation, Classification & Data Capture

Recognition Server - Solution for server based processing and document capture

FlexiCapture - Solutions for Data Capture

PDF/A-2 Support

In addition to the common PDF and PDF/A-1 formats, FineReader Engine 11 now exports to PDF/A-2. The new options of the ISO standard format are:

  • Support of JPEG2000 compression to generate smaller files
  • A-2a – tagged PDF/A-2 with Unicode text
  • A-2u – untagged PDF/A-2 from which text can still be extracted as Unicode
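
As an illustration of how the parts and conformance levels combine, the sketch below parses a label such as "A-2u" or "PDF/A-1b" and rejects combinations that do not exist (level u is defined only from Part 2 onward). The names parse_flavour, LEVELS_BY_PART, and LEVEL_MEANING are hypothetical and are not part of any ABBYY API.

```python
import re

# Conformance levels defined for each part of the standard.
LEVELS_BY_PART = {1: "ab", 2: "abu", 3: "abu"}

LEVEL_MEANING = {
    "b": "reliable visual reproduction",
    "u": "level b plus Unicode text mapping",
    "a": "level b plus Unicode text mapping and tagged structure",
}


def parse_flavour(label):
    """Split a label such as 'PDF/A-2u' into (part, level), validating both."""
    match = re.fullmatch(r"(?i)(?:PDF/)?A-([1-3])([abu])", label.strip())
    if not match:
        raise ValueError("not a PDF/A label: %r" % label)
    part = int(match.group(1))
    level = match.group(2).lower()
    if level not in LEVELS_BY_PART[part]:
        raise ValueError("level %s is not defined for PDF/A-%d" % (level, part))
    return part, level


part, level = parse_flavour("A-2u")
print(part, level, "-", LEVEL_MEANING[level])
```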

PDF/A-2 enables the creation of smaller PDF files using JPEG2000 compression. For long-term archiving, this can reduce the storage space required and speed up access over low-bandwidth networks.

The general technical changes of PDF/A-2 are:

  • based on PDF 1.7 (ISO 32000-1)
  • highly efficient JPEG2000 compression allowed
  • support for transparency effects and layers
  • embedding of OpenType fonts
  • provisions for digital signatures in accordance with the
    PAdES (PDF Advanced Electronic Signatures) standard.
  • possibility to embed PDF/A files in PDF/A-2,
    allowing archiving of sets of documents as individual documents in a single file.

PDF/A-3 Support

PDF/A-3 is an extension of the PDF/A-2 standard that allows the inclusion of PDF/A files as well as files in a variety of other formats, such as XML or Office formats. Long-term archiving and readability of the PDF/A part is still guaranteed, and the attachments can deliver additional benefits.

The PDF/A-3 extended container capabilities will make this format attractive in new areas, for example when a graphical representation of a document should be combined with some source data. The new e-invoice format defined by the Forum for Electronic Invoices Germany (FeRD) is based on PDF/A-3 and XML.

Since Release 3 of FineReader Engine 11, the API has been extended so that embedded files (attachments) can be extracted from a PDF and also added to one.

