InfoExtractor SDK

  • ABBYY InfoExtractor is a system for Natural Language Processing (NLP).
  • InfoExtractor can extract relevant data from unstructured, textual information.
    The system identifies not just entities & facts but also the relationships between them.
  • InfoExtractor is built on ABBYY’s NLP technology Compreno.
  • InfoExtractor is based on a scalable back-end system with an API to power business analytics and to optimize content-intensive decision processes.

From the Lab to Projects

The Situation/Challenge:

  • Different languages use different words and grammar rules to encode information.
  • Information in documents such as emails, letters, articles, and contracts is written in natural language.
  • Words can mean different things depending on the sentence they appear in.
  • Computers are good at “managing characters” and counting “words”, but there is no real intelligence in IT systems.

The ABBYY Approach:

  • ABBYY's Research & Development team has been working for over 15 years on a technology stack that enables computers to
    • extract plain text (in reading order) from all kinds of documents
    • deconstruct natural language texts into words and sentences
    • understand how sentences are constructed and how the words are used grammatically (syntax analysis)
    • determine what certain words mean (semantic analysis)
  • This new technology is named ABBYY Compreno.
  • ABBYY InfoExtractor is one of the first products that are based on Compreno.
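
To make the "deconstruct natural language texts into words and sentences" step in the list above more concrete, here is a minimal Python sketch. It is not Compreno and not the InfoExtractor API; the function names (split_sentences, split_words, deconstruct) and the splitting rules are assumptions chosen purely for illustration, and a naive rule like this will mis-split abbreviations such as "Dr." or "e.g.".

```python
import re


def split_sentences(text):
    # Hypothetical helper: treat '.', '!' or '?' followed by whitespace
    # as a sentence boundary. Real systems use far richer rules.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_words(sentence):
    # Hypothetical helper: keep runs of ASCII letters, digits and
    # apostrophes, and drop all other punctuation.
    return re.findall(r"[A-Za-z0-9']+", sentence)


def deconstruct(text):
    # Return one list of words per detected sentence.
    return [split_words(s) for s in split_sentences(text)]


if __name__ == "__main__":
    sample = "The bank raised rates. We sat on the river bank."
    for words in deconstruct(sample):
        print(words)
```

In the sample text the word "bank" appears in both sentences with different meanings. Splitting at the character level cannot tell the two apart, which is the gap that the syntax and semantic analysis steps are meant to close.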

The Status:

  • The InfoExtractor back-end is built as a distributed IT system, so the CPU-intensive analysis can be spread across many computers.
  • An API allows the system to be integrated into other applications.
  • The core technology is (of course) highly language-dependent. Currently, Compreno supports English and Russian, and it is “learning” German.

InfoExtractor Details

More Information

  • Further technical articles on ABBYY InfoExtractor will be published on this portal - so just bookmark this page ;-)