- ABBYY InfoExtractor is a system for natural language processing (NLP).
- InfoExtractor extracts relevant data from unstructured textual information.
The system identifies not just entities and facts but also the relationships between them.
- InfoExtractor is built on ABBYY’s NLP technology Compreno.
- InfoExtractor is based on a scalable back-end system with an API, designed to power business analytics and to optimize content-intensive decision processes.
From the Lab to Projects
- Different languages use different words and grammar rules to code information.
- Information in documents such as emails, letters, articles, and contracts is written in natural language.
- A word can mean different things depending on the sentence it appears in.
- Computers are good at “managing characters” and counting “words”, but there is no real intelligence in IT systems.
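The last two points can be illustrated with a small sketch. The sentences and the naive tokenizer below are made up for this example and have nothing to do with ABBYY's software. Plain word counting sees the token "bank" twice and cannot tell a financial institution from a riverside:

```python
import re
from collections import Counter


def count_words(text: str) -> Counter:
    """Lower-case the text and count each run of letters as one word."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


# Two illustrative sentences that use "bank" in different senses.
sentences = [
    "She deposited the cheque at the bank.",
    "They had a picnic on the bank of the river.",
]

counts = count_words(" ".join(sentences))

# The counter reports one word with two occurrences; the two meanings
# are invisible at the character level.
print(counts["bank"])
```

Telling the two senses apart requires knowledge of the surrounding sentence, which is the gap that syntactic and semantic analysis are meant to close.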
The ABBYY Approach:
- ABBYY's Research & Development team has been working for over 15 years on a technology stack that enables computers to:
  - extract plain text (in reading order) from all kinds of documents
  - deconstruct natural language texts into words and sentences
  - understand how sentences are constructed and how the words are used grammatically (syntax analysis)
  - determine what the words mean (semantic analysis)
- This new technology is named ABBYY Compreno.
- ABBYY InfoExtractor is one of the first products based on Compreno.
- The InfoExtractor back-end is built as a distributed IT system, so the CPU-intensive analysis can be spread across many computers.
- An API allows the system to be integrated with other applications.
- The core technology is, of course, highly language-dependent. Compreno currently supports English and Russian, and it is “learning” German.
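To make the "deconstruct into words and sentences" step and the idea of spreading analysis across workers more concrete, here is a minimal sketch in Python. It is not Compreno or the InfoExtractor API: the function names (`split_sentences`, `split_words`, `analyze`, `analyze_many`) are hypothetical, the splitting rules are deliberately naive, and a local thread pool merely stands in for the many computers of a distributed back-end. Syntax and semantic analysis are not attempted.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def split_sentences(text: str) -> list[str]:
    """Naive split: break after '.', '!' or '?' when whitespace follows."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_words(sentence: str) -> list[str]:
    """Return runs of word characters; punctuation is dropped."""
    return re.findall(r"\w+", sentence)


def analyze(text: str) -> list[list[str]]:
    """Hypothetical per-document step: one word list per sentence."""
    return [split_words(s) for s in split_sentences(text)]


def analyze_many(texts: list[str], workers: int = 4) -> list[list[list[str]]]:
    """Analyze documents concurrently; results keep the input order.

    Assumption: a thread pool is used only to show the fan-out pattern.
    A real deployment would hand documents to separate processes or machines.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze, texts))


documents = [
    "The contract was signed in May. It ends next year.",
    "Please send the invoice by email!",
]
results = analyze_many(documents)
```

Because each document is analyzed independently, the work divides cleanly per document, which is what makes this kind of analysis a good fit for a distributed system.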