ABBYY Cloud OCR Service Architecture

  • The ABBYY Cloud OCR SDK was developed from scratch and designed specifically for the Microsoft Windows Azure platform. It is not an existing piece of “on-premise” software that was simply scaled up.
  • As a result, the OCR service is a highly modular system that uses different components of the Azure platform and the Microsoft data centers.

5 Components of Microsoft Azure

Windows Azure has five main parts: Compute, Storage, the Fabric Controller, the Content Delivery Network (CDN), and Connect.

  • Compute: runs applications in the cloud. Those applications largely see a Windows Server environment, although the Windows Azure programming model isn’t exactly the same as the on-premises Windows Server model.
  • Storage: stores binary and structured data in the cloud.
  • Fabric Controller: deploys, manages, and monitors applications. The fabric controller also handles updates to system software throughout the platform.
  • Content Delivery Network (CDN): speeds up global access to binary data in Windows Azure storage by maintaining cached copies of that data around the world.
  • Connect: allows creating IP-level connections between on-premises computers and Windows Azure applications.

Source: Whitepaper “Introducing Windows Azure” (V1), page 3, David Chappell, October 2010

Why the OCR Service will not be shut down for Maintenance

The ABBYY Cloud OCR service is built according to the Microsoft Windows Azure architecture and guidelines. Every role therefore runs as at least two instances (controlled by the data center infrastructure).

  • “Traditional maintenance”, as practiced in single-instance server environments, is never performed.

When an update or patch has to be applied, new up-to-date instances are started; once they are running successfully, the existing “old” instances are shut down. The service is designed not to rely on the state of any running instance, which is why all incoming and processed images and documents are stored in redundant BLOB (mass) storage. Jobs and their status are kept in the redundant Azure SQL database service.
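The update procedure described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not ABBYY's or Azure's actual deployment code: the Instance class, its start/stop methods, and the rolling_replace function are hypothetical names invented for this example, and instance startup is assumed to always succeed.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical stand-in for one running service instance."""
    version: str
    running: bool = False
    healthy: bool = False

    def start(self) -> None:
        # Assumption: in this sketch every instance starts successfully.
        self.running = True
        self.healthy = True

    def stop(self) -> None:
        self.running = False
        self.healthy = False


def rolling_replace(fleet: list[Instance], new_version: str) -> list[Instance]:
    """Start a full set of new instances and retire the old ones only
    after every new instance reports healthy."""
    replacements = [Instance(version=new_version) for _ in fleet]
    for inst in replacements:
        inst.start()

    if not all(inst.healthy for inst in replacements):
        # Keep serving from the old fleet if the update did not come up cleanly.
        for inst in replacements:
            inst.stop()
        return fleet

    # No instance holds job state (it lives in BLOB storage and the database),
    # so the old instances can be stopped without losing work.
    for inst in fleet:
        inst.stop()
    return replacements


# Two instances of the old version, matching the "at least two" rule.
old_fleet = [Instance("v1"), Instance("v1")]
for inst in old_fleet:
    inst.start()

new_fleet = rolling_replace(old_fleet, "v2")
```

Because the old instances are stopped only after the new ones are healthy, at least one full set of instances is serving requests at every point during the update.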

Service Monitoring & Management

The ABBYY OCR service is monitored at several levels:

  • The Microsoft Azure fabric controller, which monitors Azure roles and restarts them if they become unresponsive.
  • ABBYY application-level decentralized monitoring (i.e. without a single point of failure), which watches service health and automatically scales the role fleet up or down based on the state, length, and shape of the queue.
    It also notifies the ABBYY team when behavior looks suspicious, for example service abuse or DDoS attacks.
  • A “traditional monitoring system” that regularly runs basic recognition and service management scenarios and measures whether they succeed and how long they take. It notifies the ABBYY team in case of abnormal behavior.
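Queue-based scaling of the kind mentioned in the second bullet might look roughly like the following. This is a minimal sketch, not the actual ABBYY scaling logic: the function name desired_instances and all numeric thresholds are assumptions chosen for illustration, and only queue length is considered (the source also mentions queue state and shape).

```python
import math


def desired_instances(queue_length: int,
                      jobs_per_instance: int = 20,
                      min_instances: int = 2,
                      max_instances: int = 50) -> int:
    """Return how many worker instances to run for the current backlog.

    All defaults are hypothetical. The lower bound of 2 mirrors the rule
    that every role runs as at least two instances.
    """
    if queue_length < 0:
        raise ValueError("queue_length must be non-negative")
    # Instances needed to work off the backlog at the assumed per-instance rate.
    needed = math.ceil(queue_length / jobs_per_instance)
    # Clamp to the allowed fleet size.
    return max(min_instances, min(max_instances, needed))
```

With these assumed defaults, an empty queue still keeps two instances running, while a very long queue is capped at the configured maximum.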

Related Info