OCR | Optical Character Recognition | Sicara

AI METHOD

Sicara and OCR

Sicara uses and implements OCR solutions

Sicara adapts to your business needs to deliver the right solution and extract information from your documents. Depending on the document type (handwritten or printed), how critical OCR is to your project (central to the value, a step in dataset creation, ...) and your delivery speed and security constraints, we leverage:
- Specialized APIs (Microsoft Cognitive Services, Google Cloud Vision, ...)
- Custom solutions built on our mastery of Python libraries: Keras, PyTesseract and OpenCV

Optical Character Recognition (OCR) is a subdomain of Computer Vision, related to Pattern Recognition. This AI field covers the conversion of physical documents into machine-readable text. Such documents can contain handwritten and/or printed text along with images. The main applications of OCR include document digitization, information extraction (reading official documents, license plates, ...) and dataset creation for AI training.

1870: 1st OCR
2000: 1st online OCR
2005: Release of Tesseract

Some Figures

+22%: documents printed every year
x10: size of a scanned document vs. a text document

Early versions needed to be trained with images of each character, and worked on one font at a time. Advanced systems capable of producing a high degree of recognition accuracy for most fonts are now common, and with support for a variety of digital image file format inputs.

Wikipedia

How does it work?

Main steps in an OCR process

1. Document Standardization (crop, rotate, format, ...)
2. Text Detection
3. Text Interpretation
4. Text Intelligent Cleaning
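
The four steps above can be illustrated with a deliberately tiny, pure-Python sketch. It is not how a production engine such as Tesseract works: the "image" is a hypothetical grid of `#` (ink) and `.` (background), the `GLYPHS` templates are made-up 3x5 bitmaps, and the function names are chosen for this example only. Interpretation is done by nearest-template matching, and cleaning assumes the field is known to be numeric.

```python
# Toy 3x5 glyph templates ('#' = ink, '.' = background). Hypothetical data.
GLYPHS = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "O": [".#.", "#.#", "#.#", "#.#", ".#."],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
}


def standardize(image):
    """Step 1: crop blank margins (assumes equal-width rows and some ink)."""
    ink_rows = [i for i, row in enumerate(image) if "#" in row]
    ink_cols = [j for j in range(len(image[0]))
                if any(row[j] == "#" for row in image)]
    top, bottom = ink_rows[0], ink_rows[-1]
    left, right = ink_cols[0], ink_cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def detect(image):
    """Step 2: split the text line into character boxes at blank columns."""
    width = len(image[0])
    boxes, start = [], None
    for j in range(width + 1):
        blank = j == width or all(row[j] == "." for row in image)
        if not blank and start is None:
            start = j                      # a character begins here
        elif blank and start is not None:
            boxes.append([row[start:j] for row in image])
            start = None                   # the character ended
    return boxes


def interpret(box):
    """Step 3: pick the template with the fewest differing pixels."""
    def distance(glyph):
        return sum(a != b
                   for glyph_row, box_row in zip(glyph, box)
                   for a, b in zip(glyph_row, box_row))
    return min(GLYPHS, key=lambda char: distance(GLYPHS[char]))


def clean(text):
    """Step 4: the field is assumed numeric, so map look-alike letters."""
    return text.translate(str.maketrans({"O": "0", "I": "1", "l": "1"}))


def ocr(image):
    """Run the four steps and return (raw text, cleaned text)."""
    boxes = detect(standardize(image))
    raw = "".join(interpret(box) for box in boxes)
    return raw, clean(raw)


# A 7x14 "scan" with margins; the first character has one noisy pixel.
IMAGE = [
    "..............",
    ".###..#...#...",
    "...#.#.#.##...",
    "..#..#.#..#...",
    ".##..#.#..#...",
    "..#...#..###..",
    "..............",
]

print(ocr(IMAGE))  # ('7O1', '701')
```

The noisy first character is still read as `7` because it is closer to that template than to any other, and the letter `O` is only turned into `0` in the cleaning step, where knowledge about the expected content is applied.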

Some Use Cases

Projects involving OCR

OCR uses fall into 2 main categories. First, digitization serves storage and future use: indexing documents for search, or building datasets to feed Artificial Intelligence algorithms. The second category aims at replacing some tasks of a complete process with an OCR engine in order to improve productivity. Mail sorting illustrates this use. A mail sorting center can dispatch thousands of packages every day. From deposit to delivery, each item goes through several routing steps, all based on the address written on it. Automating address recognition would improve both the speed and the quality of mail delivery. That is where OCR comes into action, enabling the interpretation of written addresses.
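
As a rough illustration of the mail sorting case, the sketch below takes the text an OCR engine might return for an address and picks a routing destination from the postal code. Everything specific here is an assumption made for the example: the 5-digit postal code format, routing on the first two digits, the `routes` table, and the function names. The look-alike character map is likewise illustrative, not an exhaustive list of OCR confusions.

```python
import re

# Characters an OCR engine may confuse with digits (illustrative subset).
LOOKALIKES = str.maketrans(
    {"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"}
)


def extract_postal_code(address_text):
    """Return a 5-digit postal code, searching from the last line upward.

    A candidate is any 5-character token made of digits or look-alike
    letters that contains at least one real digit. Returns None if no
    candidate is found.
    """
    for line in reversed(address_text.splitlines()):
        for match in re.finditer(r"\b[0-9OoIlSB]{5}\b", line):
            token = match.group()
            if any(char.isdigit() for char in token):
                return token.translate(LOOKALIKES)
    return None


def route(address_text, routes, default="manual sorting"):
    """Choose a destination from the first two digits of the postal code."""
    code = extract_postal_code(address_text)
    if code is None:
        return default
    return routes.get(code[:2], default)


# Hypothetical routing table and OCR output (note the letter O's in the code).
routes = {"75": "Paris hub", "69": "Lyon hub"}
scanned = "Jean Dupont\n12 rue de la Paix\n75OO2 Paris"

print(extract_postal_code(scanned))  # 75002
print(route(scanned, routes))        # Paris hub
```

Falling back to a manual sorting queue when no code is found keeps unreadable addresses out of the automated flow instead of misrouting them.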

Our OCR Experts

We have a Team of Experienced Computer Vision Specialists

As part of Computer Vision, our specialty, we develop OCR solutions.

Adil

Centrale Paris

Clément

Mines Paris, PhD

Félix

Polytechnique

Raphaël

ENSTA, Polytechnique


Related articles written by Sicara's Data Scientists (in English)

GAN with Keras: Application to Image Deblurring

A Generative Adversarial Networks tutorial applied to Image Deblurring with the Keras library.

How to start with Keras

Keras Tutorial: Content Based Image Retrieval Using a Denoising Autoencoder

How to find similar images thanks to Convolutional Denoising Autoencoder

Set up TensorFlow with Docker + GPU in Minutes

Why Docker is the best platform to use Tensorflow with a GPU.