Eren Akbulut's Blog

What is OCR?

January 21st, 2021

Hello everyone, today I'll be talking about what OCR is and what its popular use cases are. This won't be a tutorial-style post; I'll just briefly explain some concepts and application fields.

What is OCR?

OCR stands for optical character recognition. It is a widely used technology for recognising text in pictures, photos or scanned documents. OCR is a really tough problem that computer scientists have researched heavily for many years. Thanks to that work, both developers and people with no involvement in software development can now use many OCR tools, sometimes without even knowing it.

For example, Tesseract, one of the most popular and widely used OCR engines, has been in development since 1985. Hewlett-Packard was the original creator of the project, and Google took over sponsorship of its development in 2006. We can safely say that Google's involvement after 2006 is what made the project so mainstream.

The Tesseract engine has many wrappers and bindings for different programming languages; in fact, practically every widely used programming language has some sort of port of Tesseract nowadays. You can check them here.
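To give a rough idea of what using Tesseract from a programming language looks like, here is a minimal Python sketch that calls the `tesseract` command-line tool through the standard library. It assumes Tesseract is installed and available on your PATH; the function names `build_tesseract_command` and `extract_text` are my own for illustration and not part of any library.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, lang="eng"):
    """Build the argument list for the tesseract command-line tool."""
    # Passing "stdout" as the output base makes tesseract print the
    # recognised text instead of writing it to a file.
    return ["tesseract", str(Path(image_path)), "stdout", "-l", lang]


def extract_text(image_path, lang="eng"):
    """Run Tesseract on an image file and return the recognised text.

    Assumes the tesseract binary is installed and on the PATH.
    """
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits with an error
    )
    return result.stdout.strip()
```

Calling `extract_text("scanned_page.png")` (a placeholder file name) would return whatever text Tesseract finds in that image. Dedicated wrappers offer a friendlier interface, but most of them boil down to something similar.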

Besides Tesseract, many cloud providers offer OCR APIs of their own, but in edge computing Tesseract and OpenCV carry most of the load for OCR applications. OpenCV is a far more widely used and more general-purpose library overall, so I might write another post covering some of its most used features with examples.

Popular Applications

Creating digital copies of printed documents from scans is one of the most popular uses of OCR. Before the evolution I mentioned above, the only way to accomplish this was to retype the document manually.
Creating digital copies of similar documents on edge devices such as mobile phones. Many recent mobile devices can now take photos clear enough to run OCR on, and of course they also have far more computing power than the devices of a few years ago.
OCR is flexible enough that many applications use it to build filters for various purposes. Extracting text from an image lets text-based machine learning models work on it. With such hybrid applications, people can build filters that catch harassment, abuse, offensive language and so on, even when it is hidden in images.
Using OCR for translation is also quite mainstream at the moment. Many people use OCR technologies to build multi-language, image-based text translators on both mobile and web platforms.
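The filtering idea from the list above can be sketched in a few lines of Python. This is only an illustration: a plain word blocklist stands in for the machine learning model, the `BLOCKLIST` contents and function names are made up for the example, and the input is assumed to be text that an OCR engine has already extracted from an image.

```python
import re

# Hypothetical blocklist; a real filter would use a trained text classifier.
BLOCKLIST = {"idiot", "loser"}


def find_flagged_words(ocr_text, blocklist=BLOCKLIST):
    """Return the blocklisted words found in OCR output, sorted alphabetically."""
    # Lowercase the text and pull out runs of letters and apostrophes,
    # so punctuation and line breaks from the OCR output are ignored.
    words = re.findall(r"[a-z']+", ocr_text.lower())
    return sorted(set(words) & set(blocklist))


def should_block(ocr_text, blocklist=BLOCKLIST):
    """Decide whether an image should be filtered based on its extracted text."""
    return bool(find_flagged_words(ocr_text, blocklist))
```

For example, `should_block("You are an IDIOT,\nand a loser!")` is true while `should_block("Have a nice day")` is false. The point is that once the text is out of the image, any ordinary text-processing step can be applied to it.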

The generic applications I talked about above have many concrete use cases in fields ranging from law to healthcare to insurance. I won't cover all of them because that's not the point of this post. I will, however, try to create tutorials about OCR that apply to many fields.

I hope to see you in the next one, take care :)