Extract text from image in Python
In this tutorial, we will show how to extract text from an image in Python. We will do this with two modules, cv2 (OpenCV) and pytesseract, so both of them have to be installed on your machine.
Installation of cv2 and pytesseract
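Tesseract itself is a separate program: download the latest version and install it on your PC as you would install any other software. The Python modules are installed with pip: pip install opencv-python pytesseract (opencv-python is the package that provides the cv2 module, and pytesseract is the wrapper that calls Tesseract).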
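Before going further, it can help to confirm that Python can actually see both modules. The following is a minimal sketch using only the standard library; the helper name check_python_packages is our own and is not part of either package.

```python
import importlib.util


def check_python_packages(names=("cv2", "pytesseract")):
    """Map each module name to True if Python can find it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Report the status of the two modules this tutorial relies on.
for module, found in check_python_packages().items():
    print(f"{module}: {'found' if found else 'missing'}")
```

If either module is reported as missing, install it in the same Python environment that runs your script.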
How to extract text from image in Python
First, we import pytesseract under the alias tr, and then cv2:
import pytesseract as tr
import cv2
Next, we declare a variable im and read the image into it with the imread function. In the brackets we give the location of the image we want to read; if the image is in the same folder as the script, the file name alone is enough.
im = cv2.imread('image.jpg')
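One thing to watch for: imread does not raise an error when the file is missing or cannot be decoded, it simply returns None, and the failure only shows up later. The sketch below wraps the call in a hypothetical helper, load_image (our own name), that fails early with a clear message.

```python
from pathlib import Path


def load_image(path):
    """Read an image with OpenCV, raising an error instead of returning None."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"No such image file: {path}")

    # Imported lazily so that defining this helper does not itself require OpenCV.
    import cv2

    image = cv2.imread(str(path))
    if image is None:
        # imread returns None (it does not raise) when the file cannot be decoded.
        raise ValueError(f"OpenCV could not decode: {path}")
    return image
```

With this helper, the line above could be written as im = load_image('image.jpg').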
Then we declare another variable, string_from_image, to store the text read from the image. We apply the image_to_string function to read the text, passing the im variable as its argument.
string_from_image = tr.image_to_string(im)
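OpenCV loads images in BGR channel order, while pytesseract assumes RGB, so converting the image first is a common precaution. Here is a small sketch of that step; image_to_text is a hypothetical helper name, and whether the conversion changes the result will depend on the image.

```python
def image_to_text(bgr_image):
    """Convert an OpenCV (BGR) image to RGB, then run Tesseract on it."""
    # Imported lazily so that defining this helper does not itself require the packages.
    import cv2
    import pytesseract as tr

    rgb_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)
    return tr.image_to_string(rgb_image)
```

It would be called as string_from_image = image_to_text(im).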
The final step is to print the string:
print(string_from_image)
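OCR output often comes back with blank lines and stray whitespace around the text. If that gets in the way, the string can be tidied before printing. This is an optional sketch; clean_ocr_text is our own helper, and the sample string is made up for illustration.

```python
def clean_ocr_text(raw):
    """Strip whitespace from each line and drop the lines that end up empty."""
    lines = (line.strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


# A made-up sample standing in for real OCR output.
print(clean_ocr_text("  Hello \n\n  World  \n"))
```

In the tutorial's code you would call print(clean_ocr_text(string_from_image)) instead.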
The complete code for the steps above is:
import pytesseract as tr
import cv2
im = cv2.imread('image.jpg')
string_from_image = tr.image_to_string(im)
print(string_from_image)
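Even so, you might run into a problem with this program: you have installed the required packages, but pytesseract reports that Tesseract is not installed or cannot be found. This happens when pytesseract cannot locate the tesseract executable.
To fix this, add the following line to your script after the imports, adjusting the path to wherever tesseract.exe is installed on your machine. Since we imported pytesseract as tr, the setting is reached through that alias:
tr.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"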
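Because the install folder differs from machine to machine, one option is to try a few likely locations and use the first one that exists. The sketch below does that; the CANDIDATES list and both helper names are assumptions for illustration, so edit the paths to match your own installation.

```python
from pathlib import Path

# Assumed install locations; edit these to match your machine.
CANDIDATES = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
]


def first_existing(paths):
    """Return the first path that points to an existing file, or None."""
    for candidate in paths:
        if Path(candidate).is_file():
            return str(candidate)
    return None


def configure_tesseract(paths=CANDIDATES):
    """Point pytesseract at the first candidate found; return it (or None)."""
    found = first_existing(paths)
    if found is not None:
        # Imported lazily so the search itself works without pytesseract.
        import pytesseract as tr

        tr.pytesseract.tesseract_cmd = found
    return found
```

Calling configure_tesseract() once after the imports is enough; if it returns None, none of the candidate paths exist and the list needs updating.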
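On a Windows PC you can instead add the Tesseract folder to the PATH environment variable (use the folder where Tesseract is actually installed on your machine):
This PC (My Computer) -> Properties -> Advanced system settings -> Environment Variables -> PATH -> New -> C:\Program Files\Tesseract-OCR\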
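To check whether the PATH change took effect, you can ask Python if it can find the tesseract command. This is a minimal sketch using the standard library; is_on_path is our own helper name.

```python
import shutil


def is_on_path(command="tesseract"):
    """Return True if the command can be found through the PATH variable."""
    return shutil.which(command) is not None


print("tesseract on PATH:", is_on_path())
```

Note that programs which were already running keep the old PATH value, so restart your IDE or terminal after changing the environment variable before you run this check.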