Detect text on an image in Python

Ever needed to copy text from an image and found yourself retyping it by hand? In this tutorial, I will show you how to extract text from images automatically using Python and Optical Character Recognition (OCR). It is perfect for digitizing documents, automating data entry, or processing scanned materials.

What We’ll Build

We will write a Python script that can:
– Read any image file
– Process it to enhance text recognition
– Detect areas with text
– Extract the text and save it to a file

Prerequisites

Before we start, you'll need to install:
1. Python (3.6 or later)
2. OpenCV (`pip install opencv-python`)
3. Tesseract OCR engine and its Python wrapper (`pip install pytesseract`)
4. The Tesseract executable (can be downloaded from the official GitHub repository)
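
If Tesseract is already on your PATH (or once pytesseract has been pointed at the executable, as shown in the next section), a quick one-liner confirms that Python can reach the engine:

import pytesseract

# Should print the installed Tesseract version (e.g. 5.x) if the engine is reachable
print(pytesseract.get_tesseract_version())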

Understanding the Code Step by Step

1. Setting Up the Environment

import cv2
import pytesseract
import os
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
os.environ["TESSDATA_PREFIX"] = r"C:\Program Files\Tesseract-OCR\tessdata"

This section imports the necessary libraries and tells our script where to find the Tesseract OCR engine on your computer. Think of Tesseract as our “eyes” that will read the text from images.
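
The two paths above are the Windows defaults. On Linux or macOS, a Tesseract installed through a package manager usually ends up on the PATH, so both lines can often be omitted; if pytesseract still cannot find the engine, point tesseract_cmd at the binary reported by `which tesseract` (the path below is only a common example, not a guarantee):

# Linux/macOS example; confirm the actual location with `which tesseract`
pytesseract.pytesseract.tesseract_cmd = "/usr/bin/tesseract"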

2. Image Preprocessing

image = cv2.imread("sample2.jpg")
grayscale_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
_, binary_image = cv2.threshold(grayscale_image, 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY_INV)

Just as black text on white paper is easier to read than colored text on a colored background, we prepare the image by:
– Converting it to grayscale (shades of gray)
– Applying Otsu’s thresholding so the text stands out clearly from the background (the optional snippet below shows how to inspect the result)
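
If you want to see what the preprocessing actually produces, you can write the intermediate images to disk and open them in any viewer (the file names here are just examples):

# Optional: save the intermediate results for inspection
cv2.imwrite("debug_grayscale.jpg", grayscale_image)
cv2.imwrite("debug_binary.jpg", binary_image)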

3. Finding Text Regions

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (18, 18))
dilated_image = cv2.dilate(binary_image, kernel, iterations=1)
contours, _ = cv2.findContours(dilated_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

This step is like running a highlighter over the image to mark where text appears. We:
– “Dilate” the image (make the text strokes thicker) so nearby letters merge into solid blocks
– Detect “contours” (outlines) around those blocks, which become our candidate text regions (the optional snippet below draws them so you can check the result)
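
Before running OCR, it can be worth checking which regions were actually detected. A small optional addition draws each bounding box on a copy of the original image and saves it (the output file name is just an example):

# Optional: draw every detected region on a copy of the image for a visual check
preview = image.copy()
for contour in contours:
    x, y, width, height = cv2.boundingRect(contour)
    cv2.rectangle(preview, (x, y), (x + width, y + height), (0, 255, 0), 2)
cv2.imwrite("detected_regions.jpg", preview)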

4. Extracting and Saving the Text

output_file = "extracted_text.txt"  # any file name works here

for contour in contours:
    x, y, width, height = cv2.boundingRect(contour)
    cropped_region = image[y:y + height, x:x + width]
    extracted_text = pytesseract.image_to_string(cropped_region)
    with open(output_file, "a") as file:
        file.write(extracted_text)
        file.write("\n")

For each text region we found, we:
– Compute its bounding rectangle
– Crop that region out of the original image
– Run Tesseract on the cropped region to read the text
– Append the extracted text to the file named by output_file
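
One caveat: findContours does not return regions in reading order, so lines can end up shuffled in the output file. A simple refinement, sketched below, sorts the bounding boxes top-to-bottom (then left-to-right) before extracting; this is a rough heuristic rather than a full layout analysis:

# Sort regions so the output roughly follows reading order
boxes = [cv2.boundingRect(c) for c in contours]
boxes.sort(key=lambda box: (box[1], box[0]))

with open(output_file, "w") as file:
    for x, y, width, height in boxes:
        cropped_region = image[y:y + height, x:x + width]
        file.write(pytesseract.image_to_string(cropped_region))
        file.write("\n")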

Typical Application Scenarios

This script comes in handy for:
– Digitizing hard-copy documents
– Extracting text from screenshots
– Processing scanned business cards
– Converting image-based PDFs into searchable text
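
For batch jobs like digitizing a stack of scanned pages, you would typically wrap the steps above in a loop over a folder of images. A rough sketch (the folder name "scans" is only an assumption):

import os
import cv2

# Rough sketch: run the same pipeline over every image in a folder
for filename in os.listdir("scans"):
    if filename.lower().endswith((".jpg", ".jpeg", ".png")):
        image = cv2.imread(os.path.join("scans", filename))
        # ...then apply the preprocessing, region detection, and OCR steps shown above...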

Tips for Better Results

1. Image Quality Matters: Clear, high-resolution images work best
2. Proper Lighting: Well-lit images with good contrast between text and background give noticeably better results
3. Clean Images: Remove watermarks or extra graphics if possible
4. Font Considerations: Standard fonts are recognized more accurately than decorative ones
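
If a particular image still gives poor results, two cheap tweaks are often worth trying: upscale small images before OCR, and pass Tesseract an explicit page segmentation mode. The values below are just starting points to experiment with:

# Upscaling often helps: OCR tends to work better at higher resolution
resized = cv2.resize(cropped_region, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)

# "--psm 6" tells Tesseract to assume a single uniform block of text
text = pytesseract.image_to_string(resized, config="--psm 6")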

Conclusion

Extracting text from images takes only a few lines of Python. The script we built here is deliberately simple, but it shows how well OpenCV’s image processing and Tesseract’s text recognition work together.

 
