Detect text on an image in Python
Ever needed to copy text from an image, only to end up re-typing it by hand? In this tutorial, I will show you how to extract text from images automatically using Python and Optical Character Recognition (OCR). Perfect for digitizing documents, automating data entry, or processing scanned materials.
What We’ll Build
We will write a Python script that can:
– Read any image file
– Process it to enhance text recognition
– Detect areas with text
– Extract the text and save it to a file
Prerequisites
Before we start, you'll need to install:

1. Python (3.6 or later)
2. OpenCV (`pip install opencv-python`)
3. The Python wrapper for Tesseract (`pip install pytesseract`)
4. The Tesseract OCR engine itself (the executable can be downloaded from the official GitHub repository)
Understanding the Code Step by Step
1. Setting Up the Environment
```python
import cv2
import pytesseract
import os

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
os.environ["TESSDATA_PREFIX"] = r"C:\Program Files\Tesseract-OCR\tessdata"
```
This section imports the necessary libraries and tells our script where to find the Tesseract OCR engine on your computer. Think of Tesseract as our “eyes” that will read the text from images.
2. Image Preprocessing
```python
image = cv2.imread("sample2.jpg")
grayscale_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
_, binary_image = cv2.threshold(grayscale_image, 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY_INV)
Just like how it’s easier to read black text on white paper than colored text on a colored background, we prep the image by:
– Converting it to grayscale (shades of gray instead of color)
– Applying something known as “Otsu’s thresholding” to make the text clearly visible from the background
3. Finding Text Regions
```python
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (18, 18))
dilated_image = cv2.dilate(binary_image, kernel, iterations=1)
contours, _ = cv2.findContours(dilated_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
```
This section is equivalent to using a highlighter to mark where text appears in the image. We:
– “Dilate” the image (make text thicker) to help connect nearby letters
– Detect “contours” (outlines) around text areas
4. Extracting and Saving the Text
```python
output_file = "extracted_text.txt"

for contour in contours:
    x, y, width, height = cv2.boundingRect(contour)
    cropped_region = image[y:y + height, x:x + width]
    extracted_text = pytesseract.image_to_string(cropped_region)
    with open(output_file, "a") as file:
        file.write(extracted_text)
        file.write("\n")
```
For each text region we found, we:
– Compute its bounding rectangle
– Crop only that section of the picture
– Use Tesseract to read the text
– Save the extracted text to a file
Typical Application Scenarios
This script comes in handy for:
– Digitizing hard-copy documents
– Text extraction from screenshots
– Processing scanned business cards
– Converting image-based PDFs into searchable text
Tips for Better Results
1. Image quality matters: clear, high-resolution images work best
2. Proper lighting: well-lit images with good contrast between text and background
3. Clean images: remove watermarks and extra graphics if possible
4. Font considerations: standard fonts are recognized more accurately than decorative ones
Conclusion
With just a few lines of Python, extracting text from images becomes straightforward. While the script we developed here is simple, it showcases the power of combining OpenCV for image processing with Tesseract for text recognition.