Detect text on the screen and click on it in Python

In this tutorial, I will show you how to detect text on the screen and click on it using Python. For example, you may want to click a link on a web page that you can identify only by its visible text. So, let’s see how to do it.

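To detect text and click on it, we need three third-party packages, pyautogui, pytesseract, and Pillow, plus the built-in time module. pytesseract is a Python wrapper around the Tesseract OCR engine, which has to be installed separately.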

So let’s first import those modules:

import pyautogui
import pytesseract
from PIL import ImageGrab
import time

Now pause the program for 5 seconds so that we have time to bring the target window to the front before the screenshot is taken:

time.sleep(5)

Now let’s capture the entire screen, run OCR on the screenshot, and define the text we want to find:

# Capture the entire screen or a region
screenshot = ImageGrab.grab()

# Run OCR on the image; image_to_data returns each detected word with its bounding box
text_data = pytesseract.image_to_data(screenshot, output_type=pytesseract.Output.DICT)

# The text you want to detect (a single word, because each OCR entry holds one word)
target_text = "Click"
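
The dictionary returned by image_to_data holds parallel lists (text, left, top, width, height, conf, and a few more), with one entry per detected word. OCR output usually also contains empty strings and low-confidence guesses, so it can help to filter them out before searching. The following is only a minimal sketch: the helper name confident_words and the threshold of 60 are my own assumptions, and the sample dictionary is hand-made rather than real OCR output. The conf values may arrive as strings or numbers depending on the pytesseract version, so the sketch converts them with float().

```python
def confident_words(text_data, min_conf=60.0):
    """Return indices of non-empty words whose OCR confidence is at least min_conf.

    Hypothetical helper; the default threshold is an arbitrary assumption.
    """
    keep = []
    for i, word in enumerate(text_data["text"]):
        conf = float(text_data["conf"][i])  # conf may be a str or a number
        if word.strip() and conf >= min_conf:
            keep.append(i)
    return keep


# Hand-made sample in the same shape as the Output.DICT result
sample = {
    "text": ["", "Click", "me", "blurry"],
    "conf": ["-1", "96", 91.5, "12"],
}
print(confident_words(sample))  # prints [1, 2]
```

The returned indices can then be used to look up left, top, width, and height for the words worth checking.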

Finally, we loop through every word that OCR detected on the screen to find the text we want to click on:

# Loop through all detected text to find coordinates of the target text
for i, text in enumerate(text_data['text']):
    if target_text.lower() in text.lower():
        x = text_data['left'][i]
        y = text_data['top'][i]
        width = text_data['width'][i]
        height = text_data['height'][i]
        
        # Calculate the center of the text
        center_x = x + width // 2
        center_y = y + height // 2

        # Move the mouse to the center of the text and click
        pyautogui.moveTo(center_x, center_y)
        pyautogui.click()
        print(f"Clicked on text '{target_text}' at ({center_x}, {center_y})")
        break
else:
    print(f"Text '{target_text}' not found on the screen.")
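
Because image_to_data reports one entry per word, a target made of several words, such as "Click me", never matches a single entry with a substring check. One possible way around this is to look for a run of consecutive words and merge their bounding boxes. The sketch below assumes exact, case-insensitive word matches; find_phrase is a hypothetical helper, not part of pytesseract, and the sample data is hand-made. It also does not check that the matched words sit on the same line.

```python
def find_phrase(text_data, phrase):
    """Return (left, top, width, height) covering the first run of consecutive
    OCR words that equals phrase, or None if there is no such run."""
    target = phrase.lower().split()
    if not target:
        return None
    # Keep only non-empty words, remembering their original indices
    words = [(i, w.strip().lower())
             for i, w in enumerate(text_data["text"]) if w.strip()]
    for start in range(len(words) - len(target) + 1):
        window = words[start:start + len(target)]
        if [w for _, w in window] == target:
            idx = [i for i, _ in window]
            # Merge the boxes of all matched words into one box
            left = min(text_data["left"][i] for i in idx)
            top = min(text_data["top"][i] for i in idx)
            right = max(text_data["left"][i] + text_data["width"][i] for i in idx)
            bottom = max(text_data["top"][i] + text_data["height"][i] for i in idx)
            return left, top, right - left, bottom - top
    return None


# Hand-made sample in the same shape as the Output.DICT result
sample = {
    "text": ["", "Please", "Click", "me", "now"],
    "left": [0, 10, 80, 130, 170],
    "top": [0, 20, 20, 22, 20],
    "width": [0, 60, 45, 30, 40],
    "height": [0, 16, 16, 14, 16],
}
print(find_phrase(sample, "Click me"))  # prints (80, 20, 80, 16)
```

The center of the returned box can be computed exactly as in the loop above and passed to pyautogui.moveTo().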

In the above program, I read the x and y coordinates of the top-left corner of the target text, together with its width and height, and compute the center of the text with this code:

# Calculate the center of the text
center_x = x + width // 2
center_y = y + height // 2
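
As a small worked example of this arithmetic: a box with left 100, top 200, width 50, and height 20 has its center at (125, 210). The comment in the code mentions capturing a region; ImageGrab.grab() accepts a bbox=(left, top, right, bottom) argument for that, and in that case the OCR coordinates are relative to the region, so the region's top-left corner has to be added back before clicking. The sketch below illustrates this; box_center and its offset parameter are hypothetical names of my own, and it assumes screenshot pixels and mouse coordinates use the same scale, which may not hold on scaled or high-DPI displays.

```python
def box_center(left, top, width, height, offset=(0, 0)):
    """Center of an OCR bounding box in screen coordinates.

    offset is the top-left corner of the captured region;
    leave it at (0, 0) for a full-screen capture.
    """
    center_x = offset[0] + left + width // 2
    center_y = offset[1] + top + height // 2
    return center_x, center_y


print(box_center(100, 200, 50, 20))                     # prints (125, 210)
print(box_center(100, 200, 50, 20, offset=(300, 400)))  # prints (425, 610)
```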

We use the moveTo() function from the pyautogui module to move the mouse to the center of the text, and then perform the click with pyautogui.click(). That’s it. Now we can click on text that is visible on the screen programmatically.

Complete and final code in one place

Below is the complete and final working code for you:

import pyautogui
import pytesseract
from PIL import ImageGrab
import time

# Set the path to tesseract.exe if it's not in your PATH environment variable
# Example: pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

# Wait 5 seconds so you can bring the target window to the front
time.sleep(5)

# Capture the entire screen or a region
screenshot = ImageGrab.grab()

# Run OCR on the image; image_to_data returns each detected word with its bounding box
text_data = pytesseract.image_to_data(screenshot, output_type=pytesseract.Output.DICT)

# The text you want to detect (a single word, because each OCR entry holds one word)
target_text = "Click"

# Loop through all detected text to find coordinates of the target text
for i, text in enumerate(text_data['text']):
    if target_text.lower() in text.lower():
        x = text_data['left'][i]
        y = text_data['top'][i]
        width = text_data['width'][i]
        height = text_data['height'][i]
        
        # Calculate the center of the text
        center_x = x + width // 2
        center_y = y + height // 2

        # Move the mouse to the center of the text and click
        pyautogui.moveTo(center_x, center_y)
        pyautogui.click()
        print(f"Clicked on text '{target_text}' at ({center_x}, {center_y})")
        break
else:
    print(f"Text '{target_text}' not found on the screen.")

 
