Detect text on the screen and click on it in Python
In this tutorial, I will show you how to detect text on the screen and then click on it with Python. For example, you may want to find an anchor link on a website by its text and click on it. Let’s see how to do it.
To detect text and click on it, we can use these Python modules: pyautogui, pytesseract, Pillow (imported as PIL), and the built-in time module. Note that pytesseract is only a wrapper, so the Tesseract OCR engine itself must also be installed on your system.
So let’s first import those modules:
import pyautogui
import pytesseract
from PIL import ImageGrab
import time
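Because pytesseract calls the external Tesseract program, it can help to check that the program can be found before running the rest of the script. Here is a minimal sketch of such a check using only the standard library. The helper name find_tesseract is my own and not part of any of the modules above, and the sketch assumes the executable is named tesseract.

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path of the Tesseract binary, or None if it is not on PATH."""
    # find_tesseract is a hypothetical helper for this tutorial, not a pytesseract function.
    return shutil.which(executable)


path = find_tesseract()
if path is None:
    print("Tesseract was not found on PATH. Install it, or set "
          "pytesseract.pytesseract.tesseract_cmd to its full path.")
else:
    print(f"Tesseract found at {path}")
```

If the check prints the first message, you can point pytesseract at the executable by setting pytesseract.pytesseract.tesseract_cmd.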
Now let the program wait for 5 seconds, so that we have time to switch to the window that contains the text before the screenshot is taken:
time.sleep(5)
Now let’s capture the entire screen and define the text we want to find on the screen:
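# Capture the entire screen (pass bbox=(left, top, right, bottom) to capture a region)
screenshot = ImageGrab.grab()

# Run OCR and get every detected word together with its position
text_data = pytesseract.image_to_data(screenshot, output_type=pytesseract.Output.DICT)

# The text you want to detect.
# image_to_data reports each word as a separate entry, so use a single word here.
target_text = "Click"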
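To make the next step easier to follow, here is a small sketch of what the returned dictionary looks like. It holds parallel lists such as text, left, top, width, height and conf, where index i in every list describes the same entry. The sample values below are made up by me for illustration and are not real OCR output, and detected_words is a hypothetical helper name.

```python
# Hand-written sample in the shape of the dictionary returned by
# pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT).
# The values are invented for illustration only.
sample_data = {
    "left":   [0, 40, 40, 110],
    "top":    [0, 200, 200, 200],
    "width":  [800, 100, 60, 30],
    "height": [600, 20, 20, 20],
    "conf":   [-1, -1, 96, 91],
    "text":   ["", "", "Click", "me"],
}


def detected_words(data, min_conf=0):
    """Return (text, left, top, width, height) for entries that hold real text."""
    words = []
    for i, text in enumerate(data["text"]):
        # Entries for the page, blocks and lines have empty text and a conf of -1.
        if not text.strip():
            continue
        # conf may be a number or a string depending on the pytesseract version.
        if float(data["conf"][i]) < min_conf:
            continue
        words.append((text, data["left"][i], data["top"][i],
                      data["width"][i], data["height"][i]))
    return words


for word in detected_words(sample_data):
    print(word)
```

Skipping the empty entries and the low-confidence ones is optional, but it keeps the search from looking at boxes that do not contain any readable text.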
Finally, we loop through all the text detected on the screen to find the text we want to click on:
# Loop through all detected text to find coordinates of the target text
for i, text in enumerate(text_data['text']):
    if target_text.lower() in text.lower():
        x = text_data['left'][i]
        y = text_data['top'][i]
        width = text_data['width'][i]
        height = text_data['height'][i]

        # Calculate the center of the text
        center_x = x + width // 2
        center_y = y + height // 2

        # Move the mouse to the center of the text and click
        pyautogui.moveTo(center_x, center_y)
        pyautogui.click()
        print(f"Clicked on text '{target_text}' at ({center_x}, {center_y})")
        break
else:
    # This branch runs only if the loop finished without hitting break
    print(f"Text '{target_text}' not found on the screen.")
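The loop above compares the target against one OCR entry at a time, and image_to_data reports each word as its own entry, so a target that contains a space cannot match a single entry. If you need to find a phrase, one possible approach is to look for consecutive words on the same line and click the center of their combined box. The sketch below is only an illustration of that idea: find_phrase_center is a hypothetical helper, the sample data is made up, and it matches whole words exactly instead of using the substring check from the loop above.

```python
def find_phrase_center(data, phrase):
    """Return the (x, y) center of the first occurrence of phrase, or None."""
    target = phrase.lower().split()
    if not target:
        return None

    # Keep only entries with text, remembering which line each word is on.
    words = []
    for i, text in enumerate(data["text"]):
        if text.strip():
            line_id = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
            words.append((text.strip().lower(), line_id, i))

    # Slide a window of len(target) words over the detected words.
    for start in range(len(words) - len(target) + 1):
        window = words[start:start + len(target)]
        same_line = all(w[1] == window[0][1] for w in window)
        if same_line and [w[0] for w in window] == target:
            idx = [w[2] for w in window]
            # Combine the boxes of all matched words into one bounding box.
            left = min(data["left"][i] for i in idx)
            top = min(data["top"][i] for i in idx)
            right = max(data["left"][i] + data["width"][i] for i in idx)
            bottom = max(data["top"][i] + data["height"][i] for i in idx)
            return ((left + right) // 2, (top + bottom) // 2)
    return None


# Invented sample in the shape of pytesseract's Output.DICT result.
sample_data = {
    "block_num": [1, 1, 1],
    "par_num":   [1, 1, 1],
    "line_num":  [1, 1, 2],
    "left":      [40, 110, 40],
    "top":       [200, 200, 240],
    "width":     [60, 30, 50],
    "height":    [20, 20, 20],
    "text":      ["Click", "me", "Cancel"],
}

print(find_phrase_center(sample_data, "Click me"))
```

The returned point can be passed to pyautogui.moveTo() and pyautogui.click() in the same way as center_x and center_y.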
In the above program, I take the x and y coordinates of the top-left corner of the target text, together with its width and height, and calculate the center of the text with this code:
# Calculate the center of the text
center_x = x + width // 2
center_y = y + height // 2
Then I use the moveTo() method from the pyautogui module to move the mouse to the center of the text, and perform the click using the pyautogui.click() method. That’s it. We can now click on text available on the screen programmatically.
Complete and final code in one place
Below is the complete and final working code for you:
import pyautogui
import pytesseract
from PIL import ImageGrab
import time

# Set the path to tesseract.exe if it's not in your PATH environment variable
# Example: pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

# Wait so you can switch to the window that contains the text
time.sleep(5)

# Capture the entire screen (pass bbox=(left, top, right, bottom) to capture a region)
screenshot = ImageGrab.grab()

# Run OCR and get every detected word together with its position
text_data = pytesseract.image_to_data(screenshot, output_type=pytesseract.Output.DICT)

# The text you want to detect.
# image_to_data reports each word as a separate entry, so use a single word here.
target_text = "Click"

# Loop through all detected text to find coordinates of the target text
for i, text in enumerate(text_data['text']):
    if target_text.lower() in text.lower():
        x = text_data['left'][i]
        y = text_data['top'][i]
        width = text_data['width'][i]
        height = text_data['height'][i]

        # Calculate the center of the text
        center_x = x + width // 2
        center_y = y + height // 2

        # Move the mouse to the center of the text and click
        pyautogui.moveTo(center_x, center_y)
        pyautogui.click()
        print(f"Clicked on text '{target_text}' at ({center_x}, {center_y})")
        break
else:
    # This branch runs only if the loop finished without hitting break
    print(f"Text '{target_text}' not found on the screen.")
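One thing to watch for with the code above: on some high-DPI displays, the screenshot can have more pixels than the coordinate space pyautogui uses for the mouse, so the click may land in the wrong place. Whether this happens depends on your operating system and display settings, so treat the following as a sketch under that assumption and not as a required step. The helper image_to_screen is a hypothetical name of mine, and it assumes a single monitor.

```python
def image_to_screen(x, y, image_size, screen_size):
    """Map a point from screenshot pixels to the coordinates used for the mouse."""
    image_w, image_h = image_size
    screen_w, screen_h = screen_size
    # Scale each axis by the ratio between the two coordinate spaces.
    return (round(x * screen_w / image_w), round(y * screen_h / image_h))


# Example: a 2880x1800 screenshot of a display that the mouse sees as 1440x900.
print(image_to_screen(1000, 500, (2880, 1800), (1440, 900)))

# When both sizes are equal, the point is returned unchanged.
print(image_to_screen(320, 240, (1920, 1080), (1920, 1080)))
```

In the program above, you could pass screenshot.size as image_size and pyautogui.size() as screen_size, and then give the result to pyautogui.moveTo() in place of center_x and center_y.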