# Ball Catcher Game in Python

Combining game development with reinforcement learning to make a program play a game on its own is not as difficult as it sounds. In this article, we are going to develop a simple ball catcher game in Python and use reinforcement learning to make our program “intelligent”. But before that, make sure you understand the basics of reinforcement learning, and more specifically, Q-learning.

In our game, a ball drops continuously from top to bottom, and a rectangular catcher tries to catch it. If the catcher succeeds, we score a point; otherwise, we count a miss. There are four steps to this article, and by the end, you will have an agent playing the ball catcher game for you. Also, make sure you have the following two libraries installed:

• Pygame
• NumPy
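
If either library is missing, both can usually be installed with pip (assuming a working Python 3 setup):

```
pip install pygame numpy
```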

## Step 1: Initializing classes

We start by defining a Circle class for our ball and a State class that captures the positions of the catcher and the ball together.

```
class Circle:
    def __init__(self, circleX, circleY):
        # X and Y coordinates of the circle (ball) with respect to the window
        self.circleX = circleX
        self.circleY = circleY


class State:
    def __init__(self, rect, circle):
        # Current rectangle (catcher) and circle (ball) making up one game state
        self.rect = rect
        self.circle = circle
```
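
As a quick illustration of how these two classes fit together (the coordinates below are arbitrary), a state is nothing more than a catcher rectangle paired with a ball position:

```
import pygame as pg

# A catcher at the bottom-left of an 800x400 window, and a ball near the top
s = State(pg.Rect(0, 350, 200, 50), Circle(400, 50))
print(s.rect.left, s.circle.circleY)  # 0 50
```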

## Step 2: Initializing window, ball, and catcher

We define the dimensions of the window and the RGB color tuples used for drawing.

```
import numpy as np

windowWidth = 800
windowHeight = 400

RED = (255, 0, 0)
GREEN = (0, 255, 0)
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)
```

Similarly, we initialize the sizes of the ball and the catcher, and how fast the ball falls from the top. We also define the ball’s radius, crclRadius, here, since the scoring and drawing code needs it later.

```
# Initial position of the ball with respect to the window
crclCentreX = 400
crclCentreY = 50
crclRadius = 20  # radius of the ball, used for scoring and drawing

crclYStepFalling = windowHeight // 10  # the ball falls 40 pixels per step

# Initial position and size of the catcher with respect to the window
rctLeft = 400
rctTop = 350
rctWidth = 200
rctHeight = 50
```

We initialize the Q-learning table and use a dictionary, QIDic, to map each state to a row index of the table. Each row holds the estimated values of the three possible actions (stay, left, right) for one state.

```
QIDic = {}

# Upper bound on the number of states:
# (windowWidth / 8) * (windowHeight / crclYStepFalling) * (windowWidth / rctWidth)
Q = np.zeros([5000, 3])  # one row per state, one column per action
```
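
To make the indexing concrete, here is a minimal sketch of how one row of the table is read and updated; the row index 0 and the value 0.5 are made up for illustration:

```
state_index = 0  # hypothetical row; real indices are assigned via QIDic

# Each row holds the estimated values of the 3 actions (stay, left, right)
print(Q[state_index, :])             # [0. 0. 0.] before any learning

Q[state_index, 2] = 0.5              # pretend moving right has earned some value
print(np.argmax(Q[state_index, :]))  # 2 -> "right" is now the best action

Q[state_index, 2] = 0                # undo the demo write before training
```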

## Step 3: Defining functions for each case of the ball catcher game

First, we write a function that returns the new state of the game after an action is taken; a new state means new positions for the ball and the catcher. We use pygame’s Rect class to represent the catcher (rectangle). The function takes the current state and an action as its arguments.

```
import pygame as pg


def new_state_after_action(s, act):
    # Actions: 0 == stay, 1 == move left, 2 == move right
    rct = None
    if act == 2:  # action is right
        if s.rect.right + s.rect.width > windowWidth:
            rct = s.rect  # would leave the window, so stay put
        else:
            rct = pg.Rect(s.rect.left + s.rect.width, s.rect.top,
                          s.rect.width, s.rect.height)  # Rect(left, top, width, height)
    elif act == 1:  # action is left
        if s.rect.left - s.rect.width < 0:
            rct = s.rect  # would leave the window, so stay put
        else:
            rct = pg.Rect(s.rect.left - s.rect.width, s.rect.top,
                          s.rect.width, s.rect.height)  # Rect(left, top, width, height)
    else:  # action is 0: stay where it is
        rct = s.rect

    # The ball falls one step regardless of the catcher's move
    newCircle = Circle(s.circle.circleX, s.circle.circleY + crclYStepFalling)

    return State(rct, newCircle)
```
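
As a quick sanity check (using the constants defined in Step 2), a single transition moves the catcher one full rectangle-width and drops the ball by crclYStepFalling, and the catcher refuses to leave the window:

```
s = State(pg.Rect(400, 350, 200, 50), Circle(400, 50))
s1 = new_state_after_action(s, 2)        # try to move right
print(s1.rect.left, s1.circle.circleY)   # 600 90 -> catcher moved, ball fell

s2 = new_state_after_action(s1, 2)       # moving right again would exit the window
print(s2.rect.left)                      # 600 -> the catcher stays put
```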

We define another function that moves the catcher while keeping it within the bounds of the window. Its arguments are a rectangle and an action.

```
def new_rect_after_action(rect, act):
    # Same movement rules as above, but acting on a bare rectangle
    if act == 2:
        if rect.right + rect.width > windowWidth:
            return rect
        else:
            return pg.Rect(rect.left + rect.width, rect.top, rect.width, rect.height)
    elif act == 1:
        if rect.left - rect.width < 0:
            return rect
        else:
            return pg.Rect(rect.left - rect.width, rect.top, rect.width, rect.height)
    else:
        return rect
```
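
The left edge is clamped the same way; a quick check with an assumed rectangle sitting at the left wall:

```
r = pg.Rect(0, 350, 200, 50)
print(new_rect_after_action(r, 1).left)  # 0 -> already at the left wall, stays
print(new_rect_after_action(r, 2).left)  # 200 -> free to move right
```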

The remaining helper functions are:

• circle_falling(crclRadius) – randomly pick the x-axis position of the ball for each new drop
• calculate_score(rectangle, circle) – keep the score tally of the agent
• state_to_number(state) – map a state object to an integer row index stored in QIDic
• get_best_action(state) – retrieve the best known action for the agent
```
import random


def circle_falling(crclRadius):
    # Pick a random x coordinate for the next drop;
    # with crclRadius = 20 this yields a position between 80 and 640
    newx = 100 - crclRadius
    multiplier = random.randint(1, 8)
    newx *= multiplier
    return newx


def calculate_score(rect, circle):
    # +1 if the ball lands within the catcher's span, -1 otherwise
    if rect.left <= circle.circleX <= rect.right:
        return 1
    else:
        return -1


def state_to_number(s):
    # Concatenate the coordinates into a string key and map it to a Q-table row
    r = s.rect.left
    c = s.circle.circleY
    n = str(r) + str(c) + str(s.circle.circleX)

    if n in QIDic:
        return QIDic[n]
    else:
        if len(QIDic):
            maximum = max(QIDic, key=QIDic.get)
            QIDic[n] = QIDic[maximum] + 1
        else:
            QIDic[n] = 1
    return QIDic[n]


def get_best_action(s):
    # The action with the highest estimated value in the current state
    return np.argmax(Q[state_to_number(s), :])
```
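
Since QIDic is keyed on the concatenated coordinates, the same state always resolves to the same Q-table row. A quick check (the actual index returned depends on which states have been seen before):

```
a = State(pg.Rect(0, 350, 200, 50), Circle(80, 90))
b = State(pg.Rect(0, 350, 200, 50), Circle(80, 90))
print(state_to_number(a) == state_to_number(b))  # True -> one shared Q row
```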

## Step 4: Let’s set up the learning rate of our agent and play the game!

Let’s initialize pygame and set up the FPS clock, window, and rectangle objects.

```
import sys
from pygame.locals import *

# Initializing frames per second
FPS = 20
fpsClock = pg.time.Clock()

# Initializing the game
pg.init()

# Window and Rectangle objects
window = pg.display.set_mode((windowWidth, windowHeight))
pg.display.set_caption("Catch the Ball")

rct = pg.Rect(rctLeft, rctTop, rctWidth, rctHeight)
```

Next, we set up some variables used in the game logic, along with the learning rate and the discount factor. Try tuning them to understand the algorithm’s behavior.

```
# Initializing variables, the learning rate, and the discount factor
action = 1

score, missed, reward = 0, 0, 0
font = pg.font.Font(None, 30)

lr = .93  # learning rate
y = .99   # discount factor
i = 0     # iteration counter
```
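
For intuition, here is the update rule from the loop below applied once by hand, with made-up values: a reward of 1 and a best next-state value of 0.5.

```
q_old = 0.0      # current estimate Q[s, act]
next_best = 0.5  # assumed max over Q[s1, :]
r0 = 1           # reward observed in state s

q_new = q_old + lr * (r0 + y * next_best - q_old)
print(q_new)     # about 1.39: the estimate is pulled toward r0 + y * next_best
```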

Finally, let’s teach the agent the rules of the game and check its performance. We provide the reward conditions, the Q-learning update, and, finally, the score display.

```
# Executing the game rules and Q-learning logic
while True:
    for event in pg.event.get():
        if event.type == QUIT:
            pg.quit()
            sys.exit()

    window.fill(BLACK)

    # The ball has reached the catcher's level: score the catch and re-drop the ball
    if crclCentreY >= windowHeight - rctHeight - crclRadius:
        reward = calculate_score(rct, Circle(crclCentreX, crclCentreY))  # +1 or -1
        crclCentreX = circle_falling(crclRadius)  # next drop from a random column
        crclCentreY = 50
    else:
        reward = 0
        crclCentreY += crclYStepFalling

    s = State(rct, Circle(crclCentreX, crclCentreY))
    act = get_best_action(s)
    r0 = calculate_score(s.rect, s.circle)
    s1 = new_state_after_action(s, act)

    # Q-learning update: nudge Q[s, act] toward r0 + y * max(Q[s1, :])
    Q[state_to_number(s), act] += lr * (r0 + y * np.max(Q[state_to_number(s1), :])
                                        - Q[state_to_number(s), act])

    rct = new_rect_after_action(s.rect, act)
    crclCentreX = s.circle.circleX
    crclCentreY = int(s.circle.circleY)

    pg.draw.circle(window, RED, (crclCentreX, crclCentreY), crclRadius)
    pg.draw.rect(window, GREEN, rct)

    if reward == 1:
        score += reward
    elif reward == -1:
        missed += 1

    text = font.render("Score: " + str(score), True, (238, 58, 140))
    text1 = font.render("Missed: " + str(missed), True, (238, 58, 140))
    window.blit(text, (windowWidth - 120, 10))
    window.blit(text1, (windowWidth - 280, 10))

    pg.display.update()
    fpsClock.tick(FPS)

    if i == 10000:
        break
    else:
        i += 1
```
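
As an optional extra that is not part of the original script: since Q is a plain NumPy array, you can persist what the agent has learned with np.save and reload it later with np.load. Note that QIDic would also need to be saved (for example with the pickle module) for the stored rows to remain meaningful.

```
np.save("q_table.npy", Q)    # save the learned action values
Q = np.load("q_table.npy")   # restore them in a later session
```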

When you run the script, you will see the green catcher shuffling left and right under the falling red ball, with the Score and Missed tallies drawn near the top-right corner of the window. As the Q-table fills in, the score should climb much faster than the misses.

Q-learning is a simple yet powerful algorithm for making an agent behave intelligently, and reinforcement learning techniques like it are heavily used in robotics.


If you find any difficulties in following the article, do let us know in the comments.