Thompson Sampling for Multi-Armed Bandit Problem in Python

We’re going to look at how to solve the multi-armed bandit problem in Python. Let us first understand what a multi-armed bandit is. A one-armed bandit is a slot machine: back in the olden days, the machine had a handle (lever) on the right, and you had to pull the lever to make it work. The multi-armed bandit problem is the challenge a person faces when standing in front of a whole set of these machines. Suppose you’ve got seven of these machines and have decided to play a thousand times. How do you figure out which ones to play to maximize your returns?

Thompson sampling for a multi-armed bandit problem

Let us study a modern-day application of Thompson Sampling: optimizing the click-through rate of an advertisement.

The outline for the task is as follows:

  • We have eight versions of the same ad, each trying to sell a Lenovo mobile.
  • Each time a user of the social network logs into his account, we show him one of these eight ads.
  • Importantly, we observe the user’s response. If the user clicks on the ad, we get a bonus equal to 1; otherwise, we get 0.
  • We use Thompson sampling, a probabilistic algorithm, to optimize the click-through rate of the advertisement.
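The reward setup above can be sketched as a tiny simulator. The click-through rates below are made-up numbers for illustration (in the real problem they are unknown to the algorithm, which is exactly why we need Thompson sampling):

```python
import random

# Hypothetical click-through rates for each of the eight ad versions
# (these values are assumptions for the sketch, not real data)
true_ctr = [0.05, 0.13, 0.09, 0.16, 0.11, 0.04, 0.20, 0.08]

def show_ad(ad_index):
    """Simulate one user impression: return 1 if the user clicks, else 0."""
    return 1 if random.random() < true_ctr[ad_index] else 0

# One simulated impression of ad version 3
print(show_ad(3))  # 0 or 1
```

Each impression is a Bernoulli trial, which is why the beta distribution appears later as the natural posterior for each ad’s click probability.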

Prerequisites for implementing the code:

  1. You must have Spyder (Python 3.7) or any other recent Python IDE installed.
  2. You need a dataset file with a .csv extension (a comma-separated values file, which can be exported from MS Excel).
  3. Set the folder in which your dataset is stored as the working directory.
  4. You need to know the Python programming language.
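The dataset file itself is not bundled with this article, so if you want to run the code end to end, you can generate a synthetic Adds_analysis.csv with made-up click probabilities (the real dataset’s rates are unknown; the numbers below are assumptions):

```python
import random
import pandas as pd

random.seed(42)
# Hypothetical per-ad click probabilities, one per column
ctr = [0.05, 0.13, 0.09, 0.16, 0.11, 0.04, 0.20, 0.08]

# 9000 rows of 0/1 responses, one column per ad version
rows = [[1 if random.random() < p else 0 for p in ctr] for _ in range(9000)]
df = pd.DataFrame(rows, columns=[f'Ad {i + 1}' for i in range(8)])
df.to_csv('Adds_analysis.csv', index=False)
print(df.shape)  # (9000, 8)
```

This mirrors the layout described later: 8 columns of 1s and 0s, one row per user.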

Step by step implementation of the code:

1. Importing the libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

2. Importing the dataset

The dataset consists of 8 columns, each corresponding to an ad version. It has ‘N’ rows, each containing a ‘1’ or ‘0’ in every column, indicating whether that user clicked the ad.

dataset = pd.read_csv('Adds_analysis.csv')

3. Implementing the Thompson Sampling algorithm in Python

First of all, we need to import the ‘random’ library. We initialize ‘m’, the number of ad models, and ‘N’, the total number of users.

At each round ‘n’, we need to consider two numbers for each ad ‘i’: the number of times ad ‘i’ got a bonus of ‘1’ up to round ‘n’, and the number of times ad ‘i’ got a bonus of ‘0’ up to round ‘n’.

We declare the variables corresponding to these two counts as ‘bonus_equal_to_1’ and ‘bonus_equal_to_0’. Importantly, each is declared as a vector of ‘m’ elements, initialized to zeroes.
import random
N = 9000
m = 8
model_selected = []
bonus_equal_to_1 = [0] * m
bonus_equal_to_0 = [0] * m
total_bonus = 0

  • For each ad ‘i’, we take a random draw from the beta distribution shown below:
fi(n) ~ Beta(bonus_equal_to_1[i] + 1, bonus_equal_to_0[i] + 1)
  • This is based on Bayesian inference for Bernoulli trials: the two counts act as the parameters of the posterior beta distribution for each ad’s click probability.
  • We select the model that has the highest fi(n) value.
  • Furthermore, we use Python’s random.betavariate function, which returns a random draw from the beta distribution with the parameters we choose (here, bonus_equal_to_1[i] + 1 and bonus_equal_to_0[i] + 1).

We take a random draw from this distribution for each ad and check whether the draw is higher than ‘max_count’.

If the new random draw is higher than ‘max_count’, the condition is true, so ‘max_count’ takes the value of this new draw and ‘model’ records the index of that ad.
for n in range(0, N):
    model = 0
    max_count = 0
    for i in range(0, m):
        random_beta = random.betavariate(bonus_equal_to_1[i] + 1, bonus_equal_to_0[i] + 1)
        if random_beta > max_count:
            max_count = random_beta
            model = i
    model_selected.append(model)
    bonus = dataset.values[n, model]
Above all, we need to update these counters each time we observe a bonus, because they were initialized to zero. (Note that this block is still inside the outer ‘for’ loop.)
    if bonus == 1:
        bonus_equal_to_1[model] = bonus_equal_to_1[model] + 1
    else:
        bonus_equal_to_0[model] = bonus_equal_to_0[model] + 1
    total_bonus = total_bonus + bonus

4. Plotting a histogram

plt.hist(model_selected)
plt.title('Histogram for the most liked ad')
plt.xlabel('model number of ads')
plt.ylabel('Number of times each ad was selected')
plt.show()

Complete code:

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Adds_analysis.csv')

# Implementing Thompson Sampling algorithm
import random
N = 9000
m = 8
model_selected = []
bonus_equal_to_1 = [0] * m
bonus_equal_to_0 = [0] * m
total_bonus = 0
for n in range(0, N):
    model = 0
    max_count = 0
    for i in range(0, m):
        random_beta = random.betavariate(bonus_equal_to_1[i] + 1, bonus_equal_to_0[i] + 1)
        if random_beta > max_count:
            max_count = random_beta
            model = i
    model_selected.append(model)
    bonus = dataset.values[n, model]
    if bonus == 1:
        bonus_equal_to_1[model] = bonus_equal_to_1[model] + 1
    else:
        bonus_equal_to_0[model] = bonus_equal_to_0[model] + 1
    total_bonus = total_bonus + bonus

# Plotting a histogram
plt.hist(model_selected)
plt.title('Histogram for the most liked ad')
plt.xlabel('model number of ads')
plt.ylabel('Number of times each ad was selected')
plt.show()

Results:

As a result, the histogram shows which ad was selected most often, i.e. the most preferred ad. We can also inspect the variables in Spyder’s variable explorer.
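Besides the histogram, you can read the winner directly from the selection history. The list below is a hypothetical stand-in for the ‘model_selected’ list the main loop accumulates:

```python
from collections import Counter

# Hypothetical selection history of the kind model_selected accumulates
model_selected = [6, 2, 6, 1, 6, 6, 3, 6, 6, 0, 6, 6]

# Count how often each ad index was chosen and pick the most frequent one
counts = Counter(model_selected)
best_ad, times = counts.most_common(1)[0]
print(f'Ad {best_ad} was selected {times} times')  # Ad 6 was selected 8 times
```

After enough rounds, the most frequently selected index converges to the ad with the highest click-through rate.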

