Identifying Product Bundles from Sales Data Using Python Machine Learning

In this article, we are going to identify product bundles from sales data using a machine learning technique in Python. A product bundle is a combination of items sold together to increase the sales of a shop or company. To identify product bundles, we use market basket analysis, one of the key techniques shops use to increase sales.

Market basket analysis is about “People who bought one thing also bought something else.”

Market basket analysis to identify product bundles

To derive these associations, we use the Apriori algorithm, a popular algorithm for generating frequent itemsets and the association rules built from them. It relies on three measures: support, confidence, and lift.

Support(I) = # Transactions containing I / # Transactions
Confidence(I1 -> I2) = # Transactions containing both I1 and I2 / # Transactions containing I1
Lift(I1 -> I2) = Confidence(I1 -> I2) / Support(I2)
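To make the formulas concrete, here is a small sketch that computes the three measures directly on a handful of made-up transactions. The items and data are hypothetical, transactions are represented as Python sets, and the function names support, confidence, and lift are our own, not part of apyori.

```python
# Hypothetical toy transactions, for illustration only
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(base, add, transactions):
    """Share of transactions containing `base` that also contain `add`.

    Equivalent to support(base and add together) / support(base).
    """
    both = set(base) | set(add)
    return support(both, transactions) / support(base, transactions)


def lift(base, add, transactions):
    """Confidence of base -> add divided by the support of `add`."""
    return confidence(base, add, transactions) / support(add, transactions)


print(support({"bread", "butter"}, transactions))       # 0.5
print(confidence({"bread"}, {"butter"}, transactions))  # 1.0
print(lift({"bread"}, {"butter"}, transactions))        # 2.0
```

A lift of 2.0 means that customers who buy bread are twice as likely to buy butter as an average customer; a lift of 1 would indicate no association.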

Here is the step-by-step procedure of the Apriori algorithm:

1: Set the minimum support and confidence

2: Find all itemsets (subsets of items in the transactions) whose support is at least the minimum support

3: From these itemsets, generate all rules whose confidence is at least the minimum confidence

4: Sort the rules by decreasing lift
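The four steps can be sketched in plain Python for the special case used later in this article: rules between exactly two items. This is a simplified illustration, not apyori's implementation. The function name pair_rules and its return format are hypothetical, and the sketch only demonstrates the key Apriori idea that a pair can be frequent only if both of its items are frequent.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support, min_confidence, min_lift=0.0):
    """Apriori-style sketch limited to rules of the form {a} -> {b}.

    Returns (base, add, support, confidence, lift) tuples sorted by
    decreasing lift.
    """
    n = len(transactions)
    baskets = [set(t) for t in transactions]

    # Step 2a: support of single items; keep only the frequent ones
    item_counts = Counter(item for basket in baskets for item in basket)
    item_support = {item: count / n for item, count in item_counts.items()}
    frequent_items = {i for i, s in item_support.items() if s >= min_support}

    # Step 2b: Apriori pruning, so candidate pairs use frequent items only
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(basket & frequent_items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        pair_support = count / n
        if pair_support < min_support:
            continue
        # Step 3: evaluate both directions, a -> b and b -> a
        for base, add in ((a, b), (b, a)):
            conf = pair_support / item_support[base]
            rule_lift = conf / item_support[add]
            if conf >= min_confidence and rule_lift >= min_lift:
                rules.append((base, add, pair_support, conf, rule_lift))

    # Step 4: sort the rules by decreasing lift
    rules.sort(key=lambda rule: rule[4], reverse=True)
    return rules


baskets = [["bread", "butter"], ["bread", "butter", "milk"], ["milk", "eggs"], ["eggs"]]
for rule in pair_rules(baskets, min_support=0.25, min_confidence=0.5, min_lift=1.5):
    print(rule)
```

Called on the transactions list built in the next section with the article's thresholds (min_support=0.003, min_confidence=0.2, min_lift=3), it is expected to find the same kind of two-item rules, although apyori groups the directions of a rule differently in its output.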

Implementation of Apriori in Python

To use this algorithm in Python, we need to install the apyori package by running the following command in a terminal or command prompt:

pip install apyori

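Next, we import the libraries, load the dataset, and convert it into a list of transactions. Each row of Market_Basket_Optimisation.csv is one transaction, and shorter rows leave their trailing cells empty, which pandas reads as NaN.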

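import pandas as pd

# Each row is one transaction; empty cells are read as NaN
dataset = pd.read_csv('Market_Basket_Optimisation.csv', header = None)

# Turn each row into a list of product names, skipping the NaN padding
# (str(NaN) would otherwise add a spurious 'nan' item to short baskets)
transactions = [[str(item) for item in row if pd.notna(item)]
                for row in dataset.values]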

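We use the apriori() function of the apyori package with the following parameters:

  • transactions = our dataset in the form of a list of transactions, each a list of items
  • min_support = minimum support an itemset must have. Here we use 0.003, so a pair must appear in at least 23 of the 7,501 transactions
  • min_confidence = minimum confidence a rule must have. Here we use 0.2
  • min_lift = minimum lift a rule must have. Here we use 3
  • min_length = minimum number of items in a rule. Here we use 2. Note that apyori's documented keyword arguments are only min_support, min_confidence, min_lift and max_length, so this value appears to be ignored; single items are excluded anyway because their lift is always 1, below min_lift
  • max_length = maximum number of items in a rule. Here we use 2, so we only get rules between pairs of products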
from apyori import apriori
rules = apriori(transactions = transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2, max_length = 2)

Since apriori() returns a generator, we convert the results to a list and print them. Each record describes two products that are frequently purchased together.

results = list(rules)
print(results)

Output:

[RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
RelationRecord(items=frozenset({'mushroom cream sauce', 'escalope'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
RelationRecord(items=frozenset({'pasta', 'escalope'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
RelationRecord(items=frozenset({'fromage blanc', 'honey'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0.2450980392156863, lift=5.164270764485569)]),
RelationRecord(items=frozenset({'ground beef', 'herb & pepper'}), support=0.015997866951073192, ordered_statistics=[OrderedStatistic(items_base=frozenset({'herb & pepper'}), items_add=frozenset({'ground beef'}), confidence=0.3234501347708895, lift=3.2919938411349285)]),
RelationRecord(items=frozenset({'ground beef', 'tomato sauce'}), support=0.005332622317024397, ordered_statistics=[OrderedStatistic(items_base=frozenset({'tomato sauce'}), items_add=frozenset({'ground beef'}), confidence=0.3773584905660377, lift=3.840659481324083)]),
RelationRecord(items=frozenset({'olive oil', 'light cream'}), support=0.003199573390214638, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'olive oil'}), confidence=0.20512820512820515, lift=3.1147098515519573)]),
RelationRecord(items=frozenset({'olive oil', 'whole wheat pasta'}), support=0.007998933475536596, ordered_statistics=[OrderedStatistic(items_base=frozenset({'whole wheat pasta'}), items_add=frozenset({'olive oil'}), confidence=0.2714932126696833, lift=4.122410097642296)]),
RelationRecord(items=frozenset({'shrimp', 'pasta'}), support=0.005065991201173177, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'shrimp'}), confidence=0.3220338983050847, lift=4.506672147735896)])]
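The raw RelationRecord output is hard to read. A small helper can flatten it into one row per rule and sort the rows by decreasing lift, which is step 4 of the procedure. It relies only on the fields visible in the output above (support, ordered_statistics, items_base, items_add, confidence, lift); the helper names summarize and print_rules are our own, not part of apyori.

```python
def summarize(results):
    """Flatten apyori RelationRecord objects into sortable rows.

    Each row is (left side, right side, support, confidence, lift).
    """
    rows = []
    for record in results:
        for stat in record.ordered_statistics:
            rows.append((
                ", ".join(sorted(stat.items_base)),
                ", ".join(sorted(stat.items_add)),
                record.support,
                stat.confidence,
                stat.lift,
            ))
    # Step 4 of the procedure: sort the rules by decreasing lift
    rows.sort(key=lambda row: row[4], reverse=True)
    return rows


def print_rules(rows):
    """Print rows produced by summarize() as a fixed-width table."""
    print(f"{'Left side':<22}{'Right side':<14}{'Support':>9}{'Conf.':>8}{'Lift':>7}")
    for base, add, sup, conf, lift in rows:
        print(f"{base:<22}{add:<14}{sup:>9.4f}{conf:>8.3f}{lift:>7.2f}")
```

Running print_rules(summarize(results)) after the code above should put fromage blanc -> honey first, since it has the highest lift (about 5.16) in the printed output, followed by light cream -> chicken (about 4.84).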
