Predict Next Purchase using Machine Learning in Python

In this tutorial, we will learn how to predict the next purchase using machine learning in Python. By next purchase we mean the number of units of an item that a shop will need to sell in the coming month. Every item has a unique ID number.

There are many datasets available online that we could use in our study. For this article, we will use the Kaggle dataset at https://www.kaggle.com/c/competitive-data-science-predict-future-sales/overview. It contains item details with their prices, the dates on which they were sold, and the ID of the shop that sold them. Here we will predict the total count of a particular item required next month.

There are many learning algorithms for predicting the next purchase. For this tutorial, we will use an LSTM, a type of recurrent neural network (RNN), and measure its prediction error with mean squared error.

Libraries used in this program:

  • Pandas
  • Numpy
  • Matplotlib
  • Seaborn
  • Keras

Python program to Predict Next Purchase using Machine Learning

We will use a Jupyter notebook to build our model.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

We start by importing the libraries.

default_path = '../input/'
!ls ../input
train_df = pd.read_csv(default_path+'sales_train.csv')
items_df = pd.read_csv(default_path+'items.csv')
test_df = pd.read_csv(default_path+'test.csv')

Then we load the necessary CSV files using the pandas library.

train_df['date'] = pd.to_datetime(train_df['date'], format='%d.%m.%Y')

This converts the date column from strings in day.month.year format to datetime values.
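
As a small illustration of this conversion, the sketch below parses a few made-up date strings. The sample values and the `sample` DataFrame are invented for the example and are not taken from the Kaggle files.

```python
import pandas as pd

# Hypothetical sample values in the same day.month.year layout as the 'date' column.
sample = pd.DataFrame({'date': ['02.01.2013', '15.03.2014', '31.10.2015']})

# An explicit format avoids ambiguity between day-first and month-first dates.
sample['date'] = pd.to_datetime(sample['date'], format='%d.%m.%Y')

# Once parsed, date parts are available through the .dt accessor.
sample['month'] = sample['date'].dt.month
print(sample)
```

Without the format argument, a string such as 02.01.2013 could be read as either 2 January or 1 February, so stating the format is the safer choice.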

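# Build one row per (item, shop) pair and one column per month (date_block_num 0..33),
# summing the daily counts so that each cell holds a monthly total.
dataset = train_df.pivot_table(index=['item_id', 'shop_id'], values='item_cnt_day',
                               columns='date_block_num', aggfunc='sum', fill_value=0)
dataset = dataset.reset_index()
# Keep only the pairs listed in the test set; pairs with no sales history get zeros.
dataset = pd.merge(test_df, dataset, on=['item_id', 'shop_id'], how='left')
dataset = dataset.fillna(0)
dataset = dataset.drop(['shop_id', 'item_id', 'ID'], axis=1)

# Training: months 0..32 are the input sequence, month 33 is the target.
X_train = np.expand_dims(dataset.values[:, :-1], axis=2)
y_train = dataset.values[:, -1:]

# Prediction input: months 1..33, used to forecast the following month.
# No true target exists for that month, so no y_test is defined.
X_test = np.expand_dims(dataset.values[:, 1:], axis=2)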

This arranges the data into a table with one row per item and shop pair and one column per month, which we will use as input to our model.
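
To make the reshaping easier to follow, here is a minimal sketch on a tiny invented table (`toy_sales` and its values are hypothetical). It assumes daily counts should be summed per month, which is why `aggfunc='sum'` is passed; treat that as an assumption of this sketch.

```python
import numpy as np
import pandas as pd

# Invented daily sales: two (item, shop) pairs over three monthly blocks.
toy_sales = pd.DataFrame({
    'item_id':        [1, 1, 1, 1, 2, 2],
    'shop_id':        [5, 5, 5, 5, 5, 5],
    'date_block_num': [0, 0, 1, 2, 0, 2],
    'item_cnt_day':   [1, 2, 4, 3, 1, 5],
})

# One row per (item, shop), one column per month, missing months filled with 0.
toy = toy_sales.pivot_table(index=['item_id', 'shop_id'], values='item_cnt_day',
                            columns='date_block_num', aggfunc='sum', fill_value=0)
values = toy.values  # shape (2, 3): 2 pairs, 3 months

# All months except the last form the input sequence; the last month is the target.
X = np.expand_dims(values[:, :-1], axis=2)  # shape (2, 2, 1): samples, timesteps, features
y = values[:, -1:]                          # shape (2, 1)
print(X.shape, y.shape)
```

The extra axis added by np.expand_dims is there because an LSTM layer expects three-dimensional input: samples, timesteps, and features per timestep.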

from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

Then we import the Sequential model and the LSTM, Dense and Dropout layers from the open-source Keras library.

model = Sequential()
model.add(LSTM(units=64, input_shape=(33, 1)))
model.add(Dropout(0.3))
model.add(Dense(1))

model.compile(loss='mse',
              optimizer='adam',
              metrics=['mean_squared_error'])

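Now we have a model with one LSTM layer, followed by a dropout layer and a single-output dense layer, compiled with the ‘adam’ optimizer and mean squared error as the loss.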

history = model.fit(X_train, y_train, batch_size=4096, epochs=10)

Then we fit the model containing the LSTM layer to the training data.

plt.plot(history.history['loss'], label= 'loss(mse)')
plt.plot(np.sqrt(history.history['mean_squared_error']), label= 'rmse')
plt.legend(loc=1)

Now we plot the training loss (MSE) and its square root (RMSE) for each epoch using plt.plot().
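
The relationship between the two plotted curves can be sketched with a few hypothetical numbers. The values in `mse_per_epoch` are made up and only stand in for what history.history['loss'] would hold.

```python
import numpy as np

# Hypothetical per-epoch MSE values, standing in for history.history['loss'].
mse_per_epoch = [4.0, 2.25, 1.0]

# RMSE is the square root of MSE, so it is in the same units as the item counts.
rmse_per_epoch = np.sqrt(mse_per_epoch)
print(rmse_per_epoch)
```

Because RMSE is in the same units as the target, it is usually the easier of the two curves to interpret.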

LSTM_prediction = model.predict(X_test)
LSTM_prediction = LSTM_prediction.clip(0, 20)
submission = pd.DataFrame({'ID': test_df['ID'], 'item_cnt_month': LSTM_prediction.ravel()})
submission.to_csv('submission.csv',index=False)

In the end, we pass the test data to the model to get the LSTM predictions and clip them to the range 0 to 20, since the competition clips its true target values to that range. We then build a DataFrame with pandas and write it to a CSV file using .to_csv. As before, the next purchase here means the number of items required in the coming month. You can pass any data of the same shape to the model and get a CSV output file, as we have done with the test set.
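
The clipping and flattening steps can be checked on a small invented array. The values in `raw` and the name `toy_submission` are hypothetical and only mimic the shape that model.predict returns.

```python
import numpy as np
import pandas as pd

# Hypothetical raw model outputs with shape (n_rows, 1), as model.predict would return.
raw = np.array([[-0.4], [3.2], [27.0]])

# Clip into the [0, 20] range: negative counts become 0, large ones are capped at 20.
clipped = raw.clip(0, 20)

# ravel() flattens (n_rows, 1) to (n_rows,), giving one value per ID.
toy_submission = pd.DataFrame({'ID': [0, 1, 2], 'item_cnt_month': clipped.ravel()})
print(toy_submission)
```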

OUTPUT FILE: submission.csv

The file submission.csv holds one predicted item_cnt_month value for each ID in the test set. That’s it!
