Movie Recommendation System using Machine Learning in Python

In this tutorial, we will build a movie recommendation system using machine learning in Python. I will walk through the basic steps of this machine learning problem and how to approach each of them.

The approach used is collaborative filtering: movies are recommended based on how similarly users have rated them.
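To make the idea concrete, here is a minimal sketch on a made-up table of four users and three movies. The movie names, user labels and ratings are hypothetical and are not taken from the dataset; the only point is that movies rated alike by the same users end up with a high correlation.

```python
import pandas as pd

# Hypothetical ratings: rows are users, columns are movies (all values made up)
toy = pd.DataFrame(
    {
        "Movie A": [5, 4, 1, 2],
        "Movie B": [5, 5, 1, 1],
        "Movie C": [1, 2, 5, 4],
    },
    index=["u1", "u2", "u3", "u4"],
)

# Pearson correlation between every pair of movie columns
item_similarity = toy.corr(method="pearson")

# Movies most similar to "Movie A", excluding the movie itself
print(item_similarity["Movie A"].drop("Movie A").sort_values(ascending=False))
```

Here "Movie B" ranks first because the same users who liked "Movie A" also liked it, while "Movie C" was rated the opposite way and gets a negative score.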

MACHINE LEARNING

  1. Machine learning is a branch of artificial intelligence in which a system learns from existing real-world datasets and improves the accuracy of its output.
  2. The program looks for patterns in the data and uses them to make future decisions without human intervention.

Here is the link to the dataset used:

dataset for this project

Loading and cleaning the dataset

import pandas as pd

# Load the ratings and the movie metadata
ratings = pd.read_csv('ratings.csv')
movies = pd.read_csv('movies.csv')

# Join on the shared movieId column and drop the columns that are not required,
# leaving only movieId, title, userId and rating
ratings = pd.merge(movies, ratings).drop(['genres', 'timestamp'], axis=1)
print(ratings.shape)
ratings.head()

Output:

   movieId             title  userId  rating
0        1  Toy Story (1995)       1     4.0
1        1  Toy Story (1995)       5     4.0
2        1  Toy Story (1995)       7     4.5
3        1  Toy Story (1995)      15     2.5
4        1  Toy Story (1995)      17     4.5

# Build a user x movie matrix: one row per user, one column per title
userRatings = ratings.pivot_table(index=['userId'], columns=['title'], values='rating')
print("Before: ", userRatings.shape)

# Keep only movies with at least 10 ratings, then fill the remaining NaN values with 0
userRatings = userRatings.dropna(thresh=10, axis=1).fillna(0)
print("After: ", userRatings.shape)

Output:

Before:  (610, 9719)
After:  (610, 2269)
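The pivot-and-filter step can be seen on a small scale with the sketch below. The toy frame and its values are hypothetical, and the threshold is lowered to 2 (the tutorial uses 10) so that the effect is visible on six rows.

```python
import pandas as pd

# Hypothetical ratings in the same long format as the merged `ratings` frame
toy = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3, 3],
    "title":  ["A", "B", "A", "B", "A", "C"],
    "rating": [4.0, 3.0, 5.0, 2.0, 3.5, 4.5],
})

# One row per user, one column per title; unrated cells become NaN
wide = toy.pivot_table(index="userId", columns="title", values="rating")

# thresh=2 keeps only titles with at least 2 non-NaN ratings,
# then the remaining gaps are filled with 0
filtered = wide.dropna(thresh=2, axis=1).fillna(0)

print(wide.shape, filtered.shape)
print(filtered)
```

Title "C" has a single rating, so it is dropped; user 3 never rated "B", so that cell is filled with 0.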

Implementing the correlation

# Item similarity is measured with the Pearson correlation between movie columns,
# which subtracts each movie's mean rating before comparing
corrMatrix = userRatings.corr(method='pearson')
corrMatrix.head(100)

Output:

title | 'burbs, The (1989) | (500) Days of Summer (2009) | 10 Cloverfield Lane (2016) | 10 Things I Hate About You (1999) | 10,000 BC (2008) | 101 Dalmatians (1996) | 101 Dalmatians (One Hundred and One Dalmatians) (1961) | 12 Angry Men (1957) | 12 Years a Slave (2013) | 127 Hours (2010) | ... | Zack and Miri Make a Porno (2008) | Zero Dark Thirty (2012) | Zero Effect (1998) | Zodiac (2007) | Zombieland (2009) | Zoolander (2001) | Zootopia (2016) | eXistenZ (1999) | xXx (2002) | ¡Three Amigos! (1986)
'burbs, The (1989) | 1.000000 | 0.063117 | -0.023768 | 0.143482 | 0.011998 | 0.087931 | 0.224052 | 0.034223 | 0.009277 | 0.008331 | ... | 0.017477 | 0.032470 | 0.134701 | 0.153158 | 0.101301 | 0.049897 | 0.003233 | 0.187953 | 0.062174 | 0.353194
(500) Days of Summer (2009) | 0.063117 | 1.000000 | 0.142471 | 0.273989 | 0.193960 | 0.148903 | 0.142141 | 0.159756 | 0.135486 | 0.200135 | ... | 0.374515 | 0.178655 | 0.068407 | 0.414585 | 0.355723 | 0.252226 | 0.216007 | 0.053614 | 0.241092 | 0.125905
10 Cloverfield Lane (2016) | -0.023768 | 0.142471 | 1.000000 | -0.005799 | 0.112396 | 0.006139 | -0.016835 | 0.031704 | -0.024275 | 0.272943 | ... | 0.242663 | 0.099059 | -0.023477 | 0.272347 | 0.241751 | 0.195054 | 0.319371 | 0.177846 | 0.096638 | 0.002733
10 Things I Hate About You (1999) | 0.143482 | 0.273989 | -0.005799 | 1.000000 | 0.244670 | 0.223481 | 0.211473 | 0.011784 | 0.091964 | 0.043383 | ... | 0.243118 | 0.104858 | 0.132460 | 0.091853 | 0.158637 | 0.281934 | 0.050031 | 0.121029 | 0.130813 | 0.110612
10,000 BC (2008) | 0.011998 | 0.193960 | 0.112396 | 0.244670 | 1.000000 | 0.234459 | 0.119132 | 0.059187 | -0.025882 | 0.089328 | ... | 0.260261 | 0.087592 | 0.094913 | 0.184521 | 0.242299 | 0.240231 | 0.094773 | 0.088045 | 0.203002 | 0.083518
...
Almost Famous (2000) | 0.099554 | 0.209549 | 0.032088 | 0.296727 | 0.134434 | 0.118628 | 0.242958 | 0.079158 | 0.005092 | 0.051279 | ... | 0.244619 | 0.085395 | 0.072505 | 0.221259 | 0.126008 | 0.362571 | 0.011577 | 0.208008 | 0.186599 | 0.147413
Along Came Polly (2004) | 0.027287 | 0.282426 | 0.113213 | 0.193085 | 0.162678 | 0.180259 | 0.112928 | 0.121704 | 0.125792 | 0.124032 | ... | 0.173133 | 0.160430 | 0.029076 | 0.189165 | 0.166278 | 0.309183 | 0.078468 | -0.036498 | 0.231566 | 0.025928
Along Came a Spider (2001) | 0.064762 | -0.003205 | 0.016372 | 0.085365 | -0.018241 | 0.080388 | 0.094016 | -0.016678 | 0.079375 | -0.028052 | ... | 0.115347 | 0.093774 | 0.085286 | 0.150757 | 0.052144 | 0.174489 | 0.014189 | 0.025327 | 0.233244 | 0.043581
Amadeus (1984) | 0.136013 | 0.084829 | -0.055707 | 0.105783 | -0.008620 | 0.055704 | 0.121697 | 0.244291 | 0.084634 | 0.047370 | ... | -0.010703 | 0.015008 | 0.173486 | 0.103232 | 0.062977 | 0.097432 | -0.007432 | 0.132956 | 0.075753 | 0.136565
Amazing Spider-Man, The (2012) | 0.083419 | 0.224961 | 0.149903 | 0.103802 | 0.278253 | 0.096137 | 0.152795 | 0.070514 | 0.121492 | 0.168369 | ... | 0.350739 | 0.234351 | 0.089202 | 0.262828 | 0.409487 | 0.151747 | 0.373173 | 0.023512 | 0.192038 | 0.143658

100 rows × 2269 columns
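One thing to keep in mind: because missing ratings were filled with 0, the correlation above treats "not rated" as a rating of 0. A possible variation, shown here only as a sketch on hypothetical toy data, is to leave the gaps as NaN and pass min_periods to DataFrame.corr, so that each pair of movies is compared only on users who rated both. The threshold of 3 is an arbitrary choice for this example, not a value from the tutorial.

```python
import numpy as np
import pandas as pd

# Hypothetical user x movie matrix with NaN for unrated movies
toy_ratings = pd.DataFrame(
    {
        "Movie A": [5.0, 4.0, 1.0, 2.0, np.nan],
        "Movie B": [4.0, 5.0, 2.0, 1.0, np.nan],
        "Movie C": [np.nan, np.nan, np.nan, 3.0, 4.0],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="userId"),
)

# The tutorial's approach: unrated cells count as a rating of 0
zero_filled_corr = toy_ratings.fillna(0).corr(method="pearson")

# Variation: use only users who rated both movies, and require
# at least 3 such users, otherwise the result is NaN
overlap_corr = toy_ratings.corr(method="pearson", min_periods=3)

print(zero_filled_corr.round(3))
print(overlap_corr.round(3))
```

With the zero-filled matrix, "Movie A" and "Movie C" get a negative correlation even though only one user rated both; with min_periods=3 that pair is reported as NaN instead.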

The similarity function to find similar movies

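def get_similar(movie_name, rating):
    # Weight the movie's correlations by how far the rating is from the midpoint 2.5:
    # a high rating pushes similar movies up, a low rating pushes them down
    similar_ratings = corrMatrix[movie_name] * (rating - 2.5)
    return similar_ratings.sort_values(ascending=False)

# Ratings given by an example user who likes romantic movies
romantic_lover = [("(500) Days of Summer (2009)", 5), ("Alice in Wonderland (2010)", 3), ("Aliens (1986)", 1), ("2001: A Space Odyssey (1968)", 2)]

# DataFrame.append was removed in pandas 2.0, so build the frame from a list:
# one row of weighted similarities per rated movie
similar_movies = pd.DataFrame([get_similar(movie, rating) for movie, rating in romantic_lover])

# Sum the rows to get one score per movie and show the 20 best
similar_movies.sum().sort_values(ascending=False).head(20)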

The final prediction:

Output:

(500) Days of Summer (2009)                      2.584556
Alice in Wonderland (2010)                       1.395229
Silver Linings Playbook (2012)                   1.254800
Yes Man (2008)                                   1.116264
Adventureland (2009)                             1.112235
Marley & Me (2008)                               1.108381
About Time (2013)                                1.102192
Crazy, Stupid, Love. (2011)                      1.088757
50/50 (2011)                                     1.086517
Help, The (2011)                                 1.075963
Up in the Air (2009)                             1.053037
Holiday, The (2006)                              1.034470
Friends with Benefits (2011)                     1.030875
Notebook, The (2004)                             1.025880
Easy A (2010)                                    1.015771
Secret Life of Walter Mitty, The (2013)          0.997979
Perks of Being a Wallflower, The (2012)          0.967425
Toy Story 3 (2010)                               0.963276
Ugly Truth, The (2009)                           0.959079
Harry Potter and the Half-Blood Prince (2009)    0.954180
dtype: float64
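Note that the top of the list contains movies the user has already rated. A natural follow-up, sketched below under stated assumptions, is to drop those titles before taking the top results. The recommend function, the 4x4 similarity matrix and its values are hypothetical stand-ins for corrMatrix and are not part of the original code.

```python
import pandas as pd

# Hypothetical item-similarity matrix standing in for corrMatrix
titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
toy_corr = pd.DataFrame(
    [[1.0, 0.6, -0.2, 0.1],
     [0.6, 1.0, 0.0, 0.3],
     [-0.2, 0.0, 1.0, 0.5],
     [0.1, 0.3, 0.5, 1.0]],
    index=titles, columns=titles,
)

def recommend(user_ratings, corr, top_n=2):
    """Score titles the same way as get_similar, then drop already-rated titles."""
    # Sum of correlation columns, each weighted by (rating - 2.5)
    scores = sum(corr[title] * (rating - 2.5) for title, rating in user_ratings)
    seen = [title for title, _ in user_ratings]
    return scores.drop(labels=seen, errors="ignore").sort_values(ascending=False).head(top_n)

picks = recommend([("Movie A", 5), ("Movie C", 1)], toy_corr)
print(picks)
```

In this toy case "Movie A" would otherwise rank first with a score of 2.8; after filtering, only the unseen "Movie B" and "Movie D" remain.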

 
