Making your own Recommender System
A detailed guide to what recommender systems are, followed by a hands-on tutorial in which you build your own movie recommender system using Python.
The field of AI and machine learning is vast, and it has empowered us at many different levels. From sorting our messages and emails into spam and not spam to building self-driving cars, the world has advanced a lot.
Almost every big tech company uses AI and machine learning in its products and services, and some features of those services would not be possible without advances in the field. Successful companies like Amazon, Netflix, YouTube, and Google all have something in common. No, it’s not about having millions of dollars. Nope, it’s not having tough interview questions and processes either.
It’s their recommender system! All these companies have exceptionally good recommender systems which play a very big role in the revenue that these big companies generate. In this article, we will dive deep into understanding what these systems are and how they work.
What’s a Recommender System?
A recommender system, as the name suggests, is built to give you suggestions or recommendations. Let’s take the example of Amazon. Say a person wishes to buy a new keyboard. They go to the Amazon website, search for keyboards, and many results pop up. They click on one to see its features, and when they scroll to the bottom, a section appears saying that they might be interested in certain other products, or that certain products are often purchased together as a bundle, and so on.
This is a recommender system at work. It takes what you have searched for or viewed, compares it with the other products on the website, and gives you suggestions.
How do recommender systems help these companies?
The answer to this question is very straightforward. It’s all related to playing with the human brain. When we are purchasing something and we are recommended a product, we are tricked into buying it. For example, a person wants to shop for groceries online. They opt to buy a loaf of bread. While they are adding the item to their cart, the website suggests they buy butter or cheese as well.
Now, if you are very busy or just lazy about keeping track of your groceries, you would not bother to check whether that item is already in your house. You would go ahead and purchase it, because why not? It’s not that expensive. This is how these companies make extra money using recommender systems.
The recommender systems help in increasing the purchase of such small products. They also help in the case of big products that might be a little expensive but in most cases, it is the small products that make the difference.
Companies like Netflix want to keep users engaged on their platform for longer. After every movie or TV series that you watch, you are recommended new movies and shows. Because of this, people keep watching and keep renewing their subscriptions monthly or annually, which helps companies like Netflix grow and expand. This shows us the true power of recommender systems.
How do these Recommender systems work?
Recommender systems might seem like a very small thing on a website, but in reality they are complex. To make a good one, you need a good understanding of your model as well as a solid math background, because these systems are math-heavy. As far as the theory goes, there are two basic methods of filtering: collaborative filtering and content-based filtering.
To understand collaborative filtering, imagine that you have run out of groceries for your next breakfast. You decide to visit the nearby grocery store that you have been visiting for years. You go in to purchase bread, butter, and cheese. While billing, the owner of the store notices that you only picked bread and butter, so he recommends that you also get cheese, because you, like many of his other customers, usually buy these three items together. Here, the owner of the store is our recommender system.
Recommender systems based on collaborative filtering work in the same manner. They recommend items to a user based on that user’s past purchases or ratings and on the behavior of other users with similar tastes. There are two approaches to collaborative filtering: a memory-based approach and a model-based approach.
Let’s start with the memory-based approach. The example above was a memory-based approach. It is further divided into user-user and item-item filtering. In user-user filtering, we find users whose past ratings or purchases resemble yours and recommend items that those users liked.
In item-item filtering, we compare items by how users have rated or purchased them, and recommend items that are similar to the ones the user already liked. The memory-based approach is a good and easy way to implement a recommender system, but when the data becomes too big to handle, scalability becomes a concern.
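To make the item-item idea concrete, here is a minimal sketch in plain Python. The users, items, and ratings are made up for illustration and echo the grocery example, and cosine similarity is just one common choice of similarity measure, not necessarily what a production system would use.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating}. Not real data.
ratings = {
    "ann":  {"bread": 5, "butter": 4, "cheese": 5},
    "bob":  {"bread": 4, "butter": 5, "cheese": 4},
    "cara": {"bread": 5, "butter": 5},
    "dev":  {"bread": 1, "jam": 5},
}

def item_vector(item):
    """Ratings given to `item`, keyed by user."""
    return {user: r[item] for user, r in ratings.items() if item in r}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(user, top_n=1):
    """Score each unseen item by its similarity to items the user rated."""
    seen = ratings[user]
    all_items = {item for r in ratings.values() for item in r}
    scores = {}
    for candidate in all_items - set(seen):
        cand_vec = item_vector(candidate)
        scores[candidate] = sum(
            cosine(cand_vec, item_vector(item)) * rating
            for item, rating in seen.items()
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("cara"))  # cheese ranks above jam for this toy data
```

Notice that every recommendation compares the candidate against all rated items, which is why this style of filtering gets expensive as the data grows.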
Next up is the model-based approach. For this approach, we depend on machine learning algorithms such as KNN and SVD.
There are various algorithms that you can use to make your recommender system. Nearest-neighbor and clustering methods group users by common features, such as the type of product purchased or some other qualities, while matrix factorization methods such as SVD learn a small number of hidden factors that explain the observed ratings.
The model-based approach is more complex, but it generally scales better and leaves a lot of room to improve the system by enhancing the model.
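As a rough illustration of the model-based idea, the sketch below applies a truncated SVD to a tiny, made-up rating matrix using NumPy. Treating unrated entries as zeros is a simplification chosen to keep the example short; real systems usually handle missing ratings more carefully.

```python
import numpy as np

# Hypothetical user x item rating matrix (rows: users, columns: items).
# Zeros stand for "not rated yet"; the values are made up for illustration.
R = np.array([
    [5, 4, 0, 0],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def low_rank_scores(matrix, k):
    """Approximate `matrix` using only its k largest singular values."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Keep two hidden factors and use the reconstruction as predicted scores.
scores = low_rank_scores(R, k=2)

# For user 0, pick the unrated item with the highest predicted score.
user = 0
unrated = np.where(R[user] == 0)[0]
best = unrated[np.argmax(scores[user, unrated])]
print("Predicted scores for user 0:", np.round(scores[user], 2))
print("Suggested item index:", best)
```

The number of factors `k` is a tuning knob: fewer factors give a coarser but more general picture of user tastes.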
Overall, collaborative filtering is great, but it has its drawbacks. One of the main ones shows up at the very beginning, often called the cold-start problem: we want to recommend something, but because the filtering is based on past behavior and we have no data about our consumers yet, it is not effective. If we have some data to start from, we are good to go, but early on that data is often not available.
The model suffers as a result: it might recommend some random item that is not at all related to the purchase the consumer is making, and this is not what a company wants, as it may lead to a loss. There is also an alternative known as content-based filtering, which many companies use and which can be more effective in this situation, provided a good data set of item features is available.
Let’s take the example of Netflix. How does it recommend you so many movies and TV shows? Much of what it does can be described in terms of content-based filtering. Netflix keeps track of your watch history, watch time, and other things like your renewal period. This data is fed into its systems, where an algorithm looks for common features between the movies or shows that you have watched and the data collected over the years. It tries to map the genre, the actors in the movie, whether or not the movie has won an award, how often you watch a movie that has just been released, the ratings of the movies that you watch, and many other features.
All of these features help them come up with the best recommendations for you. Companies are very particular about their recommender systems, as these help retain their huge consumer base and also help bring in new consumers every day. Content-based filtering does not share the drawback that collaborative filtering has of being unable to recommend anything at the very start of a business, because its recommendations are based on the features of the items themselves and do not depend on other consumers.
This type of system will take far less time to adapt to the features that are fed to it and would make good recommendations in a very short period as compared to the one that takes the collaborative filtering approach.
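A minimal sketch of content-based filtering, assuming each movie is described by a handful of genre flags. The titles and features below are invented for illustration. The point is that no user ratings are needed: similarity comes from the item features alone.

```python
from math import sqrt

# Hypothetical movie features: 1 if the movie has the genre, else 0.
# Feature order: [action, sci-fi, comedy, romance]. Titles are made up.
movies = {
    "Space Raiders":  [1, 1, 0, 0],
    "Laser Frontier": [1, 1, 1, 0],
    "Wedding Mix-up": [0, 0, 1, 1],
    "Quiet Hearts":   [0, 0, 0, 1],
}

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similar_to(title, top_n=1):
    """Rank the other movies by feature similarity to `title`."""
    target = movies[title]
    others = [name for name in movies if name != title]
    others.sort(key=lambda name: cosine(target, movies[name]), reverse=True)
    return others[:top_n]

print(similar_to("Space Raiders"))  # the other action/sci-fi title comes first
```

A real system would use far richer features (cast, awards, release date, and so on), but the mechanics are the same.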
Making our own Mini Recommender System
Let’s try to apply the concepts that we just read about and see if we can make our own mini recommender system. We will be working with the MovieLens data set. It’s a very big data set, so we will use a small part of it. Let’s begin by importing our data set and taking a look at it.
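A minimal sketch of the import step with pandas. It assumes the ratings come as a tab-separated file without a header row, as in the MovieLens 100K `u.data` file; the column names and the `load_ratings` helper are our own labels, not something fixed by the data set. A few illustrative rows stand in for the real file so the snippet runs without a download.

```python
import io
import pandas as pd

# Assumed column layout of the ratings file; the names are our own labels.
COLUMN_NAMES = ["user_id", "item_id", "rating", "timestamp"]

def load_ratings(source):
    """Read a tab-separated ratings file (a path or a file-like object)."""
    return pd.read_csv(source, sep="\t", names=COLUMN_NAMES)

# Stand-in for the real file: illustrative rows in the same layout.
# With the data on disk you would call load_ratings("u.data") instead.
sample = io.StringIO(
    "196\t242\t3\t881250949\n"
    "186\t302\t3\t891717742\n"
    "22\t377\t1\t878887116\n"
)
df = load_ratings(sample)
print(df.head())
```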
We have successfully imported our user data. We have the ID of the user who gave the rating, the movie ID, the rating itself, and the timestamp of when the rating was given. Let’s get the titles for the respective movie IDs.
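One way to attach the titles is a pandas merge on the movie ID. The two small frames below are stand-ins with illustrative values, and the column names `item_id` and `title` are assumptions about how the files were labeled.

```python
import pandas as pd

# Stand-in ratings data (illustrative values only).
ratings = pd.DataFrame({
    "user_id": [1, 2, 3],
    "item_id": [50, 50, 294],
    "rating": [5, 4, 3],
    "timestamp": [881250949, 891717742, 878887116],
})

# Stand-in lookup table mapping movie IDs to titles.
titles = pd.DataFrame({
    "item_id": [50, 294],
    "title": ["Star Wars (1977)", "Liar Liar (1997)"],
})

# Each rating row picks up the title that matches its item_id.
df = pd.merge(ratings, titles, on="item_id")
print(df.head())
```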
This is what our data looks like after merging the user and the movie details.
We will now take a look at the movie titles with the largest number of ratings and the movies with the highest average ratings in our data.
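Both rankings can be sketched with a pandas `groupby` on the title. The tiny frame below is made up, with one deliberately obscure title that has a single 5-star rating, and the column names are assumptions.

```python
import pandas as pd

# Illustrative merged data: one row per (user, movie) rating.
df = pd.DataFrame({
    "title": ["Star Wars (1977)"] * 3
             + ["Liar Liar (1997)"] * 2
             + ["Obscure Film (1990)"],
    "rating": [5, 4, 4, 3, 4, 5],
})

by_title = df.groupby("title")["rating"]

# Highest average rating first.
highest_rated = by_title.mean().sort_values(ascending=False)
# Largest number of ratings first.
most_rated = by_title.count().sort_values(ascending=False)

print(highest_rated.head())
print(most_rated.head())
```

With this toy data, the title with a single rating tops the average ranking while the widely rated title tops the count ranking.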
These are the highest-rated movies in our data.
And these are the movies with the largest number of ratings in our data. Something seems off here. We observe that some of the top-rated movies are ones that we haven’t even heard of. It may be that only one or two people saw these movies and gave them a 5-star rating, which is why they appear so highly rated. We will filter out these sorts of ratings later.
Next, let’s take the mean ratings for the Movies and the number of users that rated a particular movie.
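A short sketch of that summary frame, again on made-up data. The column name `num_of_ratings` is our own choice for illustration.

```python
import pandas as pd

# Illustrative merged data: one row per (user, movie) rating.
df = pd.DataFrame({
    "title": ["Star Wars (1977)"] * 3
             + ["Liar Liar (1997)"] * 2
             + ["Obscure Film (1990)"],
    "rating": [5, 4, 4, 3, 4, 5],
})

# Mean rating per title, kept as a DataFrame so we can add columns.
ratings = pd.DataFrame(df.groupby("title")["rating"].mean())
# Number of users who rated each title (aligned on the title index).
ratings["num_of_ratings"] = df.groupby("title")["rating"].count()

print(ratings)
```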
This is what our ratings data frame looks like. It will help us later in the process. For now, let’s visualize our data and see what meaningful results we can gather.
In the above graph, we checked the distribution of the number of ratings per movie. As we can see, most movies have very few ratings, because people tend to watch only the famous movies, and hence there is this discrepancy in the number of ratings. Let’s see how the ratings given by the users are distributed.
As we can see, most of our ratings are in the middle, while we also have a few 5-star and a few 1-star ratings. Again, this might have occurred because someone saw these movies once and decided to rate them either 1 star or 5 stars, and no one else rated them. Let’s quickly create a final joint plot to see the average rating plotted against the number of ratings.
Now that we have a good idea about the relation between our user ratings and the number of ratings, we can go ahead and start working towards making recommendations.
We will create a pivot table that consists of user ID, Movie Titles, and user ratings.
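A minimal sketch of the pivot step with `DataFrame.pivot_table`, using a handful of made-up ratings. The column names are assumptions; the variable name `moviemat` is our own.

```python
import pandas as pd

# Illustrative merged data: not every user has rated every movie.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "title": [
        "Star Wars (1977)", "Liar Liar (1997)",
        "Star Wars (1977)", "Obscure Film (1990)",
        "Liar Liar (1997)",
    ],
    "rating": [5, 3, 4, 5, 2],
})

# Rows are users, columns are movie titles, cells are ratings.
# Movies a user has not rated show up as NaN.
moviemat = df.pivot_table(index="user_id", columns="title", values="rating")
print(moviemat)
```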
Well, that’s a lot of NaN values. If you look closely, though, each row of the table holds the ratings given by one user, and not every user has watched and rated every movie in our data, hence the NaN values. We will deal with them shortly.
For now, let’s try to recommend movies close to Star Wars and Liar Liar.
Let’s compute the correlation between the Star Wars ratings column and every other movie column of the pivot table that we created. Once we have computed the correlations, we will put them in a data frame, which is easier to read and comparatively easier to work with.
This is what our data frame looks like. The higher the positive correlation a movie has, the more similar it is to Star Wars. So let’s sort our results to see the movies that are closely related to Star Wars.
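The correlation and sorting steps can be sketched with `DataFrame.corrwith`. The pivot table below is a tiny made-up example, and the names `moviemat` and `corr_star_wars` are chosen for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative user x movie rating table (NaN means "not rated").
moviemat = pd.DataFrame(
    {
        "Star Wars (1977)": [5, 4, 1, 2, 5],
        "Return of the Jedi (1983)": [5, 4, 2, 1, np.nan],
        "Liar Liar (1997)": [1, 2, 5, 4, 3],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="user_id"),
)

# Correlate every movie column with the Star Wars column.
star_wars_ratings = moviemat["Star Wars (1977)"]
similar = moviemat.corrwith(star_wars_ratings)

# Put the result in a data frame, drop movies with no overlap, and sort.
corr_star_wars = similar.to_frame(name="Correlation").dropna()
corr_star_wars = corr_star_wars.sort_values("Correlation", ascending=False)
print(corr_star_wars)
```

In this toy table the sequel correlates strongly and positively with Star Wars, while the comedy correlates negatively.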
We can see some movies that have a perfect correlation with Star Wars, yet you might never have heard of them. This is because of the effect we discussed earlier: these movies were rated by only a handful of users, and those few ratings happen to line up closely with the ratings that the same users gave Star Wars. Let’s filter these movies out.
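One way to do this filtering is to join the number of ratings onto the correlation table and keep only movies above a minimum count. The numbers below are invented, and the threshold `MIN_RATINGS = 100` is an assumption picked for illustration rather than a value taken from the article.

```python
import pandas as pd

titles = pd.Index(
    ["Star Wars (1977)", "Return of the Jedi (1983)",
     "Obscure Film (1990)", "Liar Liar (1997)"],
    name="title",
)

# Illustrative correlations with Star Wars (made-up values).
corr_star_wars = pd.DataFrame(
    {"Correlation": [1.0, 0.9, 1.0, -0.87]}, index=titles
)
# Illustrative rating counts per movie (made-up values).
ratings = pd.DataFrame({"num_of_ratings": [500, 450, 2, 400]}, index=titles)

# Hypothetical threshold: ignore movies with too few ratings.
MIN_RATINGS = 100

# Attach the counts, drop thinly rated movies, and sort by correlation.
corr_star_wars = corr_star_wars.join(ratings["num_of_ratings"])
filtered = corr_star_wars[corr_star_wars["num_of_ratings"] > MIN_RATINGS]
filtered = filtered.sort_values("Correlation", ascending=False)
print(filtered)
```

The obscure title with a perfect correlation but only two ratings disappears, which is exactly the effect we want. The right threshold depends on the data set and is worth experimenting with.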
We just included the number of ratings and used it to filter out the movie titles. Let’s go ahead and take a look at our results.
Now the results we are getting are closely related to Star Wars. Almost all Star Wars fans have heard of these movies or even watched them. So we were successful in recommending movies similar to Star Wars. Let’s carry out the same process for Liar Liar as well.
Filtering on the number of ratings helped us narrow down our results to the ones we wanted, so we must be careful about thinly rated items when working on recommender systems like this. With this, we conclude this article on recommender systems.
Link to the code – https://github.com/AM1CODES/MLAlgs/blob/master/Movie%20Recommender.ipynb