
Recommender systems

Recommender systems are transforming the way people engage with content and products online. YouTube, for instance, drives nearly 70% of its watch time through personalized recommendations. Amazon attributes roughly 30% of its product sales to its recommendation engine, and Netflix reports that about 80% of what its users watch comes from personalized recommendations.

The business case for recommender systems is undeniable. They not only drive revenue but also increase user engagement and satisfaction. By delivering personalized suggestions, recommender systems encourage users to discover new and relevant items, such as videos, products, and music. Today, every large e-commerce platform relies on recommender systems to boost its profits.

Brief History of Recommenders

Recommendation algorithms have been around since the early 1990s. One of the earliest systems, Tapestry, was developed at Xerox PARC in 1992. In Tapestry, users manually rated items, and those ratings helped recommend content to others with similar tastes. The concept evolved, and by the late 1990s, companies like Amazon were developing item-based collaborative filtering, a method that analyzed user behavior to suggest products. The launch of the Netflix Prize competition in 2006 marked another significant milestone: Netflix offered a $1 million prize to anyone who could improve the accuracy of its recommendation algorithm by 10%, sparking innovation. Following the Netflix Prize, researchers began to explore ways to incorporate more diverse types of information into recommender systems. This period saw the rise of context-aware recommender systems, which consider contextual information such as time, location, or social setting when making recommendations.

As deep learning began to dominate various areas of artificial intelligence, it also made its way into recommender systems. In 2016, Google introduced the Wide & Deep Learning model for recommender systems, showcasing how deep neural networks could be effectively applied to this domain. The following year, Neural Collaborative Filtering was published, demonstrating how deep learning could be used to model the complex interactions between users and items in collaborative filtering. These developments started a wave of research into deep learning-based recommender systems, leading to significant improvements in recommendation quality across various domains.

How Does It Work?

Collect Data

  • User data: user ID, demographic data.
  • Behavioral data: clicks, views, purchases, likes, comments, ratings, add-to-cart events.
  • Product/content data: category, price, tags, keywords, color, size, attributes, genre, director, writer, actors, publish date, other metadata.
  • Transaction data: time, order amount, list of items.
  • Contextual data: time of day, day of week, device type, location.

Select a Method

  • Collaborative Filtering: recommends items based on user behavior and similar users.
  • Content-Based Filtering: recommends items similar to those the user already likes.
  • Hybrid Models: combine collaborative and content-based approaches for more accurate recommendations.
  • Deep Learning Models: use neural networks for highly personalized recommendations.

Implement

  1. Data preprocessing: clean and prepare the data (normalize values, handle missing entries); a minimal sketch follows this list.
  2. Build and train the model.
  3. Deploy the model to your website.
  4. Monitor: track the model's performance over time.
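
As a minimal sketch of the preprocessing step (the table and column names here are hypothetical, and pandas is just one common choice):

import pandas as pd

# Hypothetical raw ratings table; the column names are illustrative
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "movie_id": [10, 20, 10, 30, 20],
    "rating": [3.0, 5.0, None, 4.0, 2.0],
})

# Handle missing values: drop rows that have no rating
ratings = ratings.dropna(subset=["rating"])

# Normalize ratings to the [0, 1] range (min-max scaling)
lo, hi = ratings["rating"].min(), ratings["rating"].max()
ratings["rating_norm"] = (ratings["rating"] - lo) / (hi - lo)

# Pivot into the user-item matrix that collaborative filtering expects
matrix = ratings.pivot(index="user_id", columns="movie_id", values="rating_norm")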

Types of Recommenders

Collaborative filtering focuses on predicting a user’s interest in an item based on the preferences of similar users or items.

User-Based Collaborative Filtering (UBCF)

This approach identifies users with similar preferences and recommends items that those users like. Users can be compared using explicit or implicit signals. Explicit signals are data users actively provide, such as likes and ratings, where users express their preferences directly. Implicit signals are derived from behavioral data passively collected by platforms like Google Analytics, including page views, clicks, purchases, add-to-cart events, time spent on a page, scroll depth, and sessions; these interactions signal user interest without direct input. Additionally, demographic data, such as age, location, or gender, can be incorporated to refine user comparisons further.

Figure: user-video interaction matrix. Rows are users, columns are movies; cell shading in the original figure ranges from no interaction to highest interaction, and the numbers are ratings (1 = Poor, 2 = Fair, 3 = Good, 4 = Very Good, 5 = Excellent).

        M1  M2  M3  M4  M5
Jake     3   5   2   5   1
Li Wei   3   4   2   -   1
Carlos   1   5   4   1   2
Miguel   4   2   1   5   4

Let’s imagine the table represents a user-video interaction matrix. This user-interaction matrix is used for several purposes. First, it helps platforms like YouTube or Netflix identify viewing patterns and see which videos are most popular among users. Second, the matrix serves as the foundation for collaborative filtering algorithms, which compare users based on their engagement and ratings of movies. In this matrix, the rows are the users, and the columns represent videos. Each cell in the matrix reflects the user’s level of interaction with a specific video. Darker cells indicate high interaction, such as a user watching a video entirely or rating it highly; lighter cells show lower interaction, such as a brief view or skipping through the content. Nearly white cells represent no interaction, meaning the user has not engaged with that video. The numbers inside the cells represent the ratings given by users. Jake and Li Wei have rated M1, M2, M3, and M5 similarly, but Jake has interacted with M4 and rated it highly (5/5), while Li Wei has not interacted with M4 yet. The recommender system could suggest M4 to Li Wei based on similar ratings for other movies.

Figure: Jake and Li Wei are similar users; both watched M1, so M4, watched by Jake, is recommended to Li Wei.
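
A minimal sketch of that logic in NumPy, using the rating matrix above with Li Wei's missing M4 rating encoded as NaN (similarity here is Pearson correlation via np.corrcoef; the cosine similarity described next works analogously):

import numpy as np

# Rows: Jake, Li Wei, Carlos, Miguel; columns: M1-M5
R = np.array([
    [3, 5, 2, 5,      1],
    [3, 4, 2, np.nan, 1],
    [1, 5, 4, 1,      2],
    [4, 2, 1, 5,      4],
])

# Compare Li Wei (row 1) with every user on the movies she has rated
target = R[1]
rated = ~np.isnan(target)
sims = [np.corrcoef(target[rated], row[rated])[0, 1] for row in R]

# The most similar other user (index -2 after sorting; -1 is Li Wei
# herself) supplies the prediction for the movie she has not seen
best = np.argsort(sims)[-2]
print(best, R[best, ~rated])  # 0 [5.] -> Jake's 5/5 suggests M4 to Li Wei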

Cosine Similarity

Cosine similarity is a measure used in machine learning and data analysis to compare two data points represented as vectors: it computes the cosine of the angle between the two vectors to determine how similar they are in direction.

Figure: user vectors in a 3D space whose axes are X (Movie 1), Y (Movie 2), and Z (Movie 3): Jake (3, 5, 2), Li Wei (3, 4, 2), Carlos (1, 5, 4), Miguel (4, 2, 1).

This graph represents user interactions with three movies in a 3D space. Each user (Jake, Li Wei, Carlos, and Miguel) is represented by a vector that points in a direction based on how they interacted with the three movies (Movie 1, Movie 2, and Movie 3). The position and length of each vector indicate the level of interaction each user had with these movies.

We have the coordinates of Jake’s vector: \mathbf{A} = [3, 5, 2] and Li Wei’s vector: \mathbf{B} = [3, 4, 2] . The dot product measures how much two vectors are aligned or point in the same direction. It is calculated by multiplying corresponding components of two vectors and then summing the results. The dot product between Jake and Li Wei’s vectors is calculated as follows:

\mathbf{A} \cdot \mathbf{B} = (3 \times 3) + (5 \times 4) + (2 \times 2) = 9 + 20 + 4 = 33

Next, we need to calculate the magnitude of each vector:

\|\mathbf{A}\| = \sqrt{3^2 + 5^2 + 2^2} = \sqrt{38} \approx 6.164

The magnitude of Li Wei’s vector is:

\|\mathbf{B}\| = \sqrt{3^2 + 4^2 + 2^2} = \sqrt{29} \approx 5.385

Finally, we calculate the cosine similarity:

\text{Cosine Similarity} = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \|\mathbf{B}\|} = \frac{33}{6.164 \times 5.385} \approx 0.994

This score of approximately 0.994 shows that Jake and Li Wei have very similar preferences for these three movies, as their vectors point in nearly the same direction.
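
The same calculation takes a few lines of NumPy (a minimal sketch; the vectors are the rating triples above):

import numpy as np

A = np.array([3, 5, 2])  # Jake's ratings of Movies 1-3
B = np.array([3, 4, 2])  # Li Wei's ratings of Movies 1-3

# Cosine similarity: dot product divided by the product of the magnitudes
cos_sim = (A @ B) / (np.linalg.norm(A) * np.linalg.norm(B))
print(round(cos_sim, 3))  # 0.994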

K-Nearest Neighbors

We have the coordinates of Jake’s vector: \mathbf{A} = [3, 5, 2] and Li Wei’s vector: \mathbf{B} = [3, 4, 2] . The Euclidean distance between two vectors measures how far apart they are in space. It is calculated by taking the square root of the sum of the squared differences between their corresponding components. The Euclidean distance between Jake and Li Wei’s vectors is:

\text{Euclidean Distance} = \sqrt{(3 - 3)^2 + (5 - 4)^2 + (2 - 2)^2} = \sqrt{1} = 1

The Euclidean distance of 1 shows that Jake and Li Wei have very similar preferences. Next, we apply K-Nearest Neighbors (KNN) to find the nearest neighbors for Jake and Li Wei based on their movie ratings. KNN identifies the closest neighbors by measuring the Euclidean distance between data points.

  • Jake: \mathbf{A} = [3, 5, 2]
  • Li Wei: \mathbf{B} = [3, 4, 2]
  • Carlos: \mathbf{C} = [1, 5, 4]
  • Miguel: \mathbf{D} = [4, 2, 1]

The distances between Jake and the other users are calculated as:

  • Distance between Jake and Li Wei: 1 (computed above)
  • Distance between Jake and Carlos: \sqrt{(3 - 1)^2 + (5 - 5)^2 + (2 - 4)^2} = \sqrt{8} \approx 2.828
  • Distance between Jake and Miguel: \sqrt{(3 - 4)^2 + (5 - 2)^2 + (2 - 1)^2} = \sqrt{11} \approx 3.317

Based on these distances, Jake’s closest neighbor is Li Wei. This reflects their highly similar movie preferences, and KNN would recommend similar movies for both users based on these calculations.
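
A minimal NumPy sketch of the same neighbor search (an off-the-shelf implementation such as scikit-learn's NearestNeighbors would work just as well):

import numpy as np

users = {
    "Jake":   np.array([3, 5, 2]),
    "Li Wei": np.array([3, 4, 2]),
    "Carlos": np.array([1, 5, 4]),
    "Miguel": np.array([4, 2, 1]),
}

# Euclidean distance from Jake to every other user
jake = users["Jake"]
distances = {name: float(np.linalg.norm(jake - v))
             for name, v in users.items() if name != "Jake"}

# With k = 1, the nearest neighbor is the user at the smallest distance
print(min(distances, key=distances.get))  # Li Wei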

Matrix Factorization

We start with a matrix of movie ratings from four users (Jake, Li Wei, Carlos, Miguel) for five movies. This matrix represents known ratings, with some missing values (like Li Wei’s rating for Movie 4).

Our aim is to factorize this large matrix into three smaller matrices:

  • User Matrix (U): Represents how much each user likes each latent feature.
  • Feature Matrix (Σ): Represents the strength or importance of each feature.
  • Movie Matrix (V^T): Represents how much each movie exhibits each feature.

These latent features are not predefined categories like “action” or “romance”, but abstract concepts that the algorithm discovers to explain the rating patterns.

The factorization process uses algorithms such as Singular Value Decomposition (SVD) or Alternating Least Squares (ALS). These algorithms find values for U, Σ, and V^T such that their product approximates the original rating matrix as closely as possible.

Factorized Matrices

User Matrix (U):

  • Each row represents a user.
  • Each column represents a latent feature.
  • Values indicate how much each user likes or dislikes each feature.
  • Positive values indicate preference, negative values indicate dislike.

Feature Matrix (Σ):

  • A diagonal matrix where each value represents the importance of a latent feature.
  • Larger values indicate more influential features.

Movie Matrix (V^T):

  • Each column represents a movie (equivalently, each row of the untransposed matrix V represents a movie).
  • Each row represents a latent feature.
  • Values indicate how much each movie exhibits each feature.

To predict a missing rating (like Li Wei’s rating for Movie 4):

  1. Take Li Wei’s row from the User Matrix.
  2. Multiply it element-wise with the diagonal of the Feature Matrix.
  3. Multiply the result with Movie 4’s column from the Movie Matrix (transposed).
  4. Sum up these multiplications to get the predicted rating.

Mathematically, this is equivalent to \hat{R} = U \Sigma V^T .
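
A minimal sketch with NumPy's SVD, keeping k = 2 latent features. Because plain SVD needs a complete matrix, the missing cell is first filled with Li Wei's mean rating; production systems instead fit U and V^T on the observed entries only, e.g., with ALS or gradient descent:

import numpy as np

# Rows: Jake, Li Wei, Carlos, Miguel; columns: M1-M5 (NaN = missing)
R = np.array([
    [3, 5, 2, 5,      1],
    [3, 4, 2, np.nan, 1],
    [1, 5, 4, 1,      2],
    [4, 2, 1, 5,      4],
])

# Fill the missing cell with that user's mean rating so SVD can run
filled = np.where(np.isnan(R), np.nanmean(R, axis=1, keepdims=True), R)

# Factorize and keep only the k strongest latent features
U, s, Vt = np.linalg.svd(filled, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # R-hat = U * Sigma * V^T

print(round(R_hat[1, 3], 2))  # predicted rating for Li Wei on Movie 4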

Original Rating Matrix

        M1   M2   M3   M4   M5
Jake    3.0  5.0  2.0  5.0  1.0
Li Wei  3.0  4.0  2.0   -   1.0
Carlos  1.0  5.0  4.0  1.0  2.0
Miguel  4.0  2.0  1.0  5.0  4.0

(The original page included an interactive demonstration that factorized this matrix and displayed the resulting U, Σ, and V^T matrices.)


Item-based Collaborative Filtering (IBCF)

In an item-based collaborative filtering system, instead of finding similar users, the system finds items that users rate similarly and recommends those items. The system gathers data on how users rate items; for example, users might rate products like shirts, movies, or books on a scale (e.g., 1 to 5 stars). It then compares the ratings of different items to find similarities between them: if many users give Movies 2 and 4 similar ratings, the system concludes that these two movies are similar.

Item-item similarity matrix (1 = high similarity, 0.2 = low similarity):

       M1   M2   M3   M4   M5
M1    1.0  0.8  0.4  0.2  0.7
M2    0.8  1.0  0.3  0.9  0.6
M3    0.4  0.3  1.0  0.8  0.6
M4    0.2  0.9  0.8  1.0  0.7
M5    0.7  0.6  0.6  0.7  1.0
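
A minimal sketch of computing item-item cosine similarities from the earlier rating matrix (the missing cell is treated as 0 here for simplicity, so the numbers will differ from the illustrative matrix above):

import numpy as np

# Rows: users; columns: movies M1-M5 (missing rating treated as 0)
R = np.array([
    [3, 5, 2, 5, 1],
    [3, 4, 2, 0, 1],
    [1, 5, 4, 1, 2],
    [4, 2, 1, 5, 4],
], dtype=float)

# Item-item cosine similarity: compare columns instead of rows
norms = np.linalg.norm(R, axis=0)
item_sim = (R.T @ R) / np.outer(norms, norms)

# Movies most similar to M4 (index 3), best first, excluding M4 itself
ranked = [i for i in np.argsort(item_sim[3])[::-1] if i != 3]
print([f"M{i + 1}" for i in ranked])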


Collaborative filtering works well when there is enough interaction data, but it struggles with the cold-start problem: when there is little data on new users or items, there is nothing to compare. It also assumes that user preferences are correlated, i.e., that people with similar tastes will like similar things, so predictions may be unreliable when the user-item matrix is sparse.

Resources

Google. “Recommendation Systems Overview.” Google Developers. https://developers.google.com/machine-learning/recommendation

Ricci, F., Rokach, L., & Shapira, B. (Eds.). (2022). Recommender Systems Handbook (3rd ed.). Springer. https://link.springer.com/book/10.1007/978-1-0716-2197-4#bibliographic-information

Roy, D., & Dutta, M. (2022). A systematic review and research perspective on recommender systems. Journal of Big Data, 9, 59. https://doi.org/10.1186/s40537-022-00592-5

Sanchez-Lengeling, B., et al. (2021). “A Gentle Introduction to Graph Neural Networks.” Distill, 6(8). https://distill.pub/2021/gnn-intro/