Recommender systems

Recommender systems are transforming the way people engage with content and products online. YouTube, for instance, drives nearly 70% of its view time through personalized content recommendations. Thanks to its recommendation engine, Amazon sees a 30% increase in product sales. Netflix has reported that users consume 80% more content due to its personalization.

The business case for recommendation systems is strong. They not only drive revenue but also increase user engagement and satisfaction. By delivering personalized suggestions, recommender systems help users discover new and relevant items, such as videos, products, articles, and music. Today, every large e-commerce platform relies on recommenders to boost its profits.

Brief History of Recommenders.

Recommendation algorithms have been around since the early 1990s. One of the earliest systems, Tapestry, was developed at Xerox PARC in 1992. In Tapestry, users manually rated items, and these ratings helped recommend content to others with similar tastes. This concept evolved, and by the late 1990s, companies like Amazon were developing item-based collaborative filtering, a method that analyzed user behavior to suggest products. In 2006, the launch of the Netflix Prize competition marked another significant milestone in the field. Netflix offered a $1 million prize to anyone who could improve the accuracy of its recommendation algorithm by 10%, sparking innovation. Following the Netflix Prize, researchers began to explore ways to incorporate more diverse types of information into recommender systems. This period saw the rise of context-aware recommender systems, which consider contextual information such as time, location, or social setting when making recommendations.

As deep learning began to dominate various areas of artificial intelligence, it also made its way into recommender systems. In 2016, Google introduced the Wide & Deep Learning model for recommender systems, showcasing how deep neural networks could be effectively applied to this domain. The following year, Neural Collaborative Filtering was published, demonstrating how deep learning could be used to model the complex interactions between users and items in collaborative filtering. These developments started a wave of research into deep learning-based recommender systems, leading to significant improvements in recommendation quality across various domains.

How does it Work?

Collect Data

User data: User ID, demographic data.

Behavioral data: clicks, views, purchases, likes, comments, ratings, add to cart, events.

Product/content data: category, price, tags, keywords, color, size, attributes, genre, director, writer, actors, publish date, metadata.

Transaction data: time, order amount, list of items.

Contextual data: time of the day, week, device type, location.

Select a Method

Collaborative Filtering: Recommends based on user behavior and similar users.
Content-Based Filtering: Recommends items similar to what the user likes.
Hybrid Models: Combines collaborative and content-based approaches for more accurate recommendations.
Deep Learning Models: Neural networks for highly personalized recommendations.

Implement.

Data Preprocessing: Clean and prepare data (normalize, handle missing values).
Build & Train the model
Add the model to your website
Monitor: Track performance
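
As a rough illustration of the preprocessing step, the sketch below mean-centers each user's ratings (one common form of normalization) and leaves missing values marked rather than guessing them. The ratings are borrowed from the movie example used later in this article, and the helper name `mean_center` is hypothetical.

```python
# Ratings for two users over five movies; None marks a missing value.
raw_ratings = {
    "Jake": [3, 5, 2, 5, 1],
    "Li Wei": [3, 4, 2, None, 1],
}


def mean_center(ratings):
    """Subtract the user's mean rating from each known rating; keep gaps as None."""
    known = [r for r in ratings if r is not None]
    mean = sum(known) / len(known)
    return [None if r is None else r - mean for r in ratings]


centered = {user: mean_center(r) for user, r in raw_ratings.items()}
print(centered["Li Wei"])
```

Mean-centering removes the effect of users who rate everything high or low, so that comparisons between users focus on relative preferences.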

Types of Recommenders

Collaborative filtering focuses on predicting a user’s interest in an item based on the preferences of similar users or items.

User-Based Collaborative Filtering

This approach identifies users with similar preferences and recommends items that those similar users like. Users can be compared using explicit or implicit signals. Explicit signals are preferences that users state directly, such as ratings and likes. Implicit signals are inferred from behavioral data that platforms collect passively, often with tools like Google Analytics, including purchases, add-to-cart events, page views, clicks, time spent on a page, scroll depth, and sessions. These interactions signal interest without the user stating a preference. Additionally, demographic data, such as age, location, or gender, can be incorporated to refine user comparisons further.

Figure: user-video interaction matrix. Cell shading ranges from no interaction (lightest) through low, medium, and high to highest interaction (darkest); "-" marks no rating.

User     M1   M2   M3   M4   M5
Jake     3    5    2    5    1
Li Wei   3    4    2    -    1
Carlos   1    5    4    1    2
Miguel   4    2    1    5    4

Ratings: 1 = Poor, 2 = Fair, 3 = Good, 4 = Very Good, 5 = Excellent

Let’s imagine the table represents a user-video interaction matrix. This user-interaction matrix is used for several purposes. First, it helps platforms like YouTube or Netflix identify viewing patterns and see which videos are most popular among users. Second, the matrix serves as the foundation for collaborative filtering algorithms, which compare users based on their engagement and ratings of movies. In this matrix, the rows are the users, and the columns represent videos. Each cell in the matrix reflects the user’s level of interaction with a specific video. Darker cells indicate high interaction, such as a user watching a video entirely or rating it highly; lighter cells show lower interaction, such as a brief view or skipping through the content. Nearly white cells represent no interaction, meaning the user has not engaged with that video. The numbers inside the cells represent the ratings given by users. Jake and Li Wei have rated M1, M2, M3, and M5 similarly, but Jake has interacted with M4 and rated it highly (5/5), while Li Wei has not interacted with M4 yet. The recommender system could suggest M4 to Li Wei based on similar ratings for other movies.
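
To make the matrix concrete, here is a minimal sketch that builds it from a list of rating events and finds the cells with no interaction. The event tuples mirror the example ratings; the function names `build_matrix` and `unrated` are illustrative, not part of any library.

```python
# Rating events as (user, movie, rating) tuples, taken from the example matrix.
events = [
    ("Jake", "M1", 3), ("Jake", "M2", 5), ("Jake", "M3", 2), ("Jake", "M4", 5), ("Jake", "M5", 1),
    ("Li Wei", "M1", 3), ("Li Wei", "M2", 4), ("Li Wei", "M3", 2), ("Li Wei", "M5", 1),
    ("Carlos", "M1", 1), ("Carlos", "M2", 5), ("Carlos", "M3", 4), ("Carlos", "M4", 1), ("Carlos", "M5", 2),
    ("Miguel", "M1", 4), ("Miguel", "M2", 2), ("Miguel", "M3", 1), ("Miguel", "M4", 5), ("Miguel", "M5", 4),
]
movies = ["M1", "M2", "M3", "M4", "M5"]


def build_matrix(events):
    """Return a nested dict: matrix[user][movie] = rating."""
    matrix = {}
    for user, movie, rating in events:
        matrix.setdefault(user, {})[movie] = rating
    return matrix


def unrated(matrix, user, movies):
    """List the movies a user has not interacted with yet."""
    return [m for m in movies if m not in matrix.get(user, {})]


matrix = build_matrix(events)
print(unrated(matrix, "Li Wei", movies))
```

A nested dictionary is used here because real interaction matrices are sparse: most users have rated only a small fraction of the catalog.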

Figure: Jake and Li Wei are similar users. M1 was watched by both; M4 was watched by Jake and is recommended to Li Wei.

Cosine Similarity

Cosine similarity is a measure used in machine learning and data analysis to compare two data points represented as vectors. It is the cosine of the angle between the two vectors, so it captures how similar they are in direction regardless of their length.

Figure: user rating vectors in 3D space, with axes X (Movie 1), Y (Movie 2), and Z (Movie 3): Jake (3, 5, 2), Li Wei (3, 4, 2), Carlos (1, 5, 4), Miguel (4, 2, 1).

This graph represents user interactions with three movies in a 3D space. Each user (Jake, Li Wei, Carlos, and Miguel) is represented by a vector that points in a direction based on how they interacted with the three movies (Movie 1, Movie 2, and Movie 3). The position and length of each vector indicate the level of interaction each user had with these movies.

We have the coordinates of Jake’s vector: \( \mathbf{A} = [3, 5, 2] \) and Li Wei’s vector: \( \mathbf{B} = [3, 4, 2] \). The dot product measures how much two vectors are aligned or point in the same direction. It is calculated by multiplying corresponding components of two vectors and then summing the results. The dot product between Jake and Li Wei’s vectors is calculated as follows:

\[ \mathbf{A} \cdot \mathbf{B} = (3 \times 3) + (5 \times 4) + (2 \times 2) = 9 + 20 + 4 = 33 \]

Next, we calculate the magnitude of each vector. The magnitude of Jake’s vector is:

\[ \|\mathbf{A}\| = \sqrt{3^2 + 5^2 + 2^2} = \sqrt{38} \approx 6.164 \]

The magnitude of Li Wei’s vector is:

\[ \|\mathbf{B}\| = \sqrt{3^2 + 4^2 + 2^2} = \sqrt{29} \approx 5.385 \]

Finally, we calculate the cosine similarity:

\[ \text{Cosine Similarity} = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \|\mathbf{B}\|} = \frac{33}{6.164 \times 5.385} \approx 0.994 \]

This score of approximately 0.994 shows that Jake and Li Wei have very similar preferences based on these three movies, as their vectors point in nearly the same direction.
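
The calculation above can be reproduced with a few lines of standard-library Python. This is a minimal sketch; the function name `cosine_similarity` is our own, and the vectors are the example ratings for Movies 1 to 3.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


jake = [3, 5, 2]
li_wei = [3, 4, 2]
carlos = [1, 5, 4]
miguel = [4, 2, 1]

for name, vector in [("Li Wei", li_wei), ("Carlos", carlos), ("Miguel", miguel)]:
    print(name, round(cosine_similarity(jake, vector), 3))
```

Running the same function against Carlos and Miguel shows how the score drops as rating patterns diverge from Jake's.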

K-Nearest Neighbors

We have the coordinates of Jake’s vector: \( \mathbf{A} = [3, 5, 2] \) and Li Wei’s vector: \( \mathbf{B} = [3, 4, 2] \). The Euclidean distance between two vectors measures how far apart they are in space. It is calculated by taking the square root of the sum of the squared differences between their corresponding components. The Euclidean distance between Jake and Li Wei’s vectors is:

\[ \text{Euclidean Distance} = \sqrt{(3 - 3)^2 + (5 - 4)^2 + (2 - 2)^2} = \sqrt{1} = 1 \]

The Euclidean distance of 1 shows that Jake and Li Wei have very similar preferences. Next, we apply K-Nearest Neighbors (KNN) to find the nearest neighbors for Jake and Li Wei based on their movie ratings. KNN identifies the closest neighbors by measuring the Euclidean distance between data points.

  • Jake: \( \mathbf{A} = [3, 5, 2] \)
  • Li Wei: \( \mathbf{B} = [3, 4, 2] \)
  • Carlos: \( \mathbf{C} = [1, 5, 4] \)
  • Miguel: \( \mathbf{D} = [4, 2, 1] \)

The distances between Jake and the other users are calculated as:

  • Distance between Jake and Li Wei: \( \sqrt{1} = 1 \)
  • Distance between Jake and Carlos: \( \sqrt{8} \approx 2.828 \)
  • Distance between Jake and Miguel: \( \sqrt{11} \approx 3.317 \)

Based on these distances, Jake’s closest neighbor is Li Wei. This reflects their highly similar movie preferences, and KNN would recommend similar movies for both users based on these calculations.
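
The neighbor search described above can be sketched as follows. This is a brute-force illustration over the four example users, not a production KNN index; the function names are our own.

```python
import math

# Ratings for Movies 1 to 3, as in the example vectors.
ratings = {
    "Jake": [3, 5, 2],
    "Li Wei": [3, 4, 2],
    "Carlos": [1, 5, 4],
    "Miguel": [4, 2, 1],
}


def euclidean(a, b):
    """Straight-line distance between two rating vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(target, ratings, k=1):
    """Return the k closest users to `target` as (distance, user) pairs."""
    distances = [
        (euclidean(ratings[target], vector), user)
        for user, vector in ratings.items()
        if user != target
    ]
    distances.sort()
    return distances[:k]


print(nearest_neighbors("Jake", ratings, k=3))
```

With only four users a full scan is fine; larger systems usually rely on approximate nearest-neighbor indexes to avoid comparing every pair.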

Matrix Factorization

We start with a matrix of movie ratings from four users (Jake, Li Wei, Carlos, Miguel) for five movies. This matrix represents known ratings, with some missing values (like Li Wei’s rating for Movie 4).

Our aim is to decompose this large matrix into three smaller matrices:

  • User Matrix (U): Represents how much each user likes each latent feature.
  • Feature Matrix (Σ): Represents the strength or importance of each feature.
  • Movie Matrix (V^T): Represents how much each movie exhibits each feature.

These latent features are not predefined categories like “action” or “romance”, but abstract concepts that the algorithm discovers to explain the rating patterns.

The factorization is computed with algorithms such as Singular Value Decomposition (SVD) or Alternating Least Squares (ALS). These algorithms search for values of U, Σ, and V^T whose product approximates the original rating matrix as closely as possible. ALS-style methods typically learn only two factor matrices, with Σ absorbed into them.

Factorized Matrices

User Matrix (U):

  • Each row represents a user.
  • Each column represents a latent feature.
  • Values indicate how much each user likes or dislikes each feature.
  • Positive values indicate preference, negative values indicate dislike.

Feature Matrix (Σ):

  • A diagonal matrix where each value represents the importance of a latent feature.
  • Larger values indicate more influential features.

Movie Matrix (V^T):

  • Each column represents a movie.
  • Each row represents a latent feature.
  • Values indicate how much each movie exhibits each feature.

To predict a missing rating (like Li Wei’s rating for Movie 4):

  1. Take Li Wei’s row from the User Matrix.
  2. Multiply it element-wise with the diagonal of the Feature Matrix.
  3. Multiply the result with Movie 4’s column from the Movie Matrix (transposed).
  4. Sum up these multiplications to get the predicted rating.

Mathematically, this is equivalent to: \( \hat{R} = U \Sigma V^T \)
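
The four prediction steps can be sketched in a few lines. The factor values below are hypothetical numbers chosen for illustration with two latent features; a real system would learn them from the rating matrix.

```python
# Hypothetical factor values for illustration only (not learned from data).
li_wei_row = [0.5, -0.25]  # Li Wei's row of U
sigma_diag = [8.0, 2.0]    # diagonal of the feature matrix
m4_column = [0.5, -1.0]    # Movie 4's column of V^T


def predict_rating(user_row, sigma_diag, movie_column):
    """Predict one rating as the sum over features of u_k * sigma_k * v_k."""
    # Steps 1-2: scale the user's feature affinities by each feature's importance.
    scaled = [u * s for u, s in zip(user_row, sigma_diag)]
    # Steps 3-4: multiply by the movie's feature values and sum the products.
    return sum(x * v for x, v in zip(scaled, movie_column))


predicted = predict_rating(li_wei_row, sigma_diag, m4_column)
print(predicted)
```

Computing a single cell this way gives the same number as reading that cell from the full matrix product, but avoids materializing the whole matrix.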

Item-based Collaborative Filtering

In an item-based collaborative filtering system, instead of finding similar users, the system finds items that users rate similarly and recommends those items. The system gathers data on how users rate items. For example, users might rate products like shirts, movies, or books on a scale (e.g., 1 to 5 stars). The system then compares the ratings of different items to find similarities between them. For example, if many users give Movie 2 and Movie 4 similar ratings, the system concludes that these items are similar, so a user who liked one is likely to enjoy the other.

Figure: item-item similarity matrix (1 = high similarity, 0.2 = low similarity).

      M1    M2    M3    M4    M5
M1    1     0.8   0.4   0.2   0.7
M2    0.8   1     0.3   0.9   0.6
M3    0.4   0.3   1     0.8   0.6
M4    0.2   0.9   0.8   1     0.7
M5    0.7   0.6   0.6   0.7   1
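
One way to turn an item-item similarity matrix into a prediction is a similarity-weighted average of the user's existing ratings. The sketch below assumes that scoring rule (the article does not prescribe one) and uses the similarity values from the figure together with Li Wei's ratings from the earlier example.

```python
movies = ["M1", "M2", "M3", "M4", "M5"]

# Item-item similarity values from the figure (row and column order follows `movies`).
similarity = [
    [1.0, 0.8, 0.4, 0.2, 0.7],
    [0.8, 1.0, 0.3, 0.9, 0.6],
    [0.4, 0.3, 1.0, 0.8, 0.6],
    [0.2, 0.9, 0.8, 1.0, 0.7],
    [0.7, 0.6, 0.6, 0.7, 1.0],
]

li_wei = {"M1": 3, "M2": 4, "M3": 2, "M5": 1}  # M4 is not rated yet


def predict(user_ratings, target, movies, similarity):
    """Similarity-weighted average of the user's ratings for other items."""
    t = movies.index(target)
    weighted_sum = 0.0
    weight_total = 0.0
    for movie, rating in user_ratings.items():
        s = similarity[t][movies.index(movie)]
        weighted_sum += s * rating
        weight_total += s
    return weighted_sum / weight_total if weight_total else None


print(predict(li_wei, "M4", movies, similarity))
```

Because M4 is most similar to M2 and M3, Li Wei's ratings for those two movies dominate the prediction.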

Collaborative filtering works well with enough interaction data but struggles with the cold start problem when there is limited data on new users or items. Collaborative filtering assumes that user preferences are correlated and that people with similar tastes will like similar things. However, the predictions may be unreliable when the user-item matrix doesn’t have a lot of data.

Content-Based Filtering

Content-based filtering recommends items by analyzing their attributes. In this approach, the system looks at an item’s content and recommends similar items based on a user’s past interactions with similar content.


User | Known Preferences | Recommended Based on Preferences | New/Exploratory Suggestions | Expected Interaction Level
Jake | M1 (Action, Thriller), M3 (Drama, Romance) | M6 (Action, Sci-Fi), M7 (Romantic Thriller) | M9 (Comedy) | High (Action, Thriller), Low (Comedy)
Li Wei | M2 (Drama, Romance) | M8 (Historical Drama), M10 (Romantic Drama) | M12 (Sci-Fi) | High (Drama), Medium (Sci-Fi)
Carlos | M3 (Adventure, Comedy) | M11 (Adventure, Fantasy), M14 (Action Comedy) | M7 (Romantic Thriller) | High (Adventure, Comedy), Low (Romantic Thriller)
Miguel | M4 (Thriller, Horror) | M12 (Psychological Thriller), M13 (Supernatural Horror) | M5 (Action) | High (Thriller, Horror), Medium (Action)

In a content-based recommender system, each item is represented by attributes or features. For example, a movie may be described by its genre, director, actors, or plot keywords. When users interact with or rate a movie, the system identifies similar films based on those features.

In this example, Jake has watched M1 and M3, so based on their genres, the system recommends M6 and M7.

Watched:

  • M1: Inception (Action, Thriller)
  • M3: The Notebook (Drama, Romance)

Recommended:

  • M6: Rebel Moon (Action, Sci-Fi)
  • M7 (Romantic Thriller)
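
Jake's example can be sketched as a genre-matching exercise. The code below builds a profile from the genres of the movies he watched and ranks unseen movies by cosine similarity to that profile. Splitting “Romantic Thriller” into the tags Romance and Thriller is our assumption, as is the choice of cosine similarity over binary genre vectors.

```python
import math

GENRES = ["Action", "Thriller", "Drama", "Romance", "Sci-Fi", "Comedy"]

catalog = {
    "M1": {"Action", "Thriller"},
    "M3": {"Drama", "Romance"},
    "M6": {"Action", "Sci-Fi"},
    "M7": {"Romance", "Thriller"},  # assumption: "Romantic Thriller" as two tags
    "M9": {"Comedy"},
}


def genre_vector(genres):
    """Binary vector with a 1 for every genre the movie has."""
    return [1 if g in genres else 0 for g in GENRES]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(watched, catalog):
    """Rank unseen movies by similarity to the sum of watched genre vectors."""
    vectors = [genre_vector(catalog[m]) for m in watched]
    profile = [sum(column) for column in zip(*vectors)]
    scores = {
        movie: cosine(profile, genre_vector(genres))
        for movie, genres in catalog.items()
        if movie not in watched
    }
    return sorted(scores, key=scores.get, reverse=True)


print(recommend(["M1", "M3"], catalog))
```

M6 and M7 score above M9 because they share genres with Jake's history, while the comedy shares none, which matches its role as an exploratory suggestion.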


Hybrid recommender systems combine multiple recommendation techniques, most commonly collaborative and content-based filtering, to produce better recommendations. They draw on user-item interactions, as collaborative filtering does, and on user and item attributes, as content-based filtering does, to overcome the limitations of relying on a single method.

A typical pattern is to use content-based filtering for new users or items with no interaction data and to bring in collaborative filtering once more data becomes available. One family of hybrid methods is weighted hybrid algorithms, which combine multiple algorithms by assigning a weight to each. For example, you can apply collaborative filtering, content-based filtering, and popularity-based recommendations and then compute a weighted sum of their scores to create a final list.

The general formula for the recommendation score for an item \( i \) for user \( u \) in a weighted hybrid system is:

\[ \text{Score}(u, i) = w_1 \cdot \text{CF}(u, i) + w_2 \cdot \text{CB}(u, i) + w_3 \cdot \text{Popularity}(i) \]

Where:

  • \( w_1, w_2, w_3 \): weights assigned to different algorithms.
  • \( \text{CF}(u, i) \): score from collaborative filtering.
  • \( \text{CB}(u, i) \): score from content-based filtering.
  • \( \text{Popularity}(i) \): score based on the popularity of item \( i \).
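
The weighted formula translates directly into code. In this sketch the weights and the per-item component scores are hypothetical; in practice the weights would be tuned on validation data.

```python
def hybrid_score(cf, cb, popularity, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of the collaborative, content-based, and popularity scores."""
    w1, w2, w3 = weights
    return w1 * cf + w2 * cb + w3 * popularity


# Hypothetical (CF, CB, popularity) scores for two candidate movies.
candidates = {
    "M4": (0.9, 0.4, 0.7),
    "M9": (0.2, 0.1, 0.9),
}

ranked = sorted(candidates, key=lambda m: hybrid_score(*candidates[m]), reverse=True)
print(ranked)
```

Keeping the component scores on a common scale (here 0 to 1) matters, since otherwise one algorithm can dominate the sum regardless of its weight.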

We could also use switching hybrid algorithms. In this approach, the system switches between different algorithms based on conditions like user interaction history or content characteristics. It chooses the best algorithm for each situation, such as content-based filtering for new users (cold-start problem) and collaborative filtering for regular users. For example, if a user had limited interactions with the content, we could use content-based filtering, and once there is enough data, we could switch to collaborative filtering.

In cascade hybrid methods, the algorithms are applied sequentially. For instance, collaborative filtering might create a shortlist of items, and content-based filtering re-ranks them based on specific user preferences.
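
A switching hybrid can be sketched as a simple dispatch on how much history a user has. The threshold of five interactions is an arbitrary assumption for illustration, and the two recommenders are passed in as plain functions.

```python
MIN_INTERACTIONS = 5  # hypothetical threshold; tune it for your own data


def choose_algorithm(interaction_count, threshold=MIN_INTERACTIONS):
    """Pick content-based filtering for sparse histories, collaborative otherwise."""
    return "collaborative" if interaction_count >= threshold else "content_based"


def recommend(user_history, cf_recommender, cb_recommender):
    """Dispatch to one of two recommender callables based on history length."""
    if choose_algorithm(len(user_history)) == "collaborative":
        return cf_recommender(user_history)
    return cb_recommender(user_history)


# Stand-in recommenders that only report which branch was taken.
new_user = recommend(["M1"], lambda h: ["from CF"], lambda h: ["from CB"])
regular_user = recommend(["M1", "M2", "M3", "M4", "M5"], lambda h: ["from CF"], lambda h: ["from CB"])
print(new_user, regular_user)
```

The same dispatch structure can switch on other conditions, such as whether an item is new to the catalog.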

In feature augmentation, one algorithm generates features that are used by another. For example, content-based filtering might extract user preferences for specific attributes, which are fed into a collaborative filtering model to improve recommendations.

In recommender systems, advanced deep learning and neural networks offer the opportunity to create deeper insights using larger datasets than traditional machine learning. Neural networks can have more layers or neurons to handle more data. They can also handle multiple tasks simultaneously, like predicting user actions (such as watching or rating a movie). This helps the model identify better patterns in the data. Neural networks can work with various data types, like movie posters (convolutional neural networks), trailers (transformers), or descriptions (natural language processing models).

Types of neural networks:

NCF

Neural Collaborative Filtering (NCF) is a recommendation method that extends traditional collaborative filtering with deep learning. Traditional matrix-based methods combine user and item factors with a fixed operation such as a dot product. NCF instead uses a neural network to learn more complex relationships between users and items from the same user-item interactions.

Let’s use the following matrix of user ratings as an example:

User     M1   M2   M3   M4   M5
Jake     3    5    2    5    1
Li Wei   3    4    2    -    1
Carlos   1    5    4    1    2
Miguel   4    2    1    5    4

In traditional user-based collaborative filtering, we might recommend a movie to Li Wei based on Jake’s similar ratings (see the example in the collaborative filtering section). Since Jake and Li Wei have rated M1, M2, M3, and M5 similarly, the system would recommend M4 to Li Wei, which Jake has rated highly but Li Wei has not yet watched. NCF takes this a step further by using deep learning to better model interactions between users and items. It does so by embedding both users and items into dense vectors and then training a neural network that captures their complex relationships.

In NCF, each user (Jake, Li Wei, Carlos, and Miguel) is mapped to a dense vector representation. For example, Jake might be represented by a vector like [0.5, 1.2, -0.3], while Li Wei might be represented as [0.4, 1.0, -0.2]. These vectors capture their interaction history in a compressed form. Each movie (M1, M2, M3, M4, M5) is also mapped to a dense vector representation. For instance, M1 could have a vector like [0.7, -0.1, 0.9], and M4 might be represented as [0.9, 0.2, 0.4].

The embeddings for a user (say Li Wei) and a movie (say M4) are fed into a neural network. The network combines the user and item vectors and learns to predict the likelihood of Li Wei enjoying M4 based on their previous interaction patterns with other items (or users). Once trained, the NCF model can predict the rating Li Wei would give to M4 by learning nonlinear patterns that are harder to capture in traditional methods. This goes beyond just looking at similar users; it can understand deeper, more nuanced relationships between users and items. In this case, NCF may predict that Li Wei would give M4 a high rating (say 4 or 5), confirming that M4 is a suitable recommendation based on more complex learned patterns.
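
The forward pass described above can be sketched with a tiny multilayer perceptron. The embeddings are the example vectors for Li Wei and M4; the network weights are random and untrained (an assumption made to keep the sketch short), so the output only illustrates the data flow and is not a meaningful prediction.

```python
import math
import random

random.seed(0)

user_embedding = {"Li Wei": [0.4, 1.0, -0.2]}
item_embedding = {"M4": [0.9, 0.2, 0.4]}

HIDDEN = 4
INPUT = 6  # user vector (3) concatenated with item vector (3)

# Untrained, randomly initialized weights; training would fit these to ratings.
W1 = [[random.uniform(-0.5, 0.5) for _ in range(INPUT)] for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
w2 = [random.uniform(-0.5, 0.5) for _ in range(HIDDEN)]
b2 = 0.0


def ncf_score(user_vec, item_vec):
    """Concatenate embeddings, apply one ReLU layer, then a sigmoid output."""
    x = user_vec + item_vec
    hidden = [
        max(0.0, sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))


score = ncf_score(user_embedding["Li Wei"], item_embedding["M4"])
print(score)
```

The sigmoid output can be read as the predicted probability of a positive interaction; to predict a star rating instead, the output layer and loss would change accordingly.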

NCF can predict user-item relationships more accurately by capturing nonlinear patterns. It learns from how users interact with similar items, and retraining lets it adapt to changes in user behavior over time. NCF-style models work well with implicit signals and can be extended with additional inputs, such as contextual, social, demographic, or psychographic features, to provide tailored recommendations.

Autoencoders

An autoencoder is a neural network that learns how to compress and reconstruct input data. In the context of recommendation systems, its purpose is to take a user-item interaction matrix, like a matrix of user ratings for movies, and reduce it to a more straightforward, compressed form that captures the essential information.

The encoder takes the rating matrix and reduces it to a smaller version, called a latent representation. This smaller version highlights the essential patterns in the data, such as user preferences or movie characteristics. The decoder takes this representation and tries to reconstruct the original input, filling in missing values. The key idea behind an autoencoder is that by forcing the network to learn a compressed version of the data, it also learns the most important relationships between users and items, which can then be used to make recommendations.

The missing information can be ratings or preferences. For example, if Jake has rated movies M1, M2, and M3 but hasn’t rated M4, the autoencoder will try to predict Jake’s rating for M4. This prediction is based on how Jake’s preferences (captured in the latent space) relate to the characteristics of M4, as well as how similar users have rated that movie. By reconstructing the entire user-item matrix, including the missing ratings, the autoencoder fills in gaps in the data, which can then be used to generate personalized recommendations for each user.
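
As a rough sketch of the encode-and-reconstruct idea, the code below trains a very small linear autoencoder on the example rating matrix using plain gradient descent. The hidden size, learning rate, epoch count, and the choice to feed missing ratings as zeros while excluding them from the loss are all assumptions made for illustration.

```python
import random

random.seed(0)

# Example rating matrix (rows: Jake, Li Wei, Carlos, Miguel); None = missing.
R = [
    [3, 5, 2, 5, 1],
    [3, 4, 2, None, 1],
    [1, 5, 4, 1, 2],
    [4, 2, 1, 5, 4],
]
N_ITEMS, HIDDEN, LR, EPOCHS = 5, 2, 0.02, 3000

W_enc = [[random.uniform(-0.1, 0.1) for _ in range(N_ITEMS)] for _ in range(HIDDEN)]
W_dec = [[random.uniform(-0.1, 0.1) for _ in range(HIDDEN)] for _ in range(N_ITEMS)]


def prepare(row):
    """Scale ratings to [0, 1]; return the input vector and a mask of known entries."""
    x = [0.0 if r is None else r / 5 for r in row]
    mask = [0.0 if r is None else 1.0 for r in row]
    return x, mask


def forward(x):
    """Encode to the latent vector h, then decode to the reconstruction y."""
    h = [sum(w * v for w, v in zip(W_enc[a], x)) for a in range(HIDDEN)]
    y = [sum(w * v for w, v in zip(W_dec[j], h)) for j in range(N_ITEMS)]
    return h, y


def total_loss():
    """Squared reconstruction error over known ratings only."""
    loss = 0.0
    for row in R:
        x, mask = prepare(row)
        _, y = forward(x)
        loss += sum(m * (yj - xj) ** 2 for m, yj, xj in zip(mask, y, x))
    return loss


initial_loss = total_loss()
for _ in range(EPOCHS):
    for row in R:
        x, mask = prepare(row)
        h, y = forward(x)
        err = [2 * m * (yj - xj) for m, yj, xj in zip(mask, y, x)]
        # Gradient with respect to h, computed before the decoder is updated.
        grad_h = [sum(W_dec[j][a] * err[j] for j in range(N_ITEMS)) for a in range(HIDDEN)]
        for j in range(N_ITEMS):
            for a in range(HIDDEN):
                W_dec[j][a] -= LR * err[j] * h[a]
        for a in range(HIDDEN):
            for i in range(N_ITEMS):
                W_enc[a][i] -= LR * grad_h[a] * x[i]
final_loss = total_loss()

_, li_wei_reconstruction = forward(prepare(R[1])[0])
predicted_m4 = 5 * li_wei_reconstruction[3]  # rescale back to the 1-5 range
print(initial_loss, final_loss, predicted_m4)
```

With only four users the predicted value is not reliable; the point is the mechanism, where the reconstruction supplies a value for the cell that was missing in the input.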

RNNs & Sequential Recommenders

Recurrent Neural Networks (RNNs) are well-suited for sequential recommendation systems where the order of user interactions is important. In many scenarios, user preferences evolve over time, and understanding the sequence of interactions helps in predicting what users might be interested in next. RNNs are able to maintain a memory of previous interactions, which is useful for capturing temporal dynamics in user behavior.

Consider a user who watched M1, then M2, and then M3. The RNN model processes these interactions in a sequence, learning to predict the next possible movie the user might want to watch. This prediction is based on how similar sequences of movies have been watched by other users. In our example, if Li Wei watched M1, M2, and M3, the RNN might predict that M4 would be a suitable recommendation, as it has often followed similar sequences for other users. By maintaining a memory, RNNs can handle changing user interests more effectively.
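
The sequence processing can be sketched with a minimal recurrent cell. The weights below are random and untrained (an assumption to keep the example short), so the resulting probabilities are not meaningful; the sketch only shows how the hidden state carries information from one watched movie to the next.

```python
import math
import random

random.seed(0)

MOVIES = ["M1", "M2", "M3", "M4", "M5"]
HIDDEN = 3


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Untrained weights: input-to-hidden, hidden-to-hidden, hidden-to-output.
W_in = rand_matrix(HIDDEN, len(MOVIES))
W_rec = rand_matrix(HIDDEN, HIDDEN)
W_out = rand_matrix(len(MOVIES), HIDDEN)


def one_hot(movie):
    return [1.0 if m == movie else 0.0 for m in MOVIES]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def next_movie_probs(history):
    """Run the watch history through the RNN and return a softmax over movies."""
    h = [0.0] * HIDDEN
    for movie in history:
        from_input = matvec(W_in, one_hot(movie))
        from_memory = matvec(W_rec, h)
        h = [math.tanh(a + b) for a, b in zip(from_input, from_memory)]
    logits = matvec(W_out, h)
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return {m: e / total for m, e in zip(MOVIES, exps)}


probs = next_movie_probs(["M1", "M2", "M3"])
print(probs)
```

Because the hidden state is updated step by step, feeding the same movies in a different order generally produces a different distribution, which is what makes the model sequence-aware.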

RNNs are powerful in contexts like video streaming services or e-commerce, where the order of interactions significantly influences user behavior. By learning sequences, RNN-based recommenders can provide more personalized and timely suggestions, anticipating what users are likely to do next.

CNNs & Feature Extraction

CNN Feature Extraction Process

The Joker movie poster is an example of how Convolutional Neural Networks (CNNs) extract features. CNNs are good at identifying essential details in different types of content, such as visual and written data. The feature extraction process has a few stages, starting with the original movie poster as the input. Convolution layers find features such as edges, colors, and textures at different levels of complexity. Pooling layers then summarize and condense the information, followed by a flattening layer that turns the 2D feature maps into a 1D vector of 100,352 features. Finally, the fully connected layer combines all the extracted features for final processing. The results from the CNN can be used to identify a movie’s visual style, themes, and genre characteristics. CNNs help recommendation systems understand rich multimedia content, resulting in more accurate and context-aware suggestions.
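
The convolution, pooling, and flattening stages can be illustrated on a toy 6×6 “image” with a single hand-written edge filter. This is a sketch of the operations only; a real CNN learns many filters and works on full-color posters.

```python
def conv2d(image, kernel):
    """Valid cross-correlation of a 2D image with a 2D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj] for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


def flatten(feature_map):
    return [v for row in feature_map for v in row]


# Toy image: dark left half, bright right half (a vertical edge in the middle).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
vertical_edge = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

features = flatten(max_pool(relu(conv2d(image, vertical_edge))))
print(features)

# One feature-map shape that would flatten to 100,352 values is 14 x 14 x 512;
# this shape is an assumption, since the text gives only the flattened length.
print(14 * 14 * 512)
```

The filter responds only where brightness changes from left to right, which is how early convolution layers pick out edges before deeper layers combine them into textures and shapes.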

Attention & Transformers

Transformers are now being used in recommendation systems to process sequences of user actions effectively. Initially designed for language processing, the Transformer architecture uses self-attention mechanisms to understand relationships between inputs. Transformers are very effective in recommendation tasks where understanding sequential patterns is crucial. Transformers can analyze how users like or skip content and adjust their context-based recommendations. A Transformer-based recommendation system can process user actions in sequence and assign varying levels of importance based on the context.

In our movie recommendation example, an attention-based model focuses more on recent interactions highly relevant to a user’s interests. For instance, if Miguel recently watched M4 and rated it highly, an attention mechanism would assign greater weight to this interaction when predicting what he might watch next. For transformers to work effectively, they need a variety of data inputs, such as user profile information (demographics), contextual data like time of day, type of device, and location, sequence data (a series of actions over time), and information about the content the user engaged with, such as categories, descriptions, and content details.

Figure: Transformer Movie Recommender.

  • User data inputs: user profile (demographics); contextual data (time of day, device, location); sequence data (actions over time); content information (categories, descriptions).
  • User Behavior Encoder: two Transformer layers.
  • Item Decoder: two Transformer layers that focus on item attributes (genres, description, etc.).
  • Output: recommended movies.
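
The weighting of interactions by an attention mechanism can be sketched with scaled dot-product self-attention over a short watch history. To keep it small, this sketch uses the item embeddings directly as queries, keys, and values, whereas a real Transformer applies learned projections first. The vectors for M1 and M4 are the example embeddings from the NCF section; the vector for M2 is hypothetical.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence):
    """Return (outputs, weights) for scaled dot-product self-attention."""
    d = len(sequence[0])
    outputs, all_weights = [], []
    for query in sequence:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in sequence]
        weights = softmax(scores)
        output = [sum(w * value[i] for w, value in zip(weights, sequence)) for i in range(d)]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


# Watch history as item embeddings, oldest first: M1, M2 (hypothetical), M4.
history = [
    [0.7, -0.1, 0.9],
    [0.2, 0.8, 0.1],
    [0.9, 0.2, 0.4],
]

outputs, weights = self_attention(history)
print(weights[-1])  # how much the latest interaction attends to each item
```

Each row of weights sums to 1, so it can be read as how the model distributes its focus across the history when representing a given interaction.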

GNNs & Deep Learning

Graph Neural Networks (GNNs) are used for recommendations by modeling users and items as nodes in a graph, capturing relationships and interactions between them more effectively.
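
A single round of message passing on a user-item graph can be sketched as neighbor averaging. The edges follow the example rating matrix (a user is connected to every movie they rated); the two-dimensional item features are hypothetical, and real GNN layers add learned weights and nonlinearities on top of this aggregation.

```python
# Bipartite graph: each user is connected to the movies they rated.
edges = {
    "Jake": ["M1", "M2", "M3", "M4", "M5"],
    "Li Wei": ["M1", "M2", "M3", "M5"],
    "Carlos": ["M1", "M2", "M3", "M4", "M5"],
    "Miguel": ["M1", "M2", "M3", "M4", "M5"],
}

# Hypothetical starting features for each movie node.
item_features = {
    "M1": [1.0, 0.0],
    "M2": [0.0, 1.0],
    "M3": [1.0, 1.0],
    "M4": [0.0, 0.0],
    "M5": [1.0, 0.0],
}


def mean_vectors(vectors):
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def aggregate_users(edges, item_features):
    """Each user node becomes the mean of its neighboring item features."""
    return {user: mean_vectors([item_features[m] for m in items]) for user, items in edges.items()}


def aggregate_items(edges, user_vectors):
    """Each item node becomes the mean of the users connected to it."""
    neighbors = {}
    for user, items in edges.items():
        for movie in items:
            neighbors.setdefault(movie, []).append(user_vectors[user])
    return {movie: mean_vectors(vectors) for movie, vectors in neighbors.items()}


user_vectors = aggregate_users(edges, item_features)
updated_items = aggregate_items(edges, user_vectors)
print(user_vectors["Li Wei"], updated_items["M4"])
```

After two hops, M4's representation reflects the tastes of the users who rated it, which is the kind of relational signal a GNN-based recommender builds on.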

Resources

Google. “Recommendation Systems Overview.” Google Developers, https://developers.google.com/machine-learning/recommendation

Ricci, F., Rokach, L., & Shapira, B. (Eds.). (2022). Recommender systems handbook (3rd ed.). Springer. https://link.springer.com/book/10.1007/978-1-0716-2197-4#bibliographic-information

Roy, D., Dutta, M. A systematic review and research perspective on recommender systems. J Big Data 9, 59 (2022). https://doi.org/10.1186/s40537-022-00592-5

Sanchez-Lengeling, Benjamin, et al. “A Gentle Introduction to Graph Neural Networks.” Distill, vol. 6, no. 8, 17 Aug. 2021, https://distill.pub/2021/gnn-intro/