Cold Start Recommendations (Cold Start Items)

Solution Background and Business Value

Cold start recommendation is a common challenge in e-commerce, for example at Amazon and eBay, where new items and listings are added all the time. The problem is most extreme for platforms like Eventbrite or Ticketmaster, where most if not all items are non-repeating: we are always trying to recommend an unseen item! It matters for any application where recommendation quality is closely tied to business metrics such as user engagement and revenue. Improving the performance of cold start personalization therefore brings direct business impact.

The core of this challenge is that cold start items have little or no presence in past data. In this solution, we discuss multiple scenarios and how to use Kumo's different configurations to approach the problem.

Kumo's approaches to this challenge can be roughly summarized into two categories:

Rich features: Kumo learns from the rich features of entities and, for cold start entities, uses those features to infer relevance.
GNN propagation/structure (static two-tower only): By connecting new items to existing items via attributes like brand, category, store, and more, Kumo can leverage signals from similar items and infer the personalized preferences for cold start items.

Data Requirements and Kumo Graph

Core Tables

For a recommendation task, we need at minimum the following graph with three tables:

Orders: This table contains the “interactions” between customers and items; event attendees and events; or audience and tickets. customer_id, item_id and timestamp are required whereas other interaction features are optional.
Customers: This table contains the static features for left-hand side (LHS) entities (i.e., the subjects of personalization). A primary key (customer_id) is required whereas other features are optional.
Items: This table contains the right-hand side (RHS) entities to be recommended. A primary key (item_id) is required, and a timestamp is recommended to help Kumo ensure high-quality training.
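As a rough illustration of the minimum requirements above, the following sketch checks that each core table carries its required columns before it is loaded into a graph. It is plain Python and not part of any Kumo API; the function name `missing_columns`, the `REQUIRED_COLUMNS` mapping, and the sample rows (including `age`, `brand_id`, and `created_at`) are hypothetical.

```python
# Required columns per core table, taken from the descriptions above.
# This mapping and the helper below are illustrative, not a Kumo API.
REQUIRED_COLUMNS = {
    "orders": {"customer_id", "item_id", "timestamp"},
    "customers": {"customer_id"},
    "items": {"item_id"},
}


def missing_columns(tables):
    """Return {table_name: sorted missing required columns}.

    `tables` maps a table name to a list of row dicts. A table that is
    absent or empty is reported as missing all of its required columns.
    """
    problems = {}
    for name, required in REQUIRED_COLUMNS.items():
        rows = tables.get(name, [])
        present = set()
        for row in rows:
            present.update(row.keys())
        missing = required - present
        if missing:
            problems[name] = sorted(missing)
    return problems


# Hypothetical sample data with optional feature columns.
tables = {
    "orders": [{"customer_id": "c1", "item_id": "i1", "timestamp": "2024-01-05"}],
    "customers": [{"customer_id": "c1", "age": 34}],
    "items": [{"item_id": "i1", "brand_id": "b1", "created_at": "2023-12-01"}],
}
problems = missing_columns(tables)
```

An empty result means all three tables satisfy the minimum schema; optional feature columns are ignored by the check.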

Additional table suggestions

The following tables are optional. They are useful only when the solution is two-tower static link prediction, where the right-hand side entities also sample and aggregate features from linked tables.

Brands: This is an example of a dimension table with at least one column (brand_id) that connects to the RHS entities table, so that a round trip from items → brands → items provides item-to-item links for cold start items. As a result, new items can extract signals from existing items as long as they share the same brand. For other use cases like ticket recommendation, we can use geolocations, event types, or hosts as the “bridging” table.
Other tables similar to brands that capture a hierarchy and/or relate different items together (item type, color, etc.). It may be worth experimenting with these tables, especially if the data is otherwise very sparse.
Other tables similar to orders which capture user behavior (e.g. item views, comments, ratings, etc.)
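To make the items → brands → items round trip concrete, here is a minimal sketch of which existing items a new item can reach through a shared brand. It only illustrates the graph connectivity a bridging table creates; it does not reproduce how Kumo's GNN samples or aggregates neighbors. The function `brand_neighbors` and the sample item IDs are hypothetical.

```python
from collections import defaultdict


def brand_neighbors(items):
    """Map each item_id to the other item_ids that share its brand_id.

    This mirrors the two-hop path items -> brands -> items.
    """
    by_brand = defaultdict(set)
    for item in items:
        by_brand[item["brand_id"]].add(item["item_id"])
    return {
        item["item_id"]: sorted(by_brand[item["brand_id"]] - {item["item_id"]})
        for item in items
    }


# Hypothetical catalog: i_new is a cold start item with no orders yet.
items = [
    {"item_id": "i1", "brand_id": "b1"},
    {"item_id": "i2", "brand_id": "b1"},
    {"item_id": "i3", "brand_id": "b2"},
    {"item_id": "i_new", "brand_id": "b1"},
]
neighbors = brand_neighbors(items)
```

Here `i_new` is linked to `i1` and `i2` through brand `b1`, so interaction signals from those items can reach it, while `i3` has no same-brand neighbors and would need a different bridging table.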

Predictive Query

I. Temporal Recommendation

The following predictive query trains a personalized, per-customer recommendation model. It is exactly the same as a normal recommendation predictive query (see other recommendation solutions); only the model configuration differs:

If the majority of items are new, we recommend using module: link_prediction_embedding with handle_new_entities: true and target_embedding_mode: feature, which guides Kumo to focus on learning from the features.
If a substantial portion of the items are not cold start items, Kumo will automatically find a balance between GNN signals and features to accommodate both types of items.


PREDICT LIST_DISTINCT(orders.item_id, 0, 7) RANK TOP K
FOR EACH customers.customer_id
// Case 1. Majority cold start items
// module: link_prediction_embedding
// handle_new_entities: true
// target_embedding_mode: feature

// Case 2. Balanced mixture of cold and non-cold items
// module: link_prediction_ranking
// handle_new_entities: false
// target_embedding_mode: fusion
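One way to decide between the two cases is to measure what share of the item catalog has never appeared in the orders table. The sketch below does that in plain Python. Reading "majority" as more than 50% is our assumption, not a documented Kumo threshold, and the returned dictionary simply mirrors the option names in the comments above rather than any Kumo API. The helper names and sample data are hypothetical.

```python
def cold_start_fraction(item_ids, orders):
    """Fraction of catalog items with no interaction in `orders`."""
    catalog = set(item_ids)
    if not catalog:
        return 0.0
    seen = {order["item_id"] for order in orders}
    return len(catalog - seen) / len(catalog)


def suggest_case(fraction, majority_threshold=0.5):
    """Pick Case 1 or Case 2 options from the cold start fraction.

    The 0.5 default is an assumed reading of "majority".
    """
    if fraction > majority_threshold:
        return {
            "module": "link_prediction_embedding",
            "handle_new_entities": True,
            "target_embedding_mode": "feature",
        }
    return {
        "module": "link_prediction_ranking",
        "handle_new_entities": False,
        "target_embedding_mode": "fusion",
    }


# Hypothetical data: four catalog items, only i1 has been ordered.
item_ids = ["i1", "i2", "i3", "i4"]
orders = [{"customer_id": "c1", "item_id": "i1"}]
fraction = cold_start_fraction(item_ids, orders)
options = suggest_case(fraction)
```

In practice the fraction is best computed over the same time window used for training, since an item that is cold today may not have been cold a month ago.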

II. Static Link Prediction

If the problem can naturally be formulated as static link prediction, we can also consider a two-tower static model. This lets us leverage the item-to-item connections set up with “brand” as a bridging table.

Ensure the orders table doesn’t have any timestamps.
Use the link_prediction_embedding module to leverage the GNN signals for both LHS (customers) and RHS (items).

PREDICT LIST_DISTINCT(orders.item_id) RANK TOP K
FOR EACH customers.customer_id
// module: link_prediction_embedding
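The requirement that the orders table carries no timestamps can be met with a small preprocessing step before the table is loaded. The following is a minimal sketch in plain Python, assuming the time column is named `timestamp`; the helper `drop_timestamps` and the sample rows are hypothetical and not part of Kumo.

```python
def drop_timestamps(orders, timestamp_columns=("timestamp",)):
    """Return copies of the order rows without any timestamp columns."""
    return [
        {key: value for key, value in row.items() if key not in timestamp_columns}
        for row in orders
    ]


# Hypothetical interaction rows.
orders = [
    {"customer_id": "c1", "item_id": "i1", "timestamp": "2024-01-05"},
    {"customer_id": "c2", "item_id": "i2", "timestamp": "2024-01-06"},
]
static_orders = drop_timestamps(orders)
```

The original rows are left unchanged, so the same source data can still feed the temporal formulation.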

Deployment

With respect to deployment, there are no major differences between cold-start items problems and normal personalized recommendation use cases. Kumo's output can be either the per-LHS recommendations or embeddings.

Recommendation results can be used directly to serve personalized recommendations.
For advanced use cases with additional logic (a re-ranker, multi-objective optimization, or using Kumo for embedding-based candidate generation), embedding outputs might be easier to use because developers have more flexibility to train downstream models with additional features and logic.
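As an illustration of embedding-based candidate generation, the sketch below scores items against one customer's embedding and keeps the top k as candidates for a downstream re-ranker. Using the dot product as the similarity score is an assumption made for the example, and the embedding values, IDs, and the function `top_k_items` are made up. A production system would typically use an approximate nearest neighbor index instead of a full scan.

```python
def top_k_items(user_vec, item_vecs, k):
    """Return the k item_ids with the highest dot-product score.

    Ties are broken by item_id so the result is deterministic.
    """
    scores = {
        item_id: sum(u * v for u, v in zip(user_vec, vec))
        for item_id, vec in item_vecs.items()
    }
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))[:k]


# Made-up two-dimensional embeddings for one customer and three items.
user_embedding = [1.0, 0.0]
item_embeddings = {
    "a": [0.9, 0.1],
    "b": [0.2, 0.8],
    "c": [0.5, 0.5],
}
candidates = top_k_items(user_embedding, item_embeddings, k=2)
```

The resulting candidate list can then be passed to a re-ranker that adds business rules or extra features.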
