Why Kumo for Growth and Marketing?

Global marketing investment is projected to grow 8% to over $1 trillion in 2024. As companies see their costs per acquisition go up, many turn to Kumo to help drive efficient growth through machine learning.

Kumo's predictive query language makes it easy to build a wide variety of ML models for both B2B and B2C growth teams, including churn, lifetime value (LTV), lead ranking, conversion propensity (such as click-through rate, CTR), personalized calls to action, user segmentation, and item recommendations. By using graph neural networks (GNNs) and large language models (LLMs) on complex relational data, Kumo delivers superior predictive accuracy compared with traditional customer relationship management (CRM), account-based marketing (ABM), and personalization solutions.

Kumo's integrations make it easy to plug into existing tools, enabling teams to deliver more models in less time. Additionally, data warehouse native deployments minimize the need for time-consuming security reviews by performing data processing within your Snowflake or Databricks accounts.

Growth and Marketing Solutions

Kumo’s predictive query language (PQL) provides the flexibility to build a wide variety of growth and marketing models. PQL statements enable data scientists to declare the entity, target, filters, and optimization goal of a predictive task. In practice, this enables teams to quickly build GNN-powered models for dozens of predictive tasks across the customer acquisition funnel.

Here are a few of the solutions that Kumo supports.

Lead Scoring
Churn Prediction
Lifetime user value prediction (LTV)
Conversion Propensity
Next Best Action
Personalized Email Recommendations
Ad Retargeting Optimization
Audience Creation or User Segmentation

Kumo is a good fit for…

Data science and machine learning teams that support a growth function, and are looking to multiply their impact by delivering better models faster.
Product managers, CMOs, and heads of performance marketing who have a results-first mindset and are not satisfied with the model accuracy of their existing CRM and ABM tools.
Businesses with unique data schemas that are hard to connect with traditional lead ranking and personalization tools.
Engineering teams building an in-house growth optimization system, and looking to invest in a future-proof ML platform to support this initiative.
Organizations looking to invest in a general machine learning platform to up-level a wide variety of solutions beyond growth, including recommendations, personalization, fraud detection, demand forecasting, and price optimization.

Stand-out Features

The GNN+LLM model architecture is very effective at finding signal in the data. A Kumo customer saw 4x higher accuracy with Kumo's lead scoring solution compared to a well-known lead-scoring vendor that had been around for more than a decade!
Solutions for both batch and real-time integrations, enabling a single platform to power recurring (e.g., marketing campaigns) and real-time (e.g., ad retargeting) use cases.
Entity Level Explainability enables customer-facing teams to understand the reason for each churn prediction, empowering operations teams to make smarter decisions on how to reach out.
Great handling of cold start. Using inductive representation learning, the model can make high-quality LTV predictions for recently joined users for whom there is little to no information.
GNN model planner enables data scientists to tune the model architecture for the dataset, including the training split, neighborhood sampling strategy, and model hyperparameters.
LLM-powered text understanding enables the model to understand unstructured data, such as conversation history, messages, posts, and reviews.
REST API and SDK, enabling data scientists and ML engineers to develop, test, and deploy recommendation models directly from notebooks or as part of an automated workflow.

Data Requirements

Kumo does not require data to be transformed to fit a prescriptive schema, nor does it require the installation of a tracking pixel.

Instead, Kumo makes predictions directly from the raw data that already exists in the data warehouse. The Kumo graph builder makes it easy to stitch together data from many different sources. Just connect the tables to the graph and go.

For the best results on growth and marketing use cases, Kumo encourages using data such as:

Customer profile information
Auth/unauth browsing history
User behavior, such as purchases, likes, searches, reviews
Messaging and customer support history
Second and third party data that you may have obtained from other providers

Since Kumo can stitch together signal across multiple different data sources, Kumo can achieve significantly better predictive accuracy than traditional alternatives.

Data Connectivity

Kumo reads and writes data directly to the client's data lakehouse, supporting cloud-first data science workflows. Supported lakehouses include Snowflake, Databricks, AWS S3, GCP BigQuery, and Azure Synapse (coming soon). For example, users have found success using Kumo as part of a DBT-based development environment, using Airflow for orchestration, and Streamlit for consumption.

Data Warehouse Native

Additionally, Kumo provides data warehouse native deployment options, which keep your data secure by performing data processing within your Snowflake or Databricks account. This makes Kumo suitable for use in highly regulated environments, including banking, healthcare, and government.

Scale

Kumo uses a distributed GNN training system, written in C++, which can handle multi-terabyte datasets with tens of billions of rows. Kumo has customers that make daily recommendations for more than 100M active users or more than 10M inventory items.

Because GNNs are great at discovering complex patterns in sparse data, Kumo is also a good fit for small datasets containing thousands of users and only tens of items.

Production Serving

In order to cover both in-product and out-of-product growth use cases, Kumo supports a variety of serving methods:

Batch Export: Export bulk predictions via UI or API on an hourly or daily basis. This is the easiest way to export predictive audiences to downstream systems such as marketing orchestration software, advertising platforms, or dashboards for use by internal customer operations teams.
Online Serving: Kumo enables you to create predictions on the fly using a low latency REST API. This unlocks many time-sensitive use cases, such as real-time ad retargeting, in-product onboarding optimization, and next-best-action recommendation.

Model Architecture

Kumo recommendations are powered by a GNN architecture inspired by several recent academic papers. Data scientists can benefit from these advances in model architecture without needing to code them up manually.

Here is some of the research that is used by Kumo AI:

GraphSAGE does inductive representation learning to deliver great recommendations for users with very little interaction data, such as first- or second-time visitors.
PNA introduces a variety of aggregation operators, which are explored by Kumo AutoML.
GCN describes mean-pooling aggregation, which captures similarity between users with similar item purchases.
GIN captures frequency signal to learn more complex user behavior, such as power users vs. resurrected users.
NBF networks reduce the computational cost of models by providing an efficient way to capture paths between nodes.
GraphMixer uses temporal representation learning to interpret sequences of user actions, such as on-site browsing history.
RDL describes temporal sampling, which learns from past sequences of user actions to predict the future.
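
To make the idea of temporal sampling more concrete, here is a minimal sketch in plain Python. It is not Kumo's implementation: the function name, the edge-tuple layout, and the integer timestamps are all assumptions chosen for illustration. The point it shows is that, when building the neighborhood of a node for a prediction made at a given time, only interactions that happened before that time are eligible, so no future information leaks into training.

```python
import random

def sample_temporal_neighbors(edges, node, seed_time, k, rng):
    """Sample up to k neighbors of `node` from edges that occurred
    strictly before seed_time.

    edges: list of (source, neighbor, timestamp) tuples (hypothetical layout).
    """
    candidates = [
        (nbr, ts) for (src, nbr, ts) in edges
        if src == node and ts < seed_time
    ]
    if len(candidates) <= k:
        return candidates
    return rng.sample(candidates, k)

# Hypothetical user-item interactions; timestamps are day numbers.
edges = [
    ("u1", "item_a", 1),
    ("u1", "item_b", 5),
    ("u1", "item_c", 9),   # happens after the seed time below
    ("u2", "item_a", 3),
]

rng = random.Random(0)
visible = sample_temporal_neighbors(edges, "u1", seed_time=6, k=5, rng=rng)
one = sample_temporal_neighbors(edges, "u1", seed_time=6, k=1, rng=rng)
```

Here `visible` contains only the interactions with `item_a` and `item_b`, because `item_c` occurs after the seed time.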

Data Encoding

Kumo also uses a powerful data encoding stack to convert multi-modal data into representations for deep learning.

PyTorch Frame finds the best encoding for a variety of tabular data types.
LLM foundation models can be used for understanding rich text data.
Absolute and relative time encodings learn historical and seasonal patterns.
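
As a rough illustration of the difference between absolute and relative time encodings, the sketch below builds fixed (not learned) features in plain Python. It is an assumption-laden example rather than Kumo's encoder: the function names, the sine/cosine cyclical encoding, and the log-scaled event age are illustrative choices. Absolute features describe where a timestamp sits in the calendar (useful for seasonality), while relative features describe how long ago an event happened compared with the prediction time.

```python
import math
from datetime import datetime

def absolute_time_features(ts):
    """Cyclical encoding of day-of-year and day-of-week, so that
    Dec 31 and Jan 1 (or Sunday and Monday) end up close together."""
    day_of_year = ts.timetuple().tm_yday
    weekday = ts.weekday()  # Monday is 0
    year_angle = 2 * math.pi * (day_of_year - 1) / 365.25
    week_angle = 2 * math.pi * weekday / 7
    return [
        math.sin(year_angle), math.cos(year_angle),
        math.sin(week_angle), math.cos(week_angle),
    ]

def relative_time_feature(event_ts, anchor_ts):
    """Log-scaled age of an event in days, relative to the prediction time."""
    age_days = (anchor_ts - event_ts).total_seconds() / 86400
    return math.log1p(max(age_days, 0.0))

new_year = absolute_time_features(datetime(2024, 1, 1))
age = relative_time_feature(datetime(2024, 1, 1), datetime(2024, 1, 10))
```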

Model Planner

The Kumo model planner empowers data scientists to quickly iterate and apply their domain knowledge to the model.

Specifically, the Kumo model planner gives control over:

training table splits
neighborhood sampling
column encoding
training process
GNN architecture
optimization goals

For example, in the world of e-commerce, it is very common for user behavior to differ between the "December holiday season" and the rest of the year. If you want to make sure your churn prediction model generalizes throughout the year, tune the split parameter of the model planner to pick appropriate time frames for the train, validation, and test sets.
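
The effect of a time-based split can be sketched in a few lines of plain Python. This is not the model planner's syntax; the function name, the row layout, and the cut-off dates are hypothetical. The sketch assumes two years of monthly training rows and places the cut-offs so that the training split contains a full holiday season.

```python
from datetime import date

def time_split(rows, val_start, test_start):
    """Assign rows to train/val/test by timestamp.

    Rows before val_start go to train, rows from val_start up to
    (but not including) test_start go to val, and the rest go to test.
    """
    splits = {"train": [], "val": [], "test": []}
    for row in rows:
        if row["ts"] < val_start:
            splits["train"].append(row)
        elif row["ts"] < test_start:
            splits["val"].append(row)
        else:
            splits["test"].append(row)
    return splits

# Hypothetical training table: one row per month across 2022 and 2023.
rows = [
    {"ts": date(year, month, 1)}
    for year in (2022, 2023)
    for month in range(1, 13)
]

splits = time_split(rows, val_start=date(2023, 1, 1), test_start=date(2023, 7, 1))
train_has_december = any(r["ts"].month == 12 for r in splits["train"])
```

With these cut-offs, all of 2022 (including December) lands in train, the first half of 2023 in validation, and the second half in test.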

Data scientists have achieved a 73% performance lift on a "next best action" model using the fine-grained controls offered in the model planner.

Predictive Query Language

PQL is a declarative syntax for defining machine learning problems. It is highly flexible and easy to learn, supporting inline filters, boolean expressions, and aggregation functions.

Data scientists can quickly experiment with many different and complex predictive formulations of a machine learning problem in very few lines of code.

For example, the following "lead ranking" query predicts whether each active lead will have a conversion in the next N days, assuming that a salesperson reaches out to them tomorrow.

PREDICT COUNT(events.* WHERE events.type = 'conversion', 0, N, days) > 0
FOR EACH leads.lead_id
WHERE COUNT(triggers.*, -1, 0, days) > 0
ASSUMING COUNT(events.* WHERE events.source = 'sales', 0, 1, days) > 0
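
To make the meaning of the PREDICT and WHERE clauses more tangible, the following plain-Python sketch computes the corresponding label for a single lead at a given anchor time. It is an interpretation, not Kumo's engine: the function names, the row layout, and the choice of a window that excludes its start and includes its end are assumptions. The ASSUMING clause describes a hypothetical future action and is not modeled here.

```python
from datetime import datetime, timedelta

def count_in_window(rows, anchor, start_days, end_days, keep=lambda row: True):
    """Count rows with anchor + start_days < ts <= anchor + end_days.

    The window boundaries are an assumption made for this sketch.
    """
    lo = anchor + timedelta(days=start_days)
    hi = anchor + timedelta(days=end_days)
    return sum(1 for row in rows if lo < row["ts"] <= hi and keep(row))

def lead_label(events, triggers, anchor, n_days):
    """Return True/False for an active lead, or None if the lead is filtered out."""
    # WHERE COUNT(triggers.*, -1, 0, days) > 0
    if count_in_window(triggers, anchor, -1, 0) == 0:
        return None
    # PREDICT COUNT(events.* WHERE events.type = 'conversion', 0, N, days) > 0
    is_conversion = lambda row: row["type"] == "conversion"
    return count_in_window(events, anchor, 0, n_days, is_conversion) > 0

# Hypothetical data for one lead.
anchor = datetime(2024, 3, 1)
triggers = [{"ts": datetime(2024, 2, 29, 12)}]
events = [
    {"ts": datetime(2024, 3, 2), "type": "page_view"},
    {"ts": datetime(2024, 3, 4), "type": "conversion"},
]

label_7d = lead_label(events, triggers, anchor, n_days=7)
label_2d = lead_label(events, triggers, anchor, n_days=2)
label_inactive = lead_label(events, [], anchor, n_days=7)
```

Changing N from 7 to 2 flips the label for this lead, which shows how a small change to the query defines a different prediction task.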

Evaluation & Explainability

As part of the training process, Kumo automatically computes data visualizations and metrics to help understand the model’s strengths and weaknesses.

Learning Curves and Distribution: Detects underfitting and overfitting by monitoring convergence rates. Tracks the distribution of training data over time to check for balance.
Backtesting on Holdout: All models are back-tested on a configurable holdout dataset. The holdout dataset may be downloaded for customer analysis.
Standard Evaluation Metrics and Charts: Includes ROC and PRC curves, cumulative gain chart, AUPRC, AUROC, predicted vs. actual scatter plot and histogram, MAE, MSE, RMSE, SMAPE, average precision, per-category recall, F1, and MAP.
Baseline Comparison: Models are benchmarked against an automatically generated analytic baseline.
Column Explainability: A visualization highlighting which columns have the greatest predictive power helps verify that the model has no data leakage.
Row Level Explainability: Users can understand the reason for individual predictions by seeing which rows contributed most to the result.
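
For readers who want to reproduce some of the regression metrics listed above on a downloaded holdout set, here is a minimal plain-Python sketch of MAE, RMSE, and SMAPE. The function names and sample values are illustrative. SMAPE has several definitions in the literature; the variant below, which divides by the mean of the absolute actual and predicted values, is an assumption and may differ from the one Kumo reports.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def smape(y_true, y_pred):
    """Symmetric MAPE in [0, 2]; pairs where both values are 0 contribute 0."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2
        total += abs(t - p) / denom if denom else 0.0
    return total / len(y_true)

# Hypothetical actual vs. predicted LTV values for three users.
y_true = [100, 200, 0]
y_pred = [110, 180, 0]

scores = {
    "mae": mae(y_true, y_pred),
    "rmse": rmse(y_true, y_pred),
    "smape": smape(y_true, y_pred),
}
```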

MLOps

In order to support ongoing validation of model correctness, Kumo has the following features related to MLOps:

Data Source Snapshotting: During each job, data source statistics are snapshotted including size, time range, and import time, to enable faster root cause analysis.
Drift Detection: Distributions of features and predictions are monitored for drift. This enables early detection of issues, preventing bad predictions from being published to production.
Champion/Challenger: A champion/challenger approach can be adopted to validate key metrics of a newly trained model when orchestrating automatic job retraining through the REST API.
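
The source does not say which statistic Kumo uses for drift detection. As one common option, the sketch below computes the Population Stability Index (PSI) between a baseline sample and a new sample in plain Python. The function name, the bin edges, and the 0.2 alert threshold (a widely used rule of thumb) are assumptions for illustration.

```python
import math

def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between two samples, using fixed bin edges.

    Empty bins are floored at eps to avoid division by zero.
    """
    def fractions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            idx = sum(v >= e for e in edges)  # index of the bin v falls into
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    exp, act = fractions(expected), fractions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(exp, act))

# Hypothetical prediction scores from a baseline job and a later job.
baseline = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
shifted = [0.6, 0.7, 0.8, 0.9, 0.8, 0.9, 0.95, 0.99]
edges = [0.25, 0.5, 0.75]

no_drift = psi(baseline, baseline, edges)
drift = psi(baseline, shifted, edges)
should_alert = drift > 0.2  # assumed rule-of-thumb threshold
```

A check like this could run before predictions are published, holding back a batch whose score distribution has moved too far from the baseline.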

Integrations

Because Kumo writes directly to the data lakehouse, it is easy to connect with other cloud software commonly used in growth, marketing, or sales. Here are just a few examples:

Marketing Orchestration: Braze, Adobe/Marketo, Klaviyo, Omnisend, etc… Create ML-powered customer segments and insert personalized dynamic content such as product recommendations.
Advertising: Google Performance Max, Meta Advantage Plus, etc… Assign users to custom audiences using ML to show creatives that are more likely to convert.
Analytics/Internal Tools: Streamlit, DBT, Looker, Tableau, Domo, Knowi, etc… Build ML-powered internal tools and dashboards for use by non-technical business users.
Sales/CRM: Salesforce, Microsoft, Oracle, Zendesk, Zoho, etc… Rank your leads with ML for greater sales team efficiency.
Chatbots/Language Models: OpenAI, Cortex, Bard, DBRX… Use Kumo to create predictions and human-understandable explanations that operations teams can take action on.