Predictive Query Structure
At the bare minimum, your Predictive Query will have an entity and target, as well as an optional filter to scale down your prediction results.
Target
The target is essentially the question you are trying to answer, preceded by the PREDICT
command. For example, if you want to predict the total amount of money spent by each user over the next thirty days, then the "sum of purchases over the next thirty days" is your target.
Entity
The entity is who you are making predictions for. For example, if you want to predict the total amount of purchases per user, then the user is your entity.
Aggregation Operators
If you want to predict some aggregation of values over time (e.g., sum of purchase values over the next 30 days), you can specify an aggregation operator and the column representing the value you want to aggregate.
Some common aggregation functions include SUM()
and COUNT()
, to name a few.
NOTE: Please refer to the Predictive Query Reference for a complete list of Kumo aggregation operators.
Revisiting the previous example, the following usage of the SUM()
aggregation operator allows you to predict the total number of sales each customer will make in the next 30 days:
PREDICT SUM(TRANSACTIONS.PRICE, 0, 30, days)
FOR EACH CUSTOMERS.CUSTOMER_ID
Within the aggregation function inputs, the start and end parameters refer to the time period you want to aggregate across, calculated in days. For example, 10 for start and 30 for end implies that you want to aggregate from 10 days later (excluding the 10th day) to 30 days later (including the 30th day). The time unit of the aggregation defaults to 'days' if none is specified.
If you're making the prediction on 2020-01-01 00:00:00
, Kumo will aggregate all rows with timestamps t where 2020-01-11 00:00:00 < t <= 2020-01-31 00:00:00
.
When using aggregation with targets, both start and end values should be non-negative integers, and end values should be greater than start values.
Aggregation Units
In the above example, we use a start value of 10
, an end value of 30
, and a unit of days
. If the time unit is not specified, we assume that the start and end values are specified in days.
PREDICT SUM(TRANSACTIONS.PRICE, 0, 30)
FOR EACH CUSTOMERS.CUSTOMER_ID
You may additionally specify the unit as months
or hours
Filters
Filtering allows you to further contextualize your Predictive Query by dropping rows that do not meet a specified condition. By using a WHERE
clause, you can drop entities that are irrelevant to your prediction or targets to exclude from your aggregation (i.e., when you are only interested in a subset of the table).
A condition following the WHERE
clause can also contain an aggregation, for cases where you may only want predictions on a certain subset of your entities. For example, the following Predictive Query predicts the lifetime value (LTV) for customers that were recently active, which is defined by whether they made a transactionn in the last 30 days:
PREDICT SUM(TRANSACTIONS.PRICE, 0, 30)
FOR EACH CUSTOMERS.CUSTOMER_ID WHERE COUNT(TRANSACTIONS.*,-30, 0) > 0
Using WHERE as an Inline Operation
The WHERE
operator can be used several times in different locations of the same Predictive Query. For example, the following is an example of using theWHERE
operator within an aggregation for target filters:
PREDICT COUNT(TRANSACTION.* WHERE TRANSACTION.VALUE > 10, 0, 7)
FOR EACH USER.USER_ID WHERE COUNT(TRANSACTION.*, -7, 0) > 0
This Predictive Query will predict the number of transactions over $10 each active user (i.e., a user that has made at least one purchase in the past 7 days) will make over the next 7 days.
You can use the same aggregation functions and start and end parameter definitions as targets; however, in these cases both start and end parameters should be non-positive.
Using Nested WHERE Filters
Kumo also supports nested temporal filters, for example:
PREDICT COUNT(transaction.*, 0, 7)
FOR EACH user.user_id
WHERE COUNT(transaction.*
WHERE transaction.value > 10,
-7, 0) > 0
This Predictive Query will predict the number of transactions over the next 7 days that each user who has made a purchase over $10 in the past 7 days will make.
Using Multiple Target Tables
Kumo also supports the use of multiple target tables, for example:
PREDICT COUNT(session.*, 0, 7) > 10 OR SUM (transaction.value, 0, 5) > 100
FOR EACH user.user_id
This Predictive Query will predict whether each user will either have more than 10 sessions in the next 7 days or make purchases exceeding $100 over the next 5 days.
You may use multiple target tables as long as all conditions are static or all conditions are temporal. For example, the following query
PREDICT COUNT(session.*, 0, 7) > 10 OR transaction.value > 2
FOR EACH user.user_id
is not allowed since it mixes static and temporal clauses.
Using Specific Date/Time Values with WHERE
Specific date and time values can also be included in WHERE
clauses. For example, the following Predictive Query excludes customers that have joined after January 1st, 2022 12:45:30
:
PREDICT COUNT(TRANSACTIONS.*, 0, 10)
FOR EACH CUSTOMERS.CUSTOMER_ID
WHERE CUSTOMERS.DATE_JOINED < 2022-01-01 12:45:30
Dates should not be enclosed in quotes, and hours/minutes/seconds are optional and can be left out if only (in which case they are assumed to be 0
).
A static filter is a filter that does not contain any aggregations. You can combine multiple conditions together with AND/OR/NOT operators; however, unlike when specifying multiple target tables, mixing static and temporal conditions is allowed when used inside a filter. Additionally, a temporal filter can only be used when the target definition itself contains a temporal aggregation.
Updated 3 days ago