HomeDocsAPI Reference
Kumo.ai
Docs

WHERE

WHERE <Aggregation_Function>(<table>.<column_name>, <start>, <end>) <comparison_operator> <constant> (Optional)

Description

When you are only interested in a subset of the table, you can use WHERE operation to drop rows that do not meet the specified condition. This can be used to drop entities that are irrelevant to your prediction or targets that you do not want to include in your aggregation.

Static filter

When a filter does not contain any aggregations, we call it a static filter.

Example 1 - the basics

The syntax and some examples for this part of the query is as follows:

WHERE <entity_table>.<column_name> <comparison_operator> <constant>
WHERE user.country = 'US'

Example 2 - use inside of the query

One can use a static filter to remove unrequired entities (e.g. users that are not in the US), or undesired targets (e.g. transactions under a certain value), like so:

PREDICT COUNT(transaction.* WHERE transaction.amount > 100)
FOR EACH user.user_id WHERE user.country = 'US'

Example 3 - filtering based on connected tables

Sometimes, the data that we want to use for filtering is not in the same table. Suppose that we have tables user and region, with the user table containing a foreign key pointing to the table region. Suppose that we only want to use users in regions with less than 10000 inhabitants.

user.user_id WHERE region.num_inhabitants < 10000

Important: The one-hop-away filters can only be used when there is a unique foreign key pointing from the entity table and to the table with the condition column. In the above case, there needs to be a unique foreign key in the user table, pointing to the primary key of table region. This is to guarantee that each user can be uniquely connected to exactly one region.

Example 4 - combining multiple filters

You can also choose to chain multiple filters of this kind together using AND/OR/NOT statements and parentheses, similar to the example shown earlier:

WHERE (user.user_country = 'US' OR region.num_inhabitants < 10000)
       AND (user.user_dietary = 'Vegetarian')

Temporal filter

Condition following WHERE can also contain an aggregation. For example, you will commonly only care about delivering predictions on a certain subset of your entities based on their recent activity, stored in a Fact Table that is directly linked in the Graph. For example, you may only care about predicting churn for customers that were recently active (not customers that have long already churned).

Example 5 - temporal filter

FOR EACH user.user_id WHERE COUNT(views.id, -30, 0) > 0
FOR EACH user.user_id WHERE LIST_DISTINCT(PURCHASES.item_category, -90, 0) CONTAINS 'Food'
FOR EACH user.user_id WHERE LAST(STATUS.status, -1, 0) = 'ACTIVE'
FOR EACH user.user_id WHERE COUNT(TRANSACTIONS.*, -INF, 0) > 0

Important: Temporal filters can only be used when there is a unique foreign key pointing from the aggregated table and to the entity table. Note that this is exactly the opposite from the example 3. This is because an entity table may be connected to multiple rows in the aggregated table.

Here, allowable aggregation functions and definitions of start and end parameters are the same as the ones under Target, except both start and end should be non-positive.

Allowable comparison operators are the same for target. As with other filters, you can combine multiple conditions together with AND/OR/NOT operators. Unlike in the target definition (PREDICTclause), mixing static and temporal conditions is perfectly valid when used inside the filter.

Finally, temporal filters can only be used when the target definition itself contains a temporal aggregation.

Changing filters for batch prediction

In certain situations, you want to make prediction for a different set of entities/targets than your model was trained for. For example, you may have trained your purchase-prediction model on all customers and all types of clothes, but right now, you are only interested in customers that have signed up recently, or customers that come from a specific region. To this end, you may change some of the filters just for batch prediction.

Example 6 - changing the entity filter at batch prediction time

Suppose that we trained a query, where we tried to drop all customers, annotated as inactive. Before batch prediction, the system will automatically detect any parts of the query that we may edit for batch prediction.

Editing the field in the right will update the query and drop not only entities that have been labelled as active, but also those without any transactions in the last 90 days.

Note: It is important to be mindful of the data distribution shift. Making predictions on a different set of data than the model has been trained on can result in unexpected quality degradation.

Example 7 - changing the target filter at batch prediction time

When using LIST_DISTINCT aggregation on a foreign key column, like in the above example, we may additionally change the target filter as well. Let us introduce this feature through an example. Let's say that we trained a model to predict how many trousers with price larger than $50 each customer will buy in the next three months:

PREDICT LIST_DISTINCT(transactions.article_id
                      WHERE articles.product_type_name = 'Trousers'
                            AND transactions.price >= 50,
                      0, 90, days)
FOR EACH user.user_id

The condition articles.product_type_name = 'Trousers' is using the one-hop-away table that the transactions.article_id foreign key is pointing to. This part of the condition can be both enforced and changed at batch prediction time. If left untouched, the system will ensure that only trousers are ranked in the output, ignoring any product in the articles table that does not meet the above condition.

By changing the filter to articles.product_type_name = 'Trousers' and articles.color = 'blue', the batch prediciton ranking will only include blue trousers. When changing the target filters like this, only references to the articles are permitted, that is, only the table that the aggregated foreign key is pointing to. If you, hypothetically, wanted to change transaction.price >= 50 to transaction.price >= 100, you would need to retrain the model entirely.