WHERE
Description
When you are only interested in a subset of the table, you can use WHERE
operation to drop rows that do not meet the specified condition. This can be used to drop entities that are irrelevant to your prediction or targets that you do not want to include in your aggregation.
Static filter
When a filter does not contain any aggregations, we call it a static filter.
Example 1 - the basics
The syntax and some examples for this part of the query is as follows:
WHERE <entity_table>.<column_name> <comparison_operator> <constant>
WHERE user.country = 'US'
Example 2 - use inside of the aggregation function
One can use a static filter to focus on specific entities (e.g. users in the US) or targets (e.g. transactions beyond $100), ensuring that only the most relevant data is in your predictions, as follows:
PREDICT COUNT(transaction.* WHERE transaction.amount > 100)
FOR EACH user.user_id WHERE user.country = 'US'
Example 3 - one-hop-away filter: filtering based on connected tables
Sometimes, the data that we want to use for filtering is not in the same table as entities or targets. Suppose that we have tables user
and region
, with the user
table containing a foreign key pointing to the table region
. If we only want to use users in the regions with less than 10000
inhabitants, then the following filter can be used:
user.user_id WHERE region.num_inhabitants < 10000
Important: This one-hop-away filters can only be used when there is a unique foreign key pointing from the entity table to the table with the condition column. In the above case, there needs to be a unique foreign key in the user
table, pointing to the primary key of table region
. This is to guarantee that each user
can be uniquely connected to exactly one region
.
Example 4 - combining multiple filters
You can also choose to chain multiple filters of this kind together using AND/OR/NOT statements and parentheses, similar to the example shown earlier:
WHERE (user.user_country = 'US' OR region.num_inhabitants < 10000)
AND (user.user_dietary = 'Vegetarian')
Temporal filter
Condition following WHERE
can also contain an aggregation. For example, you will commonly only care about delivering predictions on a certain subset of your entities based on their recent activity, stored in a Fact Table that is directly linked in the Graph. For example, you may only care about predicting churn for customers that were recently active (not customers that have long already churned).
Example 5 - temporal filter
FOR EACH user.user_id WHERE COUNT(views.id, -30, 0) > 0
FOR EACH user.user_id WHERE LIST_DISTINCT(PURCHASES.item_category, -90, 0) CONTAINS 'Food'
FOR EACH user.user_id WHERE LAST(STATUS.status, -1, 0) = 'ACTIVE'
FOR EACH user.user_id WHERE COUNT(TRANSACTIONS.*, -INF, 0) > 0
Important: Temporal filters can only be used when there is a unique foreign key pointing from the aggregated table to the entity table. Note that this is exactly the opposite from the example 3. This is because an entity table may be connected to multiple rows in the aggregated table.
Here, allowable aggregation functions and definitions of start
and end
parameters are the same as the ones under Target, except both start
and end
should be non-positive.
Allowable comparison operators are the same for target. As with other filters, you can combine multiple conditions together with AND/OR/NOT operators. Unlike in the target definition (PREDICT
clause), mixing static and temporal conditions is perfectly valid when used inside the filter.
Finally, temporal filters can only be used when the target definition itself contains a temporal aggregation.
Changing filters for batch prediction
In certain situations, you want to make prediction for a different set of entities/targets than your model was trained for. For example, you may have trained your purchase-prediction model on all customers and all types of clothes, but right now, you are only interested in customers that have been active recently, or customers that come from a specific region. To this end, you may change some of the filters at batch prediction time while keeping the model the same.
Example 6 - changing the entity filter at batch prediction time
Suppose that we trained a query, where we tried to drop all customers, annotated as inactive. Before batch prediction, the system will automatically detect any parts of the query that we may edit for batch prediction.
Editing the field in the right will update the query and drop not only entities that have been labelled as active, but also those without any transactions in the last 90 days.
Note: It is important to be mindful of the data distribution shift. In particular, making predictions on different distributions of data that was not seen during training can result in unexpected quality degradation.
Example 7 - changing the target filter at batch prediction time
When using LIST_DISTINCT
aggregation on a foreign key column, like in the above example, we may additionally change the target filter as well. Let us introduce this feature through an example. Let's say that we trained a model to predict how many trousers with price larger than or equal to $50 each customer will buy in the next three months:
PREDICT LIST_DISTINCT(transactions.article_id
WHERE articles.product_type_name = 'Trousers'
AND transactions.price >= 50,
0, 90, days)
FOR EACH user.user_id
The condition articles.product_type_name = 'Trousers'
is using the one-hop-away table that the transactions.article_id
foreign key is pointing to. This part of the condition can be both enforced and changed at batch prediction time. If left untouched, the system will ensure that only trousers are ranked in the output, ignoring any product in the articles
table that does not meet the above condition.
By changing the filter to articles.product_type_name = 'Trousers' and articles.color = 'blue'
, the batch prediction ranking will only include blue trousers. When changing the target filters like this, only references to the articles
are permitted, that is, only the table that the aggregated foreign key is pointing to. If you, hypothetically, wanted to change transaction.price >= 50
to transaction.price >= 100
, you would need to retrain the model entirely.
Updated 3 days ago