ASSUMING
Description
To investigate hypothetical scenarios and evaluate the impact of your actions or decisions, you can use the ASSUMING keyword. For example, you may want to investigate how much a user will spend if you give them a certain coupon or notification.
The ASSUMING keyword is followed by a future-looking assumption, which is treated as true when predictions are generated.
How does it work?
When generating the training table, we apply the ASSUMING condition to all entities. This means that we only train on entities that satisfy the condition in each given timeframe. This biases the model to specialize in generating predictions for entities that are going to perform a certain action, and the model uses that knowledge at prediction time.
For example, if your predictive problem is to forecast the LTV of a customer, then we will generate training examples for all customers across the timeframes generated by your split configuration. If we instead assume that they get sent an email notification in the next 3 days, then we will filter the training table down to only the entities that meet this criterion.
At Batch Prediction time, we don't know which customers will actually receive an email notification in the next 3 days. A model trained on the unfiltered data cannot take the assumption into account, whereas the model trained on the filtered data generates predictions assuming that the user will receive an email notification.
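The filtering step described above can be illustrated with a minimal Python sketch. This is not the platform's implementation: the notification log, the helper names `count_in_window` and `filter_training_rows`, and the choice of a window that excludes the anchor date and includes the end date are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Hypothetical notification log: (customer_id, date sent).
NOTIFICATIONS = [
    ("alice", date(2024, 1, 2)),
    ("alice", date(2024, 1, 9)),
    ("bob", date(2024, 1, 20)),
]


def count_in_window(events, entity, anchor, start, end):
    """Count events for `entity` dated in (anchor + start, anchor + end] days.

    The half-open window boundary is an assumption of this sketch.
    """
    lo = anchor + timedelta(days=start)
    hi = anchor + timedelta(days=end)
    return sum(1 for e, d in events if e == entity and lo < d <= hi)


def filter_training_rows(rows, events, start, end, min_count):
    """Keep only (entity, anchor) rows for which the assumption held in hindsight."""
    return [
        (entity, anchor)
        for entity, anchor in rows
        if count_in_window(events, entity, anchor, start, end) >= min_count
    ]


# Unfiltered training table: every customer at every anchor date.
rows = [
    (customer, anchor)
    for customer in ("alice", "bob")
    for anchor in (date(2024, 1, 1), date(2024, 1, 8))
]

# Assumption: at least one notification in the next 3 days.
kept = filter_training_rows(rows, NOTIFICATIONS, 0, 3, 1)
for entity, anchor in kept:
    print(entity, anchor)
```

In this toy data only the rows for "alice" survive, so a model trained on `kept` would only ever see customers who did receive a notification shortly after the anchor date.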
Example
The syntax and some examples for this part of the query are as follows:
ASSUMING <aggregation_function>(<fact_table>.<column_name>, <start>, <end>) <comparison_operator> <constant>
ASSUMING COUNT(NOTIFICATIONS.*, 0, 7) > 2
ASSUMING LIST_DISTINCT(COUPONS.type, 0, 3) CONTAINS '50 percent off'
ASSUMING COUNT(NOTIFICATIONS.*, 0, 7) > 5 AND SUM(NOTIFICATIONS.LENGTH, 0, 7) > 10
Here, the allowable aggregation functions and the definitions of the start and end parameters are the same as those for the target and the temporal entity filter, except that both start and end should be non-negative. Also, remember that <fact_table> should include a key column linking it to the entity table's PK column.
Moreover, the assumption should hold often enough in the historical data: it is hard to predict how users react to notifications if the database contains little information about past notifications. Using a longer time window generally helps with this.
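One rough way to check whether an assumption holds often enough is to measure, on historical data, the fraction of (entity, anchor date) pairs that satisfy it for different window lengths. The sketch below does this in plain Python; the event log, the helper name `assumption_rate`, and the window boundaries (anchor date excluded, end date included) are hypothetical and chosen only for illustration.

```python
from datetime import date, timedelta


def assumption_rate(events, entities, anchors, end_days, min_count=1):
    """Fraction of (entity, anchor) pairs with at least `min_count` events
    dated in (anchor, anchor + end_days]."""
    hits = 0
    total = 0
    for entity in entities:
        for anchor in anchors:
            hi = anchor + timedelta(days=end_days)
            n = sum(1 for e, d in events if e == entity and anchor < d <= hi)
            hits += int(n >= min_count)
            total += 1
    return hits / total


# Hypothetical notification history: (customer_id, date sent).
events = [
    ("alice", date(2024, 1, 2)),
    ("bob", date(2024, 1, 6)),
    ("alice", date(2024, 1, 13)),
]
entities = ["alice", "bob"]
anchors = [date(2024, 1, 1), date(2024, 1, 8)]

rate_3 = assumption_rate(events, entities, anchors, 3)  # 3-day window
rate_7 = assumption_rate(events, entities, anchors, 7)  # 7-day window
print(rate_3, rate_7)
```

With this toy data the 7-day window is satisfied by three of the four pairs and the 3-day window by only one, which is the effect the advice about longer windows relies on.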
Allowable boolean operators are the same as for the WHERE filter.