Description
Defines the numerical offset, counted back from the most recent entry, that is used to generate training data labels. Unless a custom time unit is specified in the aggregation, this value is in days. It can be used to make sure the query only generates labels on data from the last train_start_offset days. Regardless of this value, all data is used as input to the model; the offset only limits which labels are generated.
train_start_offset must be >0.
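The label window described above can be pictured with a small Python sketch. It is only an illustration of the idea, not the platform's implementation: the function names `label_cutoff` and `select_label_times` are hypothetical, the unit is assumed to be days, and treating the cutoff as inclusive is an assumption.

```python
from datetime import datetime, timedelta


def label_cutoff(latest_entry: datetime, train_start_offset: int) -> datetime:
    """Earliest timestamp that would still receive a label (assumed inclusive)."""
    if train_start_offset <= 0:
        raise ValueError("train_start_offset must be > 0")
    # Assumption: the offset is measured in days from the most recent entry.
    return latest_entry - timedelta(days=train_start_offset)


def select_label_times(candidates, latest_entry, train_start_offset):
    """Keep only candidate label timestamps that fall inside the offset window."""
    cutoff = label_cutoff(latest_entry, train_start_offset)
    return [t for t in candidates if t >= cutoff]


# Made-up timestamps, purely for illustration.
latest = datetime(2024, 1, 31)
candidates = [datetime(2023, 1, 15), datetime(2024, 1, 10), datetime(2024, 1, 25)]

# With an offset of 10 days, only the timestamp from the last 10 days is kept.
recent = select_label_times(candidates, latest, 10)
```

Note that the sketch filters label timestamps only. Older rows are not discarded, which mirrors the statement that all data remains available as model input.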
Supported Task Types
- Temporal
Example 1
For example, you may want to generate training examples only for customers that churned in the last year, even though those customers may have 10 years of data. All 10 years of data are still used as input when training the model.
train_start_offset: <integer>
train_start_offset: 10 # Only train on data from the last 10 days
train_start_offset: 365 # Only train on data from the last year
This only applies to temporal queries (queries that include a temporal aggregation such as SUM(TRANSACTIONS.AMOUNT, 0, 2, days)). The unit of this offset is the same as the unit in the aggregation.
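As a rough illustration of how the offset inherits its unit from the aggregation, the sketch below converts an offset and a unit name into a time span. The helper `offset_to_timedelta` is hypothetical, and the unit names it accepts are illustrative only; they are not a list of the units the query language supports.

```python
from datetime import timedelta

# Assumption: illustrative unit names only, not the query language's full set.
_SUPPORTED_UNITS = {"minutes", "hours", "days"}


def offset_to_timedelta(offset: int, unit: str = "days") -> timedelta:
    """Interpret an offset in the same unit as the temporal aggregation."""
    key = unit.lower()
    if key not in _SUPPORTED_UNITS:
        raise ValueError(f"unsupported unit in this sketch: {unit!r}")
    if offset <= 0:
        raise ValueError("offset must be > 0")
    # timedelta accepts minutes=, hours= and days= keyword arguments.
    return timedelta(**{key: offset})


# An aggregation in days makes an offset of 10 mean 10 days.
window_days = offset_to_timedelta(10, "days")

# An aggregation in hours would make an offset of 48 mean 48 hours.
window_hours = offset_to_timedelta(48, "hours")
```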
Example 2
For example, for the query
PREDICT SUM(transactions.price, 0, 30, days)
FOR EACH customers.customer_id
the value of train_start_offset will be in days. If it is set to 10, the training table will only include entries from the last 10 days.
Default Values
run_mode | Default Value |
---|---|
FAST | 0 |
NORMAL | 0 |
BEST | 0 |