Description
A query with a LIST_DISTINCT aggregation target can serve two different purposes: ranking or classification. The same applies to queries with multicategorical or multilabel target columns.
When predicting which products a user will buy, you are usually only interested in the ranking of the top few products that the user is most likely to buy. In that case, add RANK TOP K at the end of your target definition, where K is the number of items that you are interested in.
On the other hand, you might want a separate binary prediction for each item type. To get this, add CLASSIFY to the end of your target definition.
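The difference between the two modes can be illustrated with a small Python sketch. It is not Kumo's implementation: the per-item scores, the 0.5 threshold, and the helper names `rank_top_k` and `classify` are all hypothetical and exist only to show the shape of each output.

```python
from typing import Dict, List


def rank_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring item ids, best first (ranking view)."""
    # Sort by score descending; break ties by item id so the order is deterministic.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ordered[:k]]


def classify(scores: Dict[str, float], threshold: float = 0.5) -> Dict[str, bool]:
    """Return an independent yes/no decision for every item (classification view)."""
    return {item: score >= threshold for item, score in scores.items()}


# Hypothetical per-item purchase scores for a single user.
scores = {"shirt": 0.91, "socks": 0.62, "hat": 0.40, "scarf": 0.08}

top_2 = rank_top_k(scores, k=2)   # an ordered short list of items
labels = classify(scores)         # one binary prediction per item
```

Ranking yields a short ordered list per user, while classification yields one decision for every item, which is one plausible reason the two modes scale to very different numbers of entities.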
Example
The following examples show this part of the query:
PREDICT LIST_DISTINCT(transaction.article_id, 0, 30) RANK TOP 12
PREDICT LIST_DISTINCT(transaction.article_id, 0, 30) CLASSIFY
PREDICT target.multicategorical_column RANK TOP 20
The two operations are subject to different limits: ranking works with up to 10,000,000 different entities, while classification only works with up to 1000 different entities. At most 1000 targets can be ranked.
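As a rough sketch, the limits above could be checked on the client side before a query is submitted. The helper `build_target` and its parameters are hypothetical and not part of any Kumo SDK, and treating the 1000-target limit as a cap on K is an assumption about how that limit applies.

```python
from typing import Optional

# Limits quoted in this section.
MAX_RANK_ENTITIES = 10_000_000
MAX_CLASSIFY_ENTITIES = 1000
MAX_RANKED_TARGETS = 1000  # assumed here to cap K in RANK TOP K


def build_target(expression: str, mode: str, num_entities: int,
                 k: Optional[int] = None) -> str:
    """Build a pQuery target string, rejecting inputs that exceed the limits."""
    if mode == "RANK":
        if k is None or k < 1:
            raise ValueError("RANK requires a positive K")
        if k > MAX_RANKED_TARGETS:
            raise ValueError(f"K must not exceed {MAX_RANKED_TARGETS}")
        if num_entities > MAX_RANK_ENTITIES:
            raise ValueError("too many distinct entities for ranking")
        return f"PREDICT {expression} RANK TOP {k}"
    if mode == "CLASSIFY":
        if num_entities > MAX_CLASSIFY_ENTITIES:
            raise ValueError("too many distinct entities for classification")
        return f"PREDICT {expression} CLASSIFY"
    raise ValueError(f"unknown mode: {mode!r}")


query = build_target("LIST_DISTINCT(transaction.article_id, 0, 30)",
                     "RANK", num_entities=50_000, k=12)
```

With these inputs the helper returns the first example query shown above; asking for CLASSIFY with the same 50,000 entities would raise a `ValueError` instead.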
TOP K will be ignored if used with CLASSIFY. Adding CLASSIFY/RANK is required if the target output is LIST_DISTINCT or a multicategorical column.
CLASSIFY/RANK is not required and has no effect if LIST_DISTINCT appears as part of a condition, such as in the following pQuery:
PREDICT LIST_DISTINCT(transaction.category, 0, 30) CONTAINS "online"
Note: If a batch prediction requests a different number of results per entity than the RANK TOP K value in the initial pQuery, Kumo uses the same trained model but produces the number of results specified at batch prediction time.