What are embedding outputs?
Common questions on how to use Kumo embedding outputs in batch predictions.
Embedding outputs are another type of output supported by Kumo batch predictions. Instead of generating predicted target values directly, you can choose to output an embedding for each entity. Embeddings are informative vector representations of the entities in the graph; after training completes, similar entities have similar representations.
Embeddings can be used out of the box to represent the same entities in a variety of downstream tasks. One common use case is downstream clustering and segmentation. For example, you might have trained a pQuery for a customer churn problem and saved the embedding outputs, which encode information about each customer. If you later want to analyze which customers are similar, you can frame this as an unsupervised clustering problem. Clustering algorithms need an informative vector representation of each customer, and then group nearby vectors into the same cluster. Building such representations usually takes feature engineering effort, but you can skip that work by using the customer embedding outputs directly as a starting point.
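As a rough illustration of the clustering use case, the Python sketch below groups entities with k-means (Lloyd's algorithm) applied directly to their embedding vectors. It assumes you have already loaded the embedding output file into a dictionary that maps each entity ID to its vector; the customer IDs and 2-D vectors are made-up toy values (real embeddings have many more dimensions), and k-means is only one of many clustering algorithms you could use.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def kmeans(embeddings: Dict[str, Vector], k: int, iterations: int = 20) -> Dict[str, int]:
    """Assign each entity ID to one of k clusters using Lloyd's algorithm."""
    ids = sorted(embeddings)
    # Deterministic initialization: the first k entities (sorted by ID) seed the centroids.
    centroids = [list(embeddings[i]) for i in ids[:k]]
    assignment: Dict[str, int] = {}
    for _ in range(iterations):
        # Assignment step: each entity joins the cluster with the nearest centroid.
        new_assignment = {
            i: min(range(k), key=lambda c: squared_distance(embeddings[i], centroids[c]))
            for i in ids
        }
        if new_assignment == assignment:
            break  # no entity changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its member vectors.
        for c in range(k):
            members = [embeddings[i] for i in ids if assignment[i] == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = mean(members)
    return assignment


# Hypothetical customer embeddings (toy 2-D vectors for readability).
customer_embeddings = {
    "c1": [0.9, 0.1],
    "c2": [1.0, 0.0],
    "c3": [0.0, 1.0],
    "c4": [0.1, 0.9],
}
clusters = kmeans(customer_embeddings, k=2)
print(clusters)
```

In practice you would more likely use a library implementation such as scikit-learn's `KMeans`; the hand-written version is shown only to make the assignment and update steps explicit.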
Another example: suppose you have trained two pQueries separately, one whose entities are customers and one whose entities are articles. Based on a customer's historical buying habits, you can use the output embeddings to make recommendations with a simple vector distance calculation. For example, if customer C1 likes to buy article A1, and article A2 is very close to A1 in the article embedding space, then C1 is likely to also like A2.
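The recommendation idea can be sketched in a similar way. This hypothetical example scores each article a customer has not yet bought by its highest cosine similarity to any article they have bought, then returns the top matches. The article IDs, vectors, and the `recommend` helper are illustrative assumptions, and cosine similarity is just one reasonable choice of distance measure; Euclidean distance would also work.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(
    purchased: Set[str],
    article_embeddings: Dict[str, Sequence[float]],
    top_n: int = 3,
) -> List[str]:
    """Rank unpurchased articles by their best similarity to any purchased article."""
    # Only purchased articles that actually have an embedding can be compared.
    known = [p for p in purchased if p in article_embeddings]
    if not known:
        return []
    scores = {}
    for candidate, vec in article_embeddings.items():
        if candidate in purchased:
            continue  # don't recommend something the customer already bought
        scores[candidate] = max(cosine_similarity(vec, article_embeddings[p]) for p in known)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical article embeddings: A2 points in nearly the same direction as A1.
article_embeddings = {
    "A1": [1.0, 0.0, 0.2],
    "A2": [0.9, 0.1, 0.25],
    "A3": [0.0, 1.0, 0.0],
    "A4": [0.1, 0.0, 1.0],
}
recs = recommend({"A1"}, article_embeddings, top_n=2)
print(recs)  # ['A2', 'A4']
```

Here customer C1's history is represented as the set `{"A1"}`; A2 ranks first because it is nearest to A1 in the article embedding space, matching the example above.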
To output embeddings when creating a batch prediction, select the EMBEDDINGS checkbox at the bottom of the Create a Batch Prediction page.
You will need to specify an S3 output bucket, a file name, and a file type for the output.