How can I diagnose problems with my data pipeline at batch prediction time?


Kumo provides several tools for verifying the accuracy of your batch predictions and diagnosing potential issues with your data pipeline. You can view the details for each batch prediction job, including output statistics computed from a sample of table data based on your prediction task type (e.g., regression, binary classification, multi-class classification, multi-label classification, link prediction).

The job details for each batch prediction also display data distribution drift statistics. These metrics are crucial for detecting unexpected changes in the data used to generate your batch predictions. If an upstream pipeline silently fails and changes your data, Kumo will detect this and show you a similarity score based on the Population Stability Index (PSI). If you see a column with a low similarity score, you can inspect the distributions to quickly find the root cause of the data drift.
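To build intuition for what a PSI-based drift check measures, here is a minimal sketch of the standard Population Stability Index for a single numeric column. It is not Kumo's implementation: the function name `population_stability_index`, the equal-width binning, the bin count, and the `eps` smoothing are all assumptions made for illustration, and this page does not specify how Kumo converts PSI into the similarity score it displays.

```python
import math
from typing import Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    num_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Standard PSI between a reference sample and a new sample of one column.

    Hypothetical helper for illustration only; not part of any Kumo API.
    """
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")

    # Equal-width bins are derived from the reference (expected) sample.
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / num_bins or 1.0  # fall back to 1.0 if all values are equal

    def bin_fractions(values: Sequence[float]) -> list:
        counts = [0] * num_bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the reference range into the edge bins.
            idx = min(max(idx, 0), num_bins - 1)
            counts[idx] += 1
        # Floor each fraction at eps so the logarithm is always defined.
        return [max(c / len(values), eps) for c in counts]

    p = bin_fractions(expected)
    q = bin_fractions(actual)
    # PSI = sum over bins of (q_i - p_i) * ln(q_i / p_i); every term is >= 0.
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Made-up data: a reference sample and a sample whose range has shifted upward.
reference = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]

psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)
```

Note that raw PSI grows as two distributions diverge, so a high PSI corresponds to a low similarity score. A commonly cited rule of thumb treats PSI below 0.1 as little change and above 0.25 as a significant shift, but those thresholds are conventions rather than anything stated on this page.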

