How can I mitigate the problem of data drift?
Kumo provides several mechanisms for mitigating data drift. During pQuery development and model training, Kumo reports data distribution drift statistics: a similarity score and a comparison of column distributions. The job details for each of your batch predictions also display data distribution drift statistics. These metrics are crucial for detecting unexpected changes in the data used to generate your batch predictions.
If an upstream pipeline silently fails and changes your data, Kumo will detect this and show you a similarity score based on the Population Stability Index. If you see a column with a low similarity score, you can inspect the distributions to quickly find the root cause of the data drift.
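To build intuition for what a score based on the Population Stability Index measures, here is a minimal sketch of the standard PSI calculation for one numeric column. It is not Kumo's implementation: the function names, the equal-width binning, the bin count, and the `eps` floor are all assumptions made for illustration, and how Kumo maps PSI to its similarity score is not covered here. Note that a raw PSI grows as the distributions diverge, so a high PSI corresponds to a low similarity.

```python
import math
from typing import List, Sequence


def _bin_fractions(values: Sequence[float], lo: float, hi: float,
                   bins: int, eps: float) -> List[float]:
    """Fraction of values per equal-width bin over [lo, hi] (hypothetical helper)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        # Values outside the reference range are clipped into the edge bins.
        idx = min(max(idx, 0), bins - 1)
        counts[idx] += 1
    total = len(values)
    # Floor each fraction at eps so empty bins do not produce log(0).
    return [max(c / total, eps) for c in counts]


def population_stability_index(expected: Sequence[float],
                               actual: Sequence[float],
                               bins: int = 10,
                               eps: float = 1e-6) -> float:
    """PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i).

    `expected` is the reference sample (e.g. training data) and `actual`
    is the sample being checked (e.g. batch prediction data). Bin edges
    are taken from the reference sample; this is an illustrative choice.
    """
    lo, hi = min(expected), max(expected)
    e = _bin_fractions(expected, lo, hi, bins, eps)
    a = _bin_fractions(actual, lo, hi, bins, eps)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Made-up data: a reference column and a shifted copy of it.
training = [i / 100 for i in range(1000)]
serving_same = list(training)
serving_shifted = [v + 3 for v in training]

psi_same = population_stability_index(training, serving_same)
psi_shifted = population_stability_index(training, serving_shifted)
print(f"PSI (unchanged): {psi_same:.4f}")
print(f"PSI (shifted):   {psi_shifted:.4f}")
```

A commonly cited rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a significant shift; these thresholds are general conventions, not values documented for Kumo.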