Release Notes
Latest updates and improvements to the Kumo Platform
v1.41 (November 4, 2024)
Improvements
- Improved Reliability for Backend Storage Systems
Expanded compatibility and enhanced reliability for diverse backend storage systems, including AWS S3, Databricks Unity Catalog, and Snowpark Container Services (SPCS) stage.
- Optimized Efficiency in Training Data Materialization for Small Graphs
Significant improvements have been made in the processing efficiency of small graph data, reducing materialization time by up to 25%.
- Entity selection in XAI for multi-class classification now provides the top-k entities for which the model predictions are correct or wrong with high confidence
- Support for focal loss in binary/multi-label tasks
- Support for 6-hop neighbor sampling
- Early validation for Batch Prediction outputs written to warehouses.
- Baseline comparisons are now available in the Snowflake Native App for all task types.
- Added graph-based visualizations to display entity-level explainability for binary classification tasks
- Several UI improvements, including fixes for stats scale issues and evaluation charts.
- Faster materialization in the Snowflake Native App (~30% reduction) by using intermediate Snowflake tables instead of external Parquet files.
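For readers unfamiliar with focal loss, the following is a minimal sketch of the standard binary formulation, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). It is an illustration only and not Kumo's implementation: the function name and the default values of gamma and alpha are assumptions taken from common usage, and for multi-label tasks the same per-label loss would typically be applied to each label independently.

```python
import math

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one example.

    p: predicted probability of the positive class
    y: true label, 0 or 1
    gamma, alpha: illustrative defaults, not Kumo's settings
    """
    p = min(max(p, eps), 1.0 - eps)             # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct prediction is down-weighted relative to a hard one.
easy = binary_focal_loss(0.95, 1)
hard = binary_focal_loss(0.30, 1)
```

With gamma set to 0 the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy, which is why larger gamma values shift the training signal toward hard examples.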
Breaking Changes
- Support for the old flat model plan YAML configuration has been fully removed.
v1.40 (October 2024)
Improvements
- Various improvements to the experience of writing Predictive Queries, including autocomplete and inline error hints.
- CPU requirements have been reduced substantially, lowering the likelihood of CPU out-of-memory (OOM) errors.
- Reduced times for loading tables in the trainer for SPCS by 30-40%.
- SPCS deployment no longer requires Snowflake connector credentials. Kumo uses the Snowflake-provided OAuth token already available in SPCS.
- Improved reliability for Databricks native deployments around UCV file uploads/downloads and session management.
- Various UI improvements to support Databricks connector.
Breaking Changes
- Batch Prediction output format transformations have been deprecated. To post-process predictions, we recommend using a distributed data processing platform like Databricks or AWS EMR Studio in your secure environment.
- Batch Prediction data distribution drift statistics have been deprecated. We recommend using your MLOps platform to make sure that changes in the distribution of your data are intended.
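If your MLOps platform does not already provide a drift check, one common statistic is the Population Stability Index (PSI). The sketch below is a generic illustration under stated assumptions: the function name, bin count, and smoothing constant are hypothetical choices, and it is not necessarily the statistic Kumo previously reported.

```python
import math

def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a baseline sample and a new sample of one numeric column.

    Bins are equal-width over the range of the baseline sample; values in
    the new sample that fall outside that range go to the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # constant baseline column: everything lands in one bin

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty buckets from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e = bucket_shares(expected)
    a = bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]
psi_same = population_stability_index(baseline, baseline)
psi_shifted = population_stability_index(baseline, shifted)
```

PSI is zero when the two samples fall into the buckets in identical proportions and grows as the distributions diverge; the alerting threshold is a judgment call for your own data.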
v1.39 (September 2024)
New Features
- New model plan option - sample_from_entity_table (default: True): This option for static predictive queries lets you customize the behavior of neighborhood sampling. If set to False, sampling of other entities in the entity table besides the seed entity itself is disallowed. This is useful when entities represent candidates or hypothetical examples and you want to restrict information flow between different candidates.
- Support for global baselines for all problem types, which now enables generating baselines for static link and node prediction.
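To make the effect of sample_from_entity_table concrete, here is a toy breadth-first sampler over a small in-memory graph. This is a conceptual sketch only, not Kumo's sampler: apart from the option name, every function, variable, and table name below is hypothetical.

```python
def sample_neighbors(graph, node_table, seed, num_hops, sample_from_entity_table=True):
    """Collect the nodes reachable from `seed` within `num_hops` hops.

    graph: dict mapping node -> list of neighbor nodes
    node_table: dict mapping node -> name of the table the node belongs to
    When sample_from_entity_table is False, nodes from the seed's own table
    (other than the seed itself) are never added to the neighborhood.
    """
    entity_table = node_table[seed]
    visited = {seed}
    frontier = [seed]
    for _ in range(num_hops):
        next_frontier = []
        for node in frontier:
            for nbr in graph.get(node, []):
                if nbr in visited:
                    continue
                if not sample_from_entity_table and node_table[nbr] == entity_table:
                    continue  # skip other candidates from the seed's table
                visited.add(nbr)
                next_frontier.append(nbr)
        frontier = next_frontier
    return visited

# Two candidates share one item; with the option off, c2 never reaches c1's neighborhood.
graph = {"c1": ["i1"], "c2": ["i1"], "i1": ["c1", "c2"]}
node_table = {"c1": "candidates", "c2": "candidates", "i1": "items"}
with_entities = sample_neighbors(graph, node_table, "c1", 2)
without_entities = sample_neighbors(graph, node_table, "c1", 2, sample_from_entity_table=False)
```

In this toy graph, the default setting lets information from candidate c2 flow into c1's neighborhood through the shared item, while setting the option to False keeps the two candidates isolated from each other.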
Improvements
- Baselines are now also supported for temporal multilabel classification and ranking problems, and for all static problems.
- The control for triggering baselines has moved out of the model planner into a separate, more visible button on the model planner page.
- Sped up batch prediction (BP) table generation for jobs with a large number of entities and timeframes. For some queries, this brings BP table generation time down from over 1 hour to under 20 minutes.
- Connector authentication for the Snowflake Native App: When Kumo runs as a Snowflake Native App, users no longer need to provide their credentials when creating a Snowflake connector; Kumo uses the built-in OAuth token in SPCS to connect to the customer's warehouse. This change also ensures that all traffic between Kumo and Snowflake stays within Snowflake's private network, so Kumo no longer requires egress rules to connect to the customer's Snowflake account. This change does require privileges to be granted to the Kumo native app before Kumo can access any data in the customer's Snowflake account.
Breaking Changes
- None
v1.38 (July 15, 2024)
- Baselines now supported in SPCS.
- Encrypted keys now supported for Snowflake connector.
- Backend performance enhancements.
- Various minor fixes and UI improvements.
v1.37 (June 2, 2024)
- Backend performance enhancements (SaaS)
- Various minor fixes and UI improvements.
v1.36 (May 27, 2024) - extended release notes
- Baselines page now displays a warning when a feature is not available.
- Users are now alerted if multi-class classifications only have two classes.
- Enhancements to in-app pQuery documentation and improved tooltips.
- Various minor fixes and UI improvements.
v1.35 (May 13, 2024) - extended release notes
- Kumo table and view creation now streamlined in a unified "Add Table/View" page.
- Newly refined UI across the Kumo SaaS app.
- Various minor fixes and UI improvements.
v1.34 (April 29, 2024) - extended release notes
- Multi-label ranking is now available in PQLv2.
- Encoder use can now be specified for autoregressive labels in regression and forecasting tasks (by specifying past_encoder in the model plan).
- Various backend performance enhancements and improvements.
- Various minor fixes and UI improvements.
v1.33 (April 11, 2024) - extended release notes
- Enhanced monitoring for batch predictions to detect unusual gaps in fact tables.
- For classification, link prediction, and regression tasks, heuristic baselines are now available as a point of comparison for Kumo results.
- Various backend performance enhancements and improvements.
- Various minor fixes and UI improvements.
v1.32 (March 25, 2024)
- Data distribution drift statistics now available for batch predictions.
- Row-level explainability (XAI) metrics now available via the explorer tab.
- Enhanced datatype changes are now available during preprocessing when creating tables.
- When setting up dimension tables, an end date can now be set to restrict training and batch predictions to a specific timeframe.
- Various minor fixes and UI improvements.
v1.31 (March 11, 2024)
- For ranking tasks (i.e., pQueries using LIST_DISTINCT with RANK TOP K), target item limit increased from 1M to 10M.
- For certain types of pQueries (e.g., link prediction tasks), an Explorer section is available for evaluating predictions against historical and ground truth data.
- Various minor fixes and UI improvements.
v1.30 (February 26, 2024)
- Improvements for supporting extensive batch prediction jobs.
- Various minor fixes and UI improvements.
v1.29 (February 15, 2024)
- Improvements to AWS S3 connector allow for CSV/Parquet support and broader scaling (more tables) capability.
- Various minor fixes and UI improvements.
v1.28 (February 1, 2024)
- Various backend improvements to performance during training.
- Various minor fixes and UI improvements.
v1.27 (January 15, 2024)
- Additional features and syntax available for link prediction tasks.
- MLOps monitoring dashboards available for batch prediction jobs.
- Various minor fixes and UI improvements.
v1.26 (December 18, 2023)
- The pQuery syntax has been updated to make it easier to understand and more flexible in the way filters can be applied.
- Various minor fixes and UI improvements.
v1.25 (November 27, 2023)
- BigQuery now available as a batch prediction output.
v1.24 (November 13, 2023)
- New model planner available during pQuery training allows for fine-grained control over encoders, training strategy, and the AutoML search space.
- Additional configuration options are available in the model planner (previously advanced options).
v1.23 (October 30, 2023)
- XAI: various minor fixes and UI improvements.
- XAI: metrics now available for multiclass and multilabel classification tasks
- For node prediction tasks, test data splits can now be downloaded from the Review Evaluation Metrics page.
- When selecting source tables, a new raw table option is available for connecting tables that don't conform to either fact or dimension table types.
- Kumo views enable the running of traditional SQL queries that materialize a view in the Kumo data plane.
v1.22 (October 16, 2023)
- Batch predictions now include output statistics computed from a sample of table data.
- Various minor fixes and UI improvements.
v1.21 (October 2, 2023)
- XAI - Cohort analysis for time columns now improved to be more interpretable.
- XAI - Cohort analysis now working for tables that are two hops away from the prediction entity table.
- A new refit feature enables automatic model refitting on the entire dataset.
- Descriptions can now be added and updated for any object in the Kumo platform.
- During new pQuery creation, already materialized graphs from prior pQuery creation jobs are now reused automatically.
- A new connector is available for connecting to Google Cloud BigQuery.
- For multilabel classification pQueries (e.g. using the LIST_DISTINCT() operator on a maximum of 1,000 classes), evaluation metrics now include class-specific metrics.
v1.20 (September 18, 2023)
- XAI - In Column Analysis, actual versus predicted values are now displayed per column.
- A new table column type called Embedding enables the use of embeddings as an input column.
- For regression pQueries predicting a numeric output (using COUNT, SUM, etc. operators), evaluation results now include scatter plot charts that display actual versus predicted values.
- During pQuery training, charts and tables are now provided to show how the training example target labels used to train the pQuery vary over time and across training/validation/holdout data splits.
v1.19 (September 4, 2023)
- A “Distribution of Predictions” chart showcasing a visualization of the predicted values alongside the actual target labels for all entities in a regression task (e.g., predictive queries with COUNT() or SUM() operator)
- Expose boolean advanced option to handle prediction of unseen target entities at batch prediction time for link prediction tasks
- Creating custom Kumo Views using SQL queries on top of tables already connected to the platform
- Enable kicking off up to 10 asynchronous jobs (training/batch prediction) that will get queued and run sequentially one after another as older jobs complete
- Enable concurrent execution of more than 1 job
v1.18 (August 21, 2023)
- A plot showing the distribution of values in timestamp columns, for validation while ingesting new tables
- S3 CSV data sources supported as connectors
- Calibrating batch predictions for classification tasks using Platt Scaling
- Parallelize batch prediction jobs involving large datasets across multiple workers (up to 4)
- XAI - Explaining how the underlying data contributes to the final predictions:
  - Contribution score of individual tables and the columns within them
  - Cohort analysis for the range of values of each column and for the range of number of historic facts available in tables
- Miscellaneous minor fixes to UX flows, bugs, and predictive accuracy
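As background on the Platt Scaling item above, the technique fits a one-dimensional logistic model p = sigmoid(a * score + b) that maps raw classifier scores to calibrated probabilities. The sketch below is a simplified illustration and not Kumo's implementation: it uses plain gradient descent on log loss and omits the label smoothing from Platt's original method, and the function names, learning rate, and sample data are all hypothetical.

```python
import math

def fit_platt(scores, labels, lr=0.1, epochs=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on mean log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s  # d(log loss)/da for this example
            grad_b += (p - y)      # d(log loss)/db for this example
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def calibrate(score, a, b):
    """Map a raw score to a calibrated probability using fitted a and b."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))

# Held-out raw scores and true labels (made-up values for illustration).
scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 1, 0, 1, 1]
a, b = fit_platt(scores, labels)
p_low = calibrate(-2.0, a, b)
p_high = calibrate(2.0, a, b)
```

The calibration parameters should be fitted on held-out data rather than the training set, since scores on training data tend to be overconfident.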