Release Notes
Latest updates and improvements to the Kumo Platform
v1.45 Release Notes (01/06/2025)
Improvements
UI Enhancements
- Adjusted local file uploads to process chunks asynchronously, completing uploads efficiently.
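As an illustration of asynchronous chunk processing in general (not Kumo's actual upload code), the sketch below splits a payload into chunks and uploads them concurrently with a cap on in-flight requests. The names upload_chunk and upload_file, the chunk size, and the in-memory store are all hypothetical stand-ins.

```python
import asyncio

CHUNK_SIZE = 4  # bytes per chunk; deliberately tiny so the example is easy to follow


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Cut the payload into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


async def upload_chunk(index: int, chunk: bytes, store: dict) -> None:
    """Hypothetical stand-in for a network call that uploads one chunk."""
    await asyncio.sleep(0)  # yield control, as a real I/O call would
    store[index] = chunk


async def upload_file(data: bytes, max_in_flight: int = 3) -> bytes:
    """Upload all chunks concurrently, then reassemble them in order."""
    store = {}
    limit = asyncio.Semaphore(max_in_flight)  # cap concurrent uploads

    async def guarded(index: int, chunk: bytes) -> None:
        async with limit:
            await upload_chunk(index, chunk, store)

    chunks = split_into_chunks(data)
    await asyncio.gather(*(guarded(i, c) for i, c in enumerate(chunks)))
    # Chunks may finish in any order, so reassemble by index.
    return b"".join(store[i] for i in range(len(chunks)))


result = asyncio.run(upload_file(b"hello, kumo upload"))
```

Keying each chunk by its index is what lets the chunks complete out of order without corrupting the reassembled file.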
Error Handling and Messaging
- Enhanced error messages for batch predictions in BigQuery, providing detailed customer-facing explanations instead of runtime errors.
- Improved error messaging for failed training jobs caused by empty ingested data, ensuring clarity on CSV ingestion issues.
Integrations
- Kumo can now run as a Native app in Snowflake Azure regions.
Bug Fixes
- Resolved bar graph display issues within the Subgraph table.
Breaking Change
- Reduced the download limit for the holdout dataset to 1M entities.
v1.44 Release Notes (12/18/2024)
Improvements
UI Enhancements
- BP Job Creation: Fixed warnings during job creation.
Training & Database Optimizations
- Temporal Queries: Enhanced training table generation performance when static entity filters are applied.
- Fine-grained sampling config: Number of neighbors can now be specified per edge type.
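To make the idea of a per-edge-type neighbor count concrete, here is a minimal sketch of neighbor sampling with a separate fan-out for each edge type. The toy graph, the NUM_NEIGHBORS mapping, and sample_neighbors are hypothetical; they illustrate the concept and are not Kumo's configuration format or API.

```python
import random

# Hypothetical toy graph: edge type -> {source node: [neighbor nodes]}
EDGES = {
    ("customer", "places", "order"): {"c1": ["o1", "o2", "o3", "o4"]},
    ("customer", "writes", "review"): {"c1": ["r1", "r2"]},
}

# Hypothetical per-edge-type fan-out: how many neighbors to keep for each edge type
NUM_NEIGHBORS = {
    ("customer", "places", "order"): 3,
    ("customer", "writes", "review"): 1,
}


def sample_neighbors(node, edges, num_neighbors, seed=0):
    """Sample up to num_neighbors[edge_type] neighbors of node for each edge type."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    sampled = {}
    for edge_type, adjacency in edges.items():
        candidates = adjacency.get(node, [])
        k = min(num_neighbors.get(edge_type, 0), len(candidates))
        sampled[edge_type] = rng.sample(candidates, k)
    return sampled


sampled = sample_neighbors("c1", EDGES, NUM_NEIGHBORS)
```

A per-edge-type setting like this lets dense relations be sampled sparsely while sparse but informative relations are kept in full.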
API and Framework Updates
- Kumo-ML Update: Upgraded to the latest version for enhanced stability.
- Session Creation Retry: Enabled automatic retries in Databricks mode.
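The release note does not describe how the retry works internally. As a general illustration of automatic retries, the sketch below wraps a session-creation callable in a retry loop with exponential backoff. create_session_with_retry, flaky_create_session, and the choice of ConnectionError as the retryable error are assumptions made for this example.

```python
import time


def create_session_with_retry(create_session, max_attempts=3, base_delay=0.01,
                              sleep=time.sleep):
    """Call create_session, retrying on ConnectionError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return create_session()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))  # 1x, 2x, 4x, ... base_delay


# Hypothetical flaky backend: fails twice, then succeeds.
attempts = {"count": 0}


def flaky_create_session():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("cluster not ready")
    return "session-ok"


session = create_session_with_retry(flaky_create_session)
```

Passing sleep as a parameter keeps the helper easy to test, since a no-op can be substituted for the real delay.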
Spark and SPCS
- Spark Tracking: Enabled by default for streamlined debugging.
- SPCS Efficiency: Enhanced file copy processes for faster data handling.
Bug Fixes
- Worker Health Check: Improved warning messages and thresholds.
- Subgraph Size: Enhanced error messaging for large subgraph scenarios.
- XAI Table Display: Fixed table name visibility issues in the Explainability UI.
v1.43 Release Notes (12/05/2024)
New Features: SDK
Kumo version v1.43 introduces the Kumo Python SDK, a fully fledged job-centric, composable, and interactive programmatic interface to the Kumo machine learning platform. The SDK allows users to perform EDA, create tables, graphs & queries, train jobs, evaluate results, and orchestrate production jobs all in one notebook environment. Key features of the SDK include:
- A Python-friendly object model representing the key components of a relational deep learning model: Connector, Table, Graph, PredictiveQuery, and Trainer.
- A composable interface allowing users to inspect, evaluate, and modify intermediate artifacts in the Kumo pipeline (e.g., training and prediction tables, holdout dataframes, and more).
- A new user interface to display all launched jobs, monitor job progress, and visualize outputs including training progress, evaluations, and explainability.
The SDK can be installed with "pip install kumoai==0.2.1 --extra-index-url=https://sdk-pkg.kumoai.cloud" and is documented at https://kumo-ai.github.io/kumo-sdk/docs.
Improvements
UI and Jobs Page Enhancements
- Adjusted search text box corners and reduced row padding.
- Improved dropdown designs, tags theme, and column interactions.
- Enhanced column width and hover behaviors for table elements.
- Renamed components for better clarity (e.g., "Jobs Overview" updated).
- Improved styling and content for job training details and related pages.
Snowflake Native App Improvements
- Kumo’s Snowflake Native app is now more cost-efficient! The control plane runs on a CPU-only compute pool, utilizing a GPU compute pool only during model training, resulting in nearly 3X to 4X cost savings.
- Improved reliability for Snowflake stage operations.
Performance and Reliability Fixes
- Call Cache Directory Creation: Transitioned file IO operations to temporal activities to prevent deadlocks.
- Batch Prediction Optimization: Fixed global materialization logic in batch jobs.
- Prediction Anchor Time Visibility: Updated prediction table to display anchor time only when available.
Model Plan Improvements
- Added support for finer-grained control of sampling, allowing granularity at the level of fkeys and hops.
Graph and Visualization Improvements
- Updated styles, icons, and spacing for better clarity.
- Resolved visualization issues like "fitView" bugs and enhanced graph interactivity.
- Added related jobs and nested jobs enhancements.
- Improved graph snapshots and introduced spacing refinements.
Training Table Improvements
- Made the "Timeframe Chart" default on page load and removed redundant elements.
- Added new columns and updated styling to match current requirements.
- Set more intuitive defaults for regression task parameters.
Bug Fixes
- UI Interaction Refinements: Fixed flickering divider issues and softened shadows.
- Resolved inconsistencies in column width and hover effects.
- Updated progress bar behavior and fixed reload issues.
Breaking Changes
Deprecations and Feature Updates
- AdvancedAutoTrainerOptions has been deprecated in favor of ModelPlan.
- Removed the max_target_neighbors_per_entity option.
v1.42 (11/20/2024)
Improvements
Enhanced Migration Documentation and Database Improvements
- Updated Migration Readme: Simplified guidance for migration processes, ensuring clear and actionable steps for smoother transitions.
- Schema Migration Enhancements: Improved ML database migrations by fixing database connection strings for greater reliability and ease of use.
- Spark Resource Table Addition: Enabled advanced data analytics by adding the spark_resource table to the production ML database, enhancing data processing capabilities.
Predictive (and Forecasting) Query Refinements
- Predictive Query User Experience: Removed unnecessary pop-ups from the Predictive Query overview, streamlining user interactions and reducing distractions.
Performance and Efficiency Upgrades
- Batch Prediction Optimization: Reduced prediction time by 20-50% for large-scale partitioned batch predictions, saving time on data processing.
- Categorical Data Analysis: Fixed missing percentage stats for categorical columns with more than 20 categories, providing comprehensive insights.
Explainability and UI Improvements
- XAI Enhancements: Simplified explainability visuals by updating graph details and tips for easier interpretation of predictive models:
  - Updated graph origins to display prediction averages.
  - Streamlined visuals by removing population fraction indicators from graphs and showing them as text instead.
Bug Fixes and Reliability
- Job Type Search Fix: Resolved job type filtering issues, ensuring accurate search results when filtering by job attributes.
- Split Horizon Adjustment: Corrected computations to ensure consistent and accurate results in horizon splitting.
Graph and Node Visualization Improvements
- Node Graph Adjustments: Improved the visibility of nodes in large graphs by dynamically resizing node dimensions based on graph size.
v1.41 (11/04/2024)
Improvements
- Improved Reliability for Backend Storage Systems: Expanded compatibility and enhanced reliability for diverse backend storage systems including AWS S3, Databricks Unity Catalog, and Snowpark Container Services (SPCS) stages.
- Optimized Efficiency in Training Data Materialization for Small Graphs: Significant improvements have been made in the processing efficiency of small graph data, reducing materialization time by up to 25%.
- Entity selection in XAI for multi-class classification now provides the top-k entities for which the model predictions are correct or wrong with high confidence.
- Support for focal loss in binary/multi-label tasks
- Support for 6-hop neighbor sampling
- Early validation for Batch Prediction outputs written to warehouses.
- Baseline comparisons are now available in the Snowflake Native App for all task types.
- Added graph-based visualizations to display entity-level explainability for binary classification tasks
- Several UI improvements, including fixes for stats scale issues and eval charts.
- Performance improvements for materialization (~30% reduction) by using intermediate Snowflake tables instead of external Parquet files in the Snowflake Native app.
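For readers unfamiliar with the focal loss mentioned in the list above, its standard binary form is FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The sketch below is a plain-Python rendering of that formula; the function name and the default values gamma=2.0 and alpha=0.25 are illustrative choices, and nothing here reflects how Kumo implements or configures the loss.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one example.

    p: predicted probability of the positive class, y: true label (0 or 1).
    """
    p = min(max(p, eps), 1.0 - eps)          # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p           # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of well-classified examples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.9, 1)   # confident and correct
hard = binary_focal_loss(0.1, 1)   # confident and wrong
```

With gamma=0 and alpha=0.5 the expression reduces to half the usual cross-entropy, which is a convenient sanity check. Larger gamma values down-weight easy examples more strongly, which is why the loss is often used for imbalanced classification.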
Breaking Changes
- Support for the old flat model plan YAML configuration has been fully removed.
v1.40 (October 2024)
Improvements
- Various improvements to the experience of writing Predictive Queries, including autocomplete and inline error hints.
- CPU requirements have been reduced substantially, lowering the likelihood of CPU OOMs.
- Reduced times for loading tables in trainer for SPCS by 30-40%.
- SPCS deployment no longer requires Snowflake connector credentials. Kumo uses the Snowflake-provided OAuth token already available in SPCS.
- Improved reliability for Databricks native deployments around UCV file uploads/downloads and session management.
- Various UI improvements to support Databricks connector.
Breaking Changes
- Batch Prediction output format transformations have been deprecated. To post-process predictions, we recommend using a distributed data processing platform like Databricks or AWS EMR Studio in your secure environment.
- Batch Prediction data distribution drift statistics have been deprecated. We recommend using your MLOps platform to make sure that changes in the distribution of your data are intended.
v1.39 (September 2024)
New Features
- New model plan option - sample_from_entity_table (default: True): This option for static predictive queries customizes the behavior of neighborhood sampling. If set to False, it disallows sampling of entities in the entity table other than the seed entity itself. This is useful when entities represent candidates or hypothetical examples and information flow between different candidates should be restricted.
- Support for global baselines for all problem types, which enables generating baselines for static link and node prediction.
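The effect of sample_from_entity_table described above can be pictured with a toy neighborhood expansion. In the sketch below, two candidate entities share a neighbor in another table; with the flag set to False, the expansion from one candidate never reaches the other. The graph, the sample_subgraph function, and the choice to visit every neighbor (rather than sample a subset) are simplifying assumptions for illustration, not Kumo's sampler.

```python
from collections import deque

# Hypothetical toy graph: which table each node belongs to, and its neighbors
NODE_TABLE = {"cand_a": "candidates", "cand_b": "candidates", "store_1": "stores"}
NEIGHBORS = {
    "cand_a": ["store_1"],
    "cand_b": ["store_1"],
    "store_1": ["cand_a", "cand_b"],
}


def sample_subgraph(seed, entity_table, num_hops, sample_from_entity_table=True):
    """Breadth-first expansion from seed, optionally skipping other entity-table rows."""
    visited = {seed}
    frontier = deque([(seed, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == num_hops:
            continue  # hop budget used up for this branch
        for neighbor in NEIGHBORS[node]:
            if neighbor in visited:
                continue
            # With the flag off, other rows of the entity table are never pulled in.
            if not sample_from_entity_table and NODE_TABLE[neighbor] == entity_table:
                continue
            visited.add(neighbor)
            frontier.append((neighbor, depth + 1))
    return visited


with_flag = sample_subgraph("cand_a", "candidates", num_hops=2)
without_flag = sample_subgraph("cand_a", "candidates", num_hops=2,
                               sample_from_entity_table=False)
```

In this toy setup, with_flag contains cand_b (reached through store_1), while without_flag contains only the seed and store_1, so no information flows between the two candidates.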
Improvements
- Baselines are additionally supported for temporal multilabel classification and ranking problems, and for all static problems.
- The baseline trigger has moved from the model planner to a separate, more visible button on the model planner page.
- Sped up batch prediction table generation for jobs with a large number of entities and timeframes. For some queries, this brings BP table generation time down from over 1 hour to under 20 minutes.
- Connector authentication for Snowflake Native app: When Kumo runs as a Snowflake Native app, users no longer need to provide their credentials when creating a Snowflake connector; Kumo uses the built-in OAuth token in SPCS to connect to the customer’s warehouse. This change also ensures that all traffic between Kumo and Snowflake happens within Snowflake’s private network, and Kumo no longer requires egress rules to connect to the customer’s Snowflake account. This change does require privileges to be granted to the Kumo Native app before Kumo can access any data in the customer's Snowflake account.
Breaking Changes
- None
v1.38 (July 15, 2024)
- Baselines now supported in SPCS.
- Encrypted keys now supported for Snowflake connector.
- Backend performance enhancements.
- Various minor fixes and UI improvements.
v1.37 (June 2, 2024)
- Backend performance enhancements (SaaS)
- Various minor fixes and UI improvements.
v1.36 (May 27, 2024) - extended release notes
- Baselines page now displays a warning when a feature is not available.
- Users are now alerted if multi-class classifications only have two classes.
- Enhancements to in-app pQuery documentation and improved tooltips.
- Various minor fixes and UI improvements.
v1.35 (May 13, 2024) - extended release notes
- Kumo table and view creation now streamlined in a unified "Add Table/View" page.
- Newly refined UI across the Kumo SaaS app.
- Various minor fixes and UI improvements.
v1.34 (April 29, 2024) - extended release notes
- Multi-label ranking is now available in PQLv2.
- Encoder use can now be specified for autoregressive labels in regression and forecasting tasks (by specifying past_encoder in the model plan).
- Various backend performance enhancements and improvements.
- Various minor fixes and UI improvements.
v1.33 (April 11, 2024) - extended release notes
- Enhanced monitoring for batch predictions to detect unusual gaps in fact tables.
- For classification, link prediction, and regression tasks, heuristic baselines are now available for comparison against Kumo results.
- Various backend performance enhancements and improvements.
- Various minor fixes and UI improvements.
v1.32 (March 25, 2024)
- Data distribution drift statistics now available for batch predictions.
- Row-level explainability (XAI) metrics now available via the explorer tab.
- Enhanced datatype changes are now available during preprocessing when creating tables.
- When setting up dimension tables, an end date can now be set to restrict training and batch predictions to a specific timeframe.
- Various minor fixes and UI improvements.
v1.31 (March 11, 2024)
- For ranking tasks (i.e., pQueries using LIST_DISTINCT with RANK TOP K), target item limit increased from 1M to 10M.
- For certain types of pQueries (e.g., link prediction tasks), an Explorer section is available for evaluating predictions against historical and ground truth data.
- Various minor fixes and UI improvements.
v1.30 (February 26, 2024)
- Improvements for supporting extensive batch prediction jobs.
- Various minor fixes and UI improvements.
v1.29 (February 15, 2024)
- Improvements to AWS S3 connector allow for CSV/Parquet support and broader scaling (more tables) capability.
- Various minor fixes and UI improvements.
v1.28 (February 1, 2024)
- Various backend improvements to performance during training.
- Various minor fixes and UI improvements.
v1.27 (January 15, 2024)
- Additional features and syntax available for link prediction tasks.
- MLOps monitoring dashboards available for batch prediction jobs.
- Various minor fixes and UI improvements.
v1.26 (December 18, 2023)
- The pQuery syntax has been updated to make it easier to understand and more flexible in the way filters can be applied.
- Various minor fixes and UI improvements.
v1.25 (November 27, 2023)
- BigQuery now available as a batch prediction output.
v1.24 (November 13, 2023)
- New model planner available during pQuery training allows for fine-grained control over encoders, training strategy, and the AutoML search space.
- Additional model planner (previously advanced options) configuration options available
v1.23 (October 30, 2023)
- XAI: various minor fixes and UI improvements.
- XAI: metrics now available for multiclass and multilabel classification tasks
- For node prediction tasks, test data splits can now be downloaded from the Review Evaluation Metrics page.
- When selecting source tables, a new raw table option is available for connecting tables that don't conform to either fact or dimension table types.
- Kumo views enable the running of traditional SQL queries that materialize a view in the Kumo data plane.
v1.22 (October 16, 2023)
- Batch predictions now include output statistics computed from a sample of table data.
- Various minor fixes and UI improvements.
v1.21 (October 2, 2023)
- XAI - Cohort analysis for time columns now improved to be more interpretable.
- XAI - Cohort analysis now working for tables that are two hops away from the prediction entity table.
- A new refit feature enables automatic model refitting on entire data.
- Descriptions can now be added and updated for any objects in the Kumo platform
- During new pQuery creation, already materialized graphs from prior pQuery creation jobs are now automatically reused.
- A new connector is available for connecting to Google Cloud BigQuery.
- For multilabel classification pQueries (e.g. using the LIST_DISTINCT() operator on a maximum of 1,000 classes), evaluation metrics now include class-specific metrics.
v1.20 (September 18, 2023)
- XAI - In Column Analysis, actual versus predicted values are now displayed per column.
- A new table column type called Embedding enables the use of embeddings as an input column.
- For regression pQueries predicting a numeric output (using COUNT, SUM, etc. operators), evaluation results now include scatter plot charts that display actual versus predicted values.
- During pQuery training, charts and tables are now provided to show how the training example target labels used to train the pQuery vary over time and across training/validation/holdout data splits.
v1.19 (September 4, 2023)
- A “Distribution of Predictions” chart showcasing a visualization of the predicted values alongside the actual target labels for all entities in a regression task (e.g., predictive queries with COUNT() or SUM() operator)
- Expose boolean advanced option to handle prediction of unseen target entities at batch prediction time for link prediction tasks
- Creating custom Kumo Views using SQL queries on top of tables already connected to the platform
- Enable kicking off up to 10 asynchronous jobs (training/batch prediction) that will get queued and run sequentially one after another as older jobs complete
- Enable concurrent execution of more than 1 job
v1.18 (August 21, 2023)
- A plot showcasing the distribution of values for timestamp columns for validating while ingesting new tables
- S3 CSV data sources supported as connectors
- Calibrating batch predictions for classification tasks using Platt Scaling
- Parallelize batch prediction jobs involving large dataset size on multiple workers (up to 4)
- XAI - Explaining how the underlying data contributes to the final predictions
  - Contribution score of individual tables and the columns within them
  - Cohort analysis for the range of values of each column and for the range of number of historic facts available in tables
- Miscellaneous minor UX flow, bug, predictive accuracy fixes
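Platt scaling, mentioned in the list above, calibrates raw classifier scores by fitting a sigmoid that maps each score to a probability. The sketch below fits the two sigmoid parameters by gradient descent on the log loss. It is a simplified textbook version (no target smoothing, hand-picked learning rate, made-up scores and labels) and says nothing about how Kumo implements calibration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(a, b, scores, labels):
    """Mean negative log-likelihood of labels under p = sigmoid(a * score + b)."""
    eps = 1e-12
    total = 0.0
    for s, y in zip(scores, labels):
        p = min(max(sigmoid(a * s + b), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(scores)


def fit_platt(scores, labels, lr=0.1, steps=2000):
    """Fit slope a and intercept b by batch gradient descent on the log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y   # derivative of the log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Made-up held-out scores and labels, noisy on purpose so the fit stays finite.
scores = [-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

a, b = fit_platt(scores, labels)
calibrated = [sigmoid(a * s + b) for s in scores]
```

The parameters are normally fitted on a held-out split rather than the training data, so that the calibration reflects how the model behaves on unseen examples.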