Release Notes

v2.0 Release (02/14/2025)

Please check our new docs for release notes for v2.0 and above: https://kumo.ai/docs/overview

v1.47 Release Notes (01/31/2025)

UI Enhancements

Improved table styling and updated graph table UI for better clarity.

Graph creation enhancements:

Suggested primary-key and foreign-key links for new graphs.
Enabled editing tables during graph creation.
Fixed improper links appearing in new graphs.
Fixed sidebar graph highlighting when opening a new graph page.
Added Save button in Graph Table Edit.
Disabled connector name editing.
Improved sorting in tables list and set the last used connector as the default.
Adjusted table borders in Jobs page for better visibility.

Prediction Job UI Updates:

Integrated XAI explanation in Prediction Job overview.
Fixed issues with split stats null access.
Show "No Jobs" message for invalid searches.
Minor fixes for prediction job UI.
Updated file upload icon.
Removed hidden fields from default model plan.
Fixed column editing issues when incorrectly configured.
Implemented a fix to expand table rows when clicking dropdown buttons.
Removed CSV and Parquet upload connector dropdown.

Batch Prediction & Training Enhancements

Added support for Entity-Level and Graph-Level XAI in Batch Prediction Jobs page.
Implemented baseline metric retrieval for training workflows.
Fixed handling of large negative Unix timestamps.
Allowed graph_id parameter in Batch Prediction Jobs.
Fixed batch prediction anchor time retrieval in UI.
Raise error if Parquet partitions are too large when fetching table samples.
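
The negative Unix timestamp fix listed above concerns values that fall before the 1970 epoch. As an illustration only (this is not Kumo's internal code, and the helper name unix_to_datetime is hypothetical), one platform-independent way to convert such values in Python is to add a timedelta to the epoch instead of calling datetime.fromtimestamp, which can raise an error for negative inputs on some platforms:

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch as a timezone-aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def unix_to_datetime(seconds: float) -> datetime:
    """Convert Unix seconds (possibly negative) to an aware UTC datetime.

    Hypothetical helper for illustration; adding a timedelta to the epoch
    avoids the platform-dependent limits of datetime.fromtimestamp.
    """
    return EPOCH + timedelta(seconds=seconds)


if __name__ == "__main__":
    # One day before the epoch.
    print(unix_to_datetime(-86400))
    # A large negative value: 70 years before the epoch.
    print(unix_to_datetime(-2208988800))
```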

System and API Fixes

Fixed local mode LLM embedding decryption.
Improved artifact upload to Snowflake and databases.
Ensured files always overwrite when uploading.
Fixed a potential rel time gradient error and updated the ML version.
Adjusted trainer and gradient explainer for compatibility with time changes in kumo-ml.
Prevented continuous API calls in the Prediction Jobs page.
Added creation date for graph snapshots.

Bug Fixes & Stability Improvements

Fixed a validation bug for pquery input from frontend.
Fixed an issue where tables uploaded through local connectors couldn't be used for graph creation.
Fixed support for string literals with colons in TimeRangeSplit.

v1.46 Release Notes (01/17/2025)

Improvements

Forecasting

Model Plan Enhancements: Added year-over-year and handle_new_entities option labels for forecasting, aiding in model learning of seasonality and holiday trends.

LINKS
https://docs.kumo.ai/docs/year_over_year
https://docs.kumo.ai/docs/handle_new_entities

Error Handling

Improved syntax error messaging in Model Plan for better clarity.

UI Enhancements

Introduced a new page for managing graphs and table lists.
Enabled pagination on the jobs page to enhance performance and reduce backend load.
Improved the training job creation page interface.

Feature Improvements

Extended support for long-duration training jobs and batch predictions (2 to 20 days).
Enabled keyword searches on the jobs page with server-side support.

API Updates

Made the name field optional for the PQueryResource in the Kumo API.

Bug Fixes

Fixed timestamp data type casting issues. After this fix, users no longer need to specify the ts_format or unit for the affected dataset.
Corrected prediction time estimates for large output sizes.

v1.45 Release Notes (01/06/2025)

Improvements

UI Enhancements

Adjusted local file uploads to process chunks asynchronously, completing uploads efficiently.
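
As a rough sketch of what asynchronous chunk processing can look like in general (this is not the Kumo UI implementation; split_into_chunks, upload_all, and the upload_chunk callback are hypothetical names), the following splits a payload into chunks and uploads them concurrently with a bounded number of in-flight requests:

```python
import asyncio
from typing import Awaitable, Callable, List


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Split a payload into consecutive chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


async def upload_all(
    chunks: List[bytes],
    upload_chunk: Callable[[int, bytes], Awaitable[None]],
    max_concurrency: int = 4,
) -> None:
    """Upload all chunks concurrently, limiting the number in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(index: int, chunk: bytes) -> None:
        # The semaphore caps how many uploads run at the same time.
        async with semaphore:
            await upload_chunk(index, chunk)

    await asyncio.gather(*(guarded(i, c) for i, c in enumerate(chunks)))


if __name__ == "__main__":
    received = {}

    async def fake_upload(index: int, chunk: bytes) -> None:
        # Stand-in for a real network call.
        await asyncio.sleep(0)
        received[index] = chunk

    payload = b"x" * 10
    asyncio.run(upload_all(split_into_chunks(payload, 4), fake_upload))
    print(b"".join(received[i] for i in sorted(received)) == payload)
```

Each chunk carries its index so the receiver can reassemble the file regardless of completion order.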

Error Handling and Messaging

Enhanced error messages for batch predictions in BigQuery, providing detailed customer-facing explanations instead of runtime errors.
Improved error messaging for failed training jobs caused by empty ingested data, ensuring clarity on CSV ingestion issues.

Integrations:

Kumo can now be run as a Native app on Snowflake Azure regions.

Bug Fixes

Resolved bar graph display issues within the Subgraph table.

Breaking Change

Reduced download limit of holdout dataset to 1M entities.

v1.44 Release Notes (12/18/2024)

Improvements

UI Enhancements

BP Job Creation: Fixed warnings during job creation.

Training & Database Optimizations

Temporal Queries: Enhanced training table generation performance when static entity filters are applied.
Fine-grained sampling config: The number of neighbors can now be specified per edge type.
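
To illustrate the idea of a per-edge-type neighbor budget, here is a minimal conceptual sketch in plain Python. It does not use the Kumo model plan syntax or any Kumo API; the EdgeType representation, the sample_neighbors function, and the table names in the usage example are all assumptions made for illustration:

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical representation of an edge type: (source table, destination table).
EdgeType = Tuple[str, str]


def sample_neighbors(
    adjacency: Dict[EdgeType, Dict[int, List[int]]],
    seed_node: int,
    num_neighbors: Dict[EdgeType, int],
    default: int = 0,
    rng: Optional[random.Random] = None,
) -> Dict[EdgeType, List[int]]:
    """Sample one hop of neighbors with a separate budget for each edge type."""
    rng = rng or random.Random(0)
    sampled: Dict[EdgeType, List[int]] = {}
    for edge_type, neighbors_by_node in adjacency.items():
        candidates = neighbors_by_node.get(seed_node, [])
        # Use the edge type's own budget, never more than what is available.
        budget = max(0, num_neighbors.get(edge_type, default))
        sampled[edge_type] = rng.sample(candidates, min(budget, len(candidates)))
    return sampled


if __name__ == "__main__":
    adjacency = {
        ("users", "orders"): {1: [10, 11, 12, 13]},
        ("users", "reviews"): {1: [20, 21]},
    }
    budgets = {("users", "orders"): 2, ("users", "reviews"): 5}
    print(sample_neighbors(adjacency, seed_node=1, num_neighbors=budgets))
```

A per-edge-type budget lets a dense relationship be subsampled aggressively while a sparse one is kept in full.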

API and Framework Updates

Kumo-ML Update: Upgraded to the latest version for enhanced stability.
Session Creation Retry: Enabled automatic retries in Databricks mode.

Spark and SPCS

Spark Tracking: Enabled by default for streamlined debugging.
SPCS Efficiency: Enhanced file copy processes for faster data handling.

Bug Fixes

Worker Health Check: Improved warning messages and thresholds.
Subgraph Size: Enhanced error messaging for large subgraph scenarios.
XAI Table Display: Fixed table name visibility issues in the Explainability UI.

v1.43 Release Notes (12/05/2024)

New Features: SDK

Kumo version v1.43 introduces the Kumo Python SDK, a fully fledged job-centric, composable, and interactive programmatic interface to the Kumo machine learning platform. The SDK allows users to perform EDA, create tables, graphs & queries, train jobs, evaluate results, and orchestrate production jobs all in one notebook environment. Key features of the SDK include:

A Python-friendly object model representing the key components of a relational deep learning model: Connector, Table, Graph, PredictiveQuery, and Trainer.
A composable interface allowing users to inspect, evaluate, and modify intermediate artifacts in the Kumo pipeline (e.g., training and prediction tables, holdout dataframes, and more).
A new user interface to display all launched jobs, monitor job progress, and visualize outputs including training progress, evaluations, and explainability.

The SDK can be installed with "pip install kumoai==0.2.1 --extra-index-url=https://sdk-pkg.kumoai.cloud" and is documented at https://kumo-ai.github.io/kumo-sdk/docs.
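
After installing, one way to confirm which version is present is to query the package metadata from Python's standard library. This is a small sketch: the installed_version helper is a hypothetical name, and it makes no calls into the SDK itself.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("kumoai")
    print(version if version is not None else "kumoai is not installed")
```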

Improvements

UI and Jobs Page Enhancements

Adjusted search text box corners and reduced row padding.
Improved dropdown designs, tags theme, and column interactions.
Enhanced column width and hover behaviors for table elements.
Renamed components for better clarity (e.g., "Jobs Overview" updated).
Improved styling and content for job training details and related pages.

Snowflake Native App Improvements

Kumo’s Snowflake Native app is now more cost-efficient! The control plane runs on a CPU-only compute pool, utilizing a GPU compute pool only during model training, resulting in nearly 3X to 4X cost savings.
Improved reliability for Snowflake stage operations.

Performance and Reliability Fixes

Call Cache Directory Creation: Transitioned file IO operations to temporal activities to prevent deadlocks.
Batch Prediction Optimization: Fixed global materialization logic in batch jobs.
Prediction Anchor Time Visibility: Updated prediction table to display anchor time only when available.

Model Plan Improvements

Added support for finer-grained control of sampling, allowing granularity at the level of fkeys and hops.

Graph and Visualization Improvements

Updated styles, icons, and spacing for better clarity.
Resolved visualization issues like "fitView" bugs and enhanced graph interactivity.
Added related jobs and nested jobs enhancements.
Improved graph snapshots and introduced spacing refinements.

Training Table Improvements

Made the "Timeframe Chart" default on page load and removed redundant elements.
Added new columns and updated styling to match current requirements.
Set more intuitive defaults for regression task parameters.

Bug Fixes

UI Interaction Refinements: Fixed flickering divider issues and softened shadows.
Resolved inconsistencies in column width and hover effects.
Updated progress bar behavior and fixed reload issues.

Breaking Changes

Deprecations and Feature Updates

AdvancedAutoTrainerOptions has been deprecated in favor of ModelPlan.
Removed the max_target_neighbors_per_entity option.

v1.42 (11/20/2024)

Improvements

Enhanced Migration Documentation and Database Improvements

Updated Migration Readme: Simplified guidance for migration processes, ensuring clear and actionable steps for smoother transitions.
Schema Migration Enhancements: Improved ML database migrations by fixing database connection strings for greater reliability and ease of use.
Spark Resource Table Addition: Enabled advanced data analytics by adding the spark_resource table to the production ML database, enhancing data processing capabilities.

Predictive (and Forecasting) Query Refinements

Predictive Query User Experience: Removed unnecessary pop-ups from the Predictive Query overview, streamlining user interactions and reducing distractions.

Performance and Efficiency Upgrades

Batch Prediction Optimization: Reduced prediction time by 20-50% for large-scale partitioned batch predictions, saving time on data processing.
Categorical Data Analysis: Fixed missing percentage stats for categorical columns with more than 20 categories, providing comprehensive insights.

Explainability and UI Improvements

XAI Enhancements: Simplified explainability visuals by updating graph details and tips for easier interpretation of predictive models:
Updated graph origins to display prediction averages.
Streamlined visuals by removing population fraction indicators from graphs and showing them as text instead.

Bug Fixes and Reliability

Job Type Search Fix: Resolved job type filtering issues, ensuring accurate search results when filtering by job attributes.
Split Horizon Adjustment: Corrected computations to ensure consistent and accurate results in horizon splitting.

Graph and Node Visualization Improvements

Node Graph Adjustments: Improved the visibility of nodes in large graphs by dynamically resizing node dimensions based on graph size.

v1.41 (11/04/2024)

Improvements

Improved Reliability for Backend Storage Systems
Expanded compatibility and enhanced reliability for diverse backend storage systems including AWS S3, Databricks Unity Catalog, and Snowpark Container Services (SPCS) stage.
Optimized Efficiency in Training Data Materialization for Small Graphs
Significant improvements have been made in the processing efficiency of small graph data, reducing materialization time by up to 25%.
Entity selection in XAI for multi-class classification will now provide the top-k entities for which the model predictions are correct or wrong with high confidence
Support for focal loss in binary/multi-label tasks
Support for 6-hop neighbor sampling
Early validation for Batch Prediction outputs written to warehouses.
Baseline comparisons are now available in the Snowflake Native App for all task types.
Added graph-based visualizations to display entity-level explainability for binary classification tasks
Several UI improvements, including fixes for stats scale issues and eval charts.
Performance improvements for materialization (~30% reduction) using intermediate Snowflake tables instead of external parquet files in Snowflake Native app.
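
For readers unfamiliar with the focal loss mentioned in the list above, the following is a minimal pure-Python sketch of the standard binary formulation, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). It is not Kumo's implementation; the function name and the default values gamma=2.0 and alpha=0.25 are assumptions chosen for illustration. For multi-label tasks, the same expression would typically be applied to each label independently.

```python
import math
from typing import Sequence


def binary_focal_loss(
    probs: Sequence[float],
    targets: Sequence[int],
    gamma: float = 2.0,
    alpha: float = 0.25,
    eps: float = 1e-12,
) -> float:
    """Mean binary focal loss over predicted probabilities and 0/1 targets."""
    if len(probs) != len(targets) or len(probs) == 0:
        raise ValueError("probs and targets must be non-empty and equal length")
    total = 0.0
    for p, y in zip(probs, targets):
        # p_t is the probability assigned to the true class.
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # Clamp to avoid log(0).
        p_t = min(max(p_t, eps), 1.0)
        # The (1 - p_t)^gamma factor down-weights well-classified examples.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


if __name__ == "__main__":
    # A confident correct prediction contributes far less than a wrong one.
    print(binary_focal_loss([0.9], [1]))
    print(binary_focal_loss([0.1], [1]))
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy.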

Breaking Changes

Support for the old flat model plan YAML configuration has been fully removed.

v1.40 (October 2024)

Improvements

Various improvements to the experience for writing Predictive Queries including Autocomplete and inline error hints!
CPU requirements have been reduced substantially, lowering the likelihood of CPU OOMs.
Reduced times for loading tables in trainer for SPCS by 30-40%.
SPCS deployment no longer requires Snowflake connector credentials. Kumo uses the Snowflake-provided Oauth token already available in SPCS.
Improved reliability for Databricks native deployments around UCV file uploads/downloads and session management.
Various UI improvements to support Databricks connector.

Breaking Changes

Batch Prediction output format transformations have been deprecated. To post-process predictions, we recommend using a distributed data processing platform like Databricks or AWS EMR Studio in your secure environment.
Batch Prediction data distribution drift statistics have been deprecated. We recommend using your MLOps platform to make sure that changes in the distribution of your data are intended.

v1.39 (September 2024)

New Features

New model plan option - sample_from_entity_table (default: True): This new option for static predictive queries lets you customize the behavior of neighborhood sampling. If set to False, it disallows sampling of other entities in the entity table besides the seed entity itself. This is useful when entities represent candidates or hypothetical examples and information flow between different candidates should be restricted.
Support for global baselines for all problem types, which now enables generating baselines for static link and node prediction.

Improvements

Baseline is additionally supported for temporal multilabel classification and ranking problems and all static problems.
The baseline trigger has moved from the model planner to a separate, more visible button on the model planner page.
Speed up batch prediction table generation with a large number of entities and timeframes. For some queries this brings down BP table generation time from 1 hour+ to under 20 minutes.
Connector authentication for Snowflake Native app: When Kumo runs as a Snowflake native app, users no longer need to provide their credentials when creating a Snowflake connector; Kumo uses the built-in Oauth token in SPCS to connect to the customer’s warehouse. This change also ensures that all traffic between Kumo and Snowflake happens within Snowflake’s private network and Kumo no longer requires egress rules to connect to the customer’s Snowflake account. This change does require privileges to be granted to the Kumo native app before Kumo can access any data in the customer's Snowflake account.

Breaking Changes

None

v1.38 (July 15, 2024)

Baselines now supported in SPCS.
Encrypted keys now supported for Snowflake connector.
Backend performance enhancements.
Various minor fixes and UI improvements.

v1.37 (June 2, 2024)

Backend performance enhancements (SaaS)
Various minor fixes and UI improvements.

v1.36 (May 27, 2024) - extended release notes

Baselines page now displays a warning when a feature is not available.
Users are now alerted if multi-class classifications only have two classes.
Enhancements to in-app pQuery documentation and improved tooltips.
Various minor fixes and UI improvements.

v1.35 (May 13, 2024) - extended release notes

Kumo table and view creation now streamlined in a unified "Add Table/View" page.
Newly refined UI across the Kumo SaaS app.
Various minor fixes and UI improvements.

v1.34 (April 29, 2024) - extended release notes

Multi-label ranking is now available in PQLv2.
Encoder use can now be specified for autoregressive labels in regression and forecasting tasks (by specifying past_encoder in the model plan).
Various backend performance enhancements and improvements.
Various minor fixes and UI improvements.

v1.33 (April 11, 2024) - extended release notes

Enhanced monitoring for batch predictions to detect unusual gaps in fact tables.
For classification, link prediction, and regression tasks, heuristic baselines now available for comparing Kumo results to other baselines.
Various backend performance enhancements and improvements.
Various minor fixes and UI improvements.

v1.32 (March 25, 2024)

Data distribution drift statistics now available for batch predictions.
Row-level explainability (XAI) metrics now available via the explorer tab.
Enhanced datatype changes are now available during preprocessing when creating tables.
When setting up dimension tables, end date can now be set up to restrict training and batch predictions to a specific timeframe.
Various minor fixes and UI improvements.

v1.31 (March 11, 2024)

For ranking tasks (i.e., pqueries using LIST_DISTINCT with RANK TOP K), target item limit increased from 1M to 10M.
For certain types of pQueries (e.g., link prediction tasks), an Explorer section is available for evaluating predictions against historical and ground truth data.
Various minor fixes and UI improvements.

v1.30 (February 26, 2024)

Improvements for supporting extensive batch prediction jobs.
Various minor fixes and UI improvements.

v1.29 (February 15, 2024)

Improvements to AWS S3 connector allow for CSV/Parquet support and broader scaling (more tables) capability.
Various minor fixes and UI improvements.

v1.28 (February 1, 2024)

Various backend improvements to performance during training.
Various minor fixes and UI improvements.

v1.27 (January 15, 2024)

Additional features and syntax available for link prediction tasks.
MLOps monitoring dashboards available for batch prediction jobs.
Various minor fixes and UI improvements.

v1.26 (December 18, 2023)

The pquery syntax has been updated to make it easier to understand and more flexible in the way filters can be applied.
Various minor fixes and UI improvements.

v1.25 (November 27, 2023)

BigQuery now available as a batch prediction output.

v1.24 (November 13, 2023)

New model planner available during pQuery training allows for fine-grained control over encoders, training strategy, and the AutoML search space.
Additional model planner (previously advanced options) configuration options available

v1.23 (October 30, 2023)

XAI: various minor fixes and UI improvements.
XAI: metrics now available for multiclass and multilabel classification tasks
For node prediction tasks, test data splits can now be downloaded from the Review Evaluation Metrics page.
When selecting source tables, a new raw table option is available for connecting tables that don't conform to either fact or dimension table types.
Kumo views enable the running of traditional SQL queries that materialize a view in the Kumo data plane.

v1.22 (October 16, 2023)

Batch predictions now include output statistics computed from a sample of table data.
Various minor fixes and UI improvements.

v1.21 (October 2, 2023)

XAI - Cohort analysis for time columns now improved to be more interpretable.
XAI - Cohort analysis now working for tables that are two hops away from the prediction entity table.
A new refit feature enables automatic model refitting on the entire dataset.
Descriptions can now be added and updated for any objects in the Kumo platform
During new pquery creation, automatically re-use already materialized graphs from prior pQuery creation jobs.
A new connector is available for connecting to Google Cloud BigQuery.
For multilabel classification pQueries (e.g. using the LIST_DISTINCT() operator on a maximum of 1,000 classes), evaluation metrics now include class-specific metrics.

v1.20 (September 18, 2023)

XAI - In Column Analysis, actual versus predicted values are now displayed per column.
A new table column type called Embedding enables the use of embeddings as an input column.
For regression pQueries predicting a numeric output (using COUNT, SUM, etc. operators), evaluation results now include scatter plot charts that display actual versus predicted values.
During pQuery training, charts and tables are now provided to show how the training example target labels used to train the pQuery vary over time and across training/validation/holdout data splits.

v1.19 (September 4, 2023)

A “Distribution of Predictions” chart showcasing a visualization of the predicted values alongside the actual target labels for all entities in a regression task (e.g., predictive queries with COUNT() or SUM() operator)
Expose boolean advanced option to handle prediction of unseen target entities at batch prediction time for link prediction tasks
Creating custom Kumo Views using SQL queries on top of tables already connected to the platform
Enable kicking off up to 10 asynchronous jobs (training/batch prediction) that will get queued and run sequentially one after another as older jobs complete
Enable concurrent execution of more than 1 job

v1.18 (August 21, 2023)

A plot showcasing the distribution of values for timestamp columns for validating while ingesting new tables
S3 CSV data sources supported as connectors
Calibrating batch predictions for classification tasks using Platt Scaling
Parallelize batch prediction jobs involving large dataset size on multiple workers (up to 4)
XAI - Explaining how the underlying data contributes to the final predictions
- Contribution score of individual tables and the columns within them
- Cohort analysis for the range of values of each column and for the range of number of historic facts available in tables
Miscellaneous minor UX flow, bug, predictive accuracy fixes