View Source Type
Use traditional SQL on top of Kumo tables to build powerful pQueries.
In many cases, you will need to perform more fundamental data preparation within Kumo—for example, you may require columns consisting of custom aggregations that are difficult to create in upstream ETL pipelines. This can be accomplished by using a Kumo View.
Kumo Views allow you to run custom SQL queries that join multiple tables and materialize the result as a view within the Kumo data plane. The view can then be used like a regular Kumo table for linking your graphs and building your pQueries.
Views inherit the original column names and data types from the underlying source tables, eliminating the need for additional steps in the table flow to set data types.
Note: Kumo Views are based on Spark SQL, Apache Spark's module for querying structured, distributed data with SQL.
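To make the idea concrete, here is a minimal sketch of the kind of query a view might contain: a join across two tables with custom aggregations. The table and column names (`customers`, `orders`, `order_count`, `total_spend`) are hypothetical, and Python's built-in SQLite is used only as a local stand-in so the query can be tried without a Kumo deployment. Kumo itself runs Spark SQL, so treat this as an illustration rather than the platform's actual execution path. The query sticks to syntax that both engines accept.

```python
import sqlite3

# Hypothetical source tables; in Kumo these would come from a single connector.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, signup_date TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, '2023-01-05'), (2, '2023-02-11');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 30.0), (12, 2, 5.0);
""")

# The kind of statement that could be written as a view's SQL query:
# one row per customer, with per-customer aggregates over their orders.
VIEW_SQL = """
SELECT
    c.customer_id,
    c.signup_date,
    COUNT(o.order_id) AS order_count,
    SUM(o.amount) AS total_spend
FROM customers AS c
JOIN orders AS o
    ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.signup_date
ORDER BY c.customer_id
"""

rows = conn.execute(VIEW_SQL).fetchall()
for row in rows:
    print(row)
```

Each aggregated column carries an explicit alias, which matters for the usage considerations later on this page.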
Creating a Kumo View
Clicking on the Connect Table button will bring up the "Add Table/View" modal window. If you select "View" as your source type, Kumo will automatically load the "Connector" drop-down menu with a list of available connectors, a list of available tables for reference (based on your selected connector), as well as a text area for writing your view's SQL statement.
Under "Available Tables," you can click the See Details button to view the respective table's columns and their data types. Click on the Back button to return to "Available Tables."
You can write the SQL statement for your view in the "Spark SQL Query" text area and click on the Validate button to ensure it's working as expected.
If your SQL query is valid, "Preprocessing Settings" appears at the bottom of the modal window. Default column types are inherited from the tables used to create the view, or taken from Spark SQL query execution for newly defined columns. You can verify and change the column types here if needed.
For guidance regarding column selection and preprocessing, please see the following pages:
Select the columns you want as part of the view and click the Save button to continue.
Kumo view types and Kumo table types share the same definitions and nuances.
Once your view has been successfully created, you will be routed to your new view's detail page where you can view/verify the table creation results and column statistics (the latter may take a moment to load).
Your newly materialized view can now be used as a regular Kumo table in your ML pipeline.
Usage Considerations
The following are some crucial considerations to keep in mind when using Kumo views:
- Views can only be defined using valid SQL queries on top of Kumo tables. You cannot create views on top of other views.
- The SQL query defining a view may only reference tables from a single connector. You cannot create views that join tables from different connectors.
- An alias must be provided for each new column created through aggregations or other operations on existing columns.
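The alias requirement can be illustrated with a small local pre-check. This is a sketch only: `columns_needing_alias` is a hypothetical helper, not part of Kumo, and it does not reproduce what the Validate button checks. It runs a query against an in-memory SQLite table (a stand-in for Spark SQL, with a hypothetical `orders` table) and reports any output column whose name is not a plain identifier, which is what an unaliased expression such as `SUM(amount)` typically produces.

```python
import re
import sqlite3

# A plain identifier: letters, digits, and underscores, not starting with a digit.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def columns_needing_alias(conn, sql):
    """Return output column names that are not plain identifiers.

    Hypothetical local check, not Kumo's validation logic.
    """
    cursor = conn.execute(sql)
    names = [col[0] for col in cursor.description]
    return [name for name in names if not IDENTIFIER.fullmatch(name)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
)

# Aggregation without an alias: the engine generates the column name.
missing_alias = (
    "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
)
# Same aggregation with an explicit alias.
with_alias = (
    "SELECT customer_id, SUM(amount) AS total_spend "
    "FROM orders GROUP BY customer_id"
)

print(columns_needing_alias(conn, missing_alias))
print(columns_needing_alias(conn, with_alias))
```

The first query is flagged because its aggregated column has an engine-generated name, while the second returns an empty list. The exact generated name differs between SQLite and Spark SQL, so the check looks only at whether the name is a plain identifier.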
Please refer to the Official Spark 3.3.0 Documentation for SQL reference and guidance.