Table Linkages
What Tables Should I Link?
In order to choose which tables to link together, you should think about the main entities in your enterprise and how they relate to/interact with each other, the same way you would design the data model for your data warehouse/BI tooling. To connect table(s) into a graph, at least one table should contain a primary key.
For example, in an e-commerce graph, you may have:
- Many Shoppers, each of which may have one or more sessions
- Within those sessions would be one or more web interaction events such as Clicks, Add-to-Carts, Purchases, etc. usually done in relation to products
- Products could be sold by different Merchants
In this case, you would link together tables for representing:
- Shoppers (PK = Shopper ID)
- Products (PK = Product ID)
- Merchants (PK = Merchant ID)
- Sessions (Time Column = Session End Time and an ID Column for Shopper)
- Either a single events table or separate tables for clicks, add-to-carts, and purchases (each with Time Column = Time of Event, and where there are ID columns for the relevant shoppers, products, and merchants)
- In the case of frequently changing data for shoppers/products/merchants, you can also include tables for them where each fact represents a snapshot of the changing data at a particular point in time, with appropriate ID columns
Your organization's graph may vary significantly from this example; Kumo is flexible enough to accomodate virtually any enterprise graph.
Lastly, how comprehensive you need to make this graph depends on the predictions you want to make downstream. For example, if one of your predictions is the "30-day sum of future purchase values for each customer":
-
At a minimum, you would need to include two tables:
- A table with a row per unique customer, with at least a customer ID PK column.
- A table with a row per historical purchase event, with at least a time column for the time of the purchase, a customer ID FK column linking it to the customer dimension table, and a column for the numeric value of each purchase.
You may eventually also want to include information relevant to predicting future purchases, such as fact tables for web engagement data, or an additional dimension table for product catalog metadata. To start, you can include the minimum set of tables and slowly add more over time, without changing your predictive query definitions
Updated 6 months ago