I like to use scaffolds to solve questions that have sparse datasets, have voids, or have independent dates and need a common calendar. But that always results in exploding the dataset with repetitive records – that is until the introduction of “Shared Dimensions”.
Recently I posted a solution to calculating ending inventory using shared dimensions
https://jimdehner.com/2024/10/17/shared-dimension-inventory-analysis/ .
I used a Cartesian (1:1) join to connect the sales and production data sets. Then I got a call from Kirk Munroe – if you work with Shared Dimensions, I highly recommend you follow his work – and he showed me a better way. I am a convert!!!
1 – The Inventory Analysis Problem –
In its simplest form, the month-end inventory is just the inventory at the start of the month, plus any production (or other additions) minus what is sold. In the real world, not all products are produced or sold each month, new products are introduced or replace established products as needed, making it a difficult problem to solve.
There are three data sets – Each uses the same products but has different independent dates.
The Sales History looks like this:
Production Schedule:
And the inventory at introduction:
We will need two more “master” data sets to complete the connections –
The Product Master –
And a date ladder for the scaffold – yes, we still need a common set of dates to connect the 3 independent dates.
Now the connections. First, Sales and Production data sets are connected to the Scaffold and Product Master.
The Sales and Production datasets contain the Measures (sales quantity and production quantity) that are used in the analysis. They are referred to as the “Base Tables”. The Date Ladder and Product Master contain the Dimensions that are needed and are often referred to as "Fact Tables".
We need to connect each base table to the individual fact tables – Four connections total.
First Sales to the Product Master on the name dimensions
(NOTE using different names for the Dimension Filed is recommended to avoid eliminating data in the join)
Next Production to the Product Master
Now connect the Sales to Date Scaffold on their date dimensions.
And the Production to the Date Ladder
Let’s make that viz!
The key is to use the Dimensions from the Date Ladder or Product Master in the rows and columns for the viz and the Measures from the Sales History and Production on the Detail (marks cards)
Once we have measures in the table, they can be used in any form of calculation like calculating net production:
zn(sum([Production Qty]))-zn(sum([Quantity]))
As noted, not all products are produced or sold each month so you will want to use the ZN() functions: and we can see the results at the monthly level:
And the values total properly as the date hierarchy is collapsed:
So far, we are good –
Now we want to add a beginning inventory for each product so we can use it to determine the available product and ending inventory each month.
Add in the Inventory base table and join to the Date Ladder and Product Master:
The connection to the Product Master is.
And to the Ladder is
Now add the starting inventory to the viz
- The Dimension – Ladder date is used on columns.
- From the Product Master the Dimension Product Name is placed on Rows
- The Measure – Initial Inventory units come from the Beginning Inventory table.
- Product units (Measure) come from the Production Schedule Table
- Sales units are a Measure from the Sales History table.
The total available product to cover sales is the total of the initial inventory and production.
zn(sum([Initial Inventory Units])) + zn(sum([Production Qty]))
The ending inventory for each month can be determined using this running calculation.
RUNNING_SUM([Product Available BI + Production]) – running_sum(zn(sum([Sales Quantity])))
Or as a bar chart with color added.
And that is how it is done.
This same process can be done to combine any number of tables that have different independent dates but how about two independent dates within the same table?
Does the process still work?
2 The 2-Date Problem
A frequent problem in the real world arises when you know the open and close dates for an event, financial obligations, or like this example, a number of open projects and want to aggregate overall dates in the range.
Here is the dataset:
Each project has an open and close date, and we need to look across all dates and count how many projects are open or closed.
We will still need to scaffold the data – here it is just sequential dates to form a date ladder.
I also added some project spending – so we can see spending my month – it adds a 3rd date to the problem.
And as we saw in the previous example, I have a Project Master dataset.
Just a list of the Product IDs.
We will need to connect the datasets with the Measures needed – Project Date Ranges and Project Spending to the data sets with the Dimensions needed – Project ID and Date Scaffold – a total of 4 connections.
Here is the first to the Project ID’s.
And the Project IDs to Spending
Connection to the data scaffold is:
Now to connect the project date range to dates we need to include only the range of dates for each project. That is done with 2 conditions using inequalities – we need scaffold dates less than the project close dates but greater than the project open dates.
That’s it – now create the viz.
Place the date Dimension from the Date Scaffold on columns and the Project ID from the Project Master table on rows.
For Measure, use a Countd on Projects from the Project Range Table.
I added some total to get the count by month.
And collapsing the date to quarters aggregates properly.
Spending is:
Great – still need a scaffold but no Cartesian join – But what happens when some of the projects do not have a close date – i.e. the close date is null.
The process will still work but we need to provide a mock close date for the connection between the Project Date Range and the Scaffold.
The project date range table has been modified to look like this.
Projects 2, 8, and 9 are still open – no close date in the data –
I wanted to force a close date into the join using the ifnull() function. I arbitrarily assigned a 3-year planning horizon from today() as the close date and the clause looks like this:
Your company practice may be different, and you can code the date accordingly.
That is the only change needed for the model – projects 2, 8, and 9 now are open across the entire date range in the viz.
Am I ready to say I will never use a Cartesian join again? Never is a long time – we’ll see – but for now, I will be using the processes here and encourage you to do the same.
The workbook used here can be downloaded at Link to Tableau Workbook.
Jim
2 Responses
Terrific post and examples Jim! Will be very useful going forward….
That one went straight to my Template collection