Tableau continues to add awesome new features with every new release. Tableau 2020.2 now includes mobile optimized Metrics, dynamic set control, spatial data updates, and more! One of the biggest new features is the release of the enhanced data model, a whole new way to define relationships between data tables.
The traditional method to merge data from multiple tables in Tableau requires you to define the join type and input the field from each table that matches.
Now, instead of the join connection, a curved line (“the Noodle”) will appear. This line identifies a relationship between the tables. This allows the tables to be blended even if they have different levels of detail. You still have to identify the matching fields in the data sets, but you do not need to specify the join types. You are able to identify more than one matching field, just as before.
There are additional performance options, such as adjusting the cardinality or the referential integrity (matching records) between the two tables. In most cases, the default options will give you the results you are looking for.
Now I know this seems too good to be true! Won’t combining tables at different levels of detail create duplicate data or cause some data to be excluded? How will our calculations aggregate correctly?
With the new data model, Tableau will not define the joins right away. Instead, it will execute the join as the view is being built so as to only utilize the tables required. Tableau will automatically adjust the join as you add dimensions and measures to the sheet in order to aggregate calculations to the correct level of detail.
This is something that previously would have required using level of detail calculations to ensure accuracy. The result of the flexible relationships is a custom data source for every viz in a workbook, without the additional work of building out each data model.
To demonstrate how easy this is, I’m taking the sample Bookstore.xlsx data set from Tableau. This data set has quite a few different tables. Below is an example of a data model using both the Noodle and also left joins, where we have Books related to Authors and Ratings. This is a one to many relationship, as there are many Ratings for each book.
Data Model with the Noodle
Data Model with Left Joins
Let’s try a simple analysis to compare the total published books and average ratings for each Author. If we compare the same analysis using the joined tables and then the related tables, we can see that we get very different results.
Analysis with the Noodle
Analysis with Left Joins
It’s pretty obvious that we are getting the incorrect total of books with the joined tables. Since we added in the Ratings table, each row of books has been duplicated multiple times. To fix this, we would have to change the count to a count distinct. In the related tables, the number of boxes was appropriately aggregated automatically.
The average rating by Author is the exact same in both versions. This is because we are taking the average, which results in the same rating regardless of the extra rows with the joined tables.
You cannot have relationships based on calculated fields or geographic fields, that will require you to use the traditional join methods.
Published data sources can not be related together and relationships can not be edited in published data sources.
If your viz only includes dimensions, you may be missing anything that is unmatched between related tables. To fix this, turn on Show Empty Rows/Columns. Tableau recommends checking the underlying data frequently so you can observe how the data model is changing based on the viz. You can also view the underlying query by using the Performance Recording and selecting the event “Executing Query”.
If you are skeptical of this new method of combining tables, the traditional join options are still available by double clicking on a table. And for any existing workbooks, the data models will be maintained and can still be edited.
They will now show as a single table called “Migrated Data”.
Personally, I’m interested to see the performance improvements since this new data model only queries the tables that are used in each viz. I have also found that the limitations that come with Blending data sources (left joins only 😕) often deter me from using that Tableau feature. I think this is a pretty exciting development for Tableau and I can’t wait to try this out with some of my existing data models!
Want to try this new feature out for yourself? Upgrade to 2020.2 and follow this step by step tutorial to see Relationships in action!