Faster EDA with Tableau: A Scatterplot Matrix to Make Python/R Jealous

The fundamental first step in any data project is understanding the data model and the relationships between data points, a process often referred to as “Exploratory Data Analysis” (EDA). A scatterplot matrix is a great tool for getting a quick read on the relationships between numeric fields and for deciding what to do next toward your project goal. While this is often done in Python or R, I’ll show you why I think Tableau can help you get your EDA done in a fraction of the time.

Why Tableau?

Tableau is an often-overlooked tool for exploratory data analysis (EDA), which is commonly performed in Python or R. It is easy to understand why: in many projects you are already using those tools, they are free, and they have wide community support when you have questions. However, creating a scatterplot matrix in these tools can be cumbersome and time-consuming. For those who do not use Python or R daily, many tasks require significant Google searching to find a way to visualize your particular data the way you want. In my case, it took over 30 minutes in each language to develop a scatterplot matrix equivalent to what you can create in Tableau in under a minute (see below).

Tableau can be a very useful tool for data scientists, data analysts, or really any professional interested in understanding their data to make informed business decisions. Using Tableau for EDA of any kind, especially scatterplot matrices, is often faster, gives you a way to interact with key statistical information, and connects easily to most data sources. This frees up more of your time to understand the relationships themselves and get to your more important tasks.

The following sections walk through creating the scatterplot matrix in Tableau step by step. If you are already familiar with Tableau, feel free to skip “Step 1: Setup”. Code for creating similar scatterplot matrices in both Python and R is available at the bottom of this post.

Scatterplot Matrix Walkthrough Using the Iris Data Set

In the example below, the goal is to establish what relationships exist between the flower species across different numeric measurements of a flower’s anatomy. Feel free to follow along with the example below or with whatever data set you are currently interested in working with.

All visuals use the “iris” data set, which is commonly available in both Python and R and can also be found at the University of California, Irvine (which you can download here). To save it, click on the link, right-click anywhere on the web page, and then click “Save as”.

Gif showing how to save iris data set

Step 1: Setup

First, connect to your data file:

  • Open a new Tableau Workbook.
  • In the “Connect” panel on the left, click on “Text file” (a CSV is a special type of text file where columns are separated by commas).
  • Find where you stored the “iris.csv” file (or whichever data file you’ll be using).
  • Click on the file then click “Open”.
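
For readers who want to see what this connection step amounts to outside of Tableau, here is a minimal Python sketch. It reads a small CSV and sorts each column into numeric "measures" and non-numeric "dimensions", the same split Tableau shows in its “Data” pane. The sample rows, the column names, and the helper names (`is_number`, `classify_fields`) are assumptions made for illustration; this is only a rough analogy, not a description of how Tableau infers types internally.

```python
import csv
import io

# Hypothetical sample laid out like an iris CSV; the column names are assumed.
SAMPLE_CSV = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.3,3.3,6.0,2.5,virginica
"""


def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def classify_fields(rows):
    """Split column names into numeric measures and non-numeric dimensions."""
    measures, dimensions = [], []
    for name in rows[0].keys():
        if all(is_number(row[name]) for row in rows):
            measures.append(name)
        else:
            dimensions.append(name)
    return measures, dimensions


# To read a real file instead, replace io.StringIO(SAMPLE_CSV) with open("iris.csv", newline="").
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
measures, dimensions = classify_fields(rows)
print("measures:", measures)
print("dimensions:", dimensions)
```

If your copy of the file has no header row, you would need to supply the column names yourself, for example through the `fieldnames` argument of `csv.DictReader`.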

Step 2: Create a Quick Scatterplot Matrix

Gif to create scatterplot matrix

  • Click into a blank “Sheet”.  You can use the icon just to the right of “Sheet 1” in the image to the right to create a new sheet if needed.
  • The “Data” pane on the left contains dimensions (blue) and measures (green). Tableau infers the data type of each field. Measures are typically inferred from numeric fields (integers, floats, etc.), and Tableau does a good job of this if the data is clean.
  • Select all measures you wish to include in the scatterplot matrix by holding “CTRL” and clicking each one individually, or by holding “SHIFT” and clicking the top and bottom measures of the range you’re interested in.
  • Drag the group of measures to the “Rows” header.
  • Double click the measure in “Data” pane which appears first or furthest left in the “Rows” header. Do this TWICE.
  • Remove the redundant green measure pill from the “Rows” and “Columns” shelves by selecting it and pressing “Delete” on your keyboard, or by dragging the green pill off the shelf. If the previous steps were done correctly, the first green pill is the duplicate, which makes it easy to find and remove even when you are working with many measures.
  • Click on the “Analysis” dropdown in the top menu bar.
  • Uncheck “Aggregate Measures”.
  • Click on the “Analytics” tab in the “Data” pane.
  • Double click on “Trend Line”.

Congratulations! At this point, you have a fully functional scatterplot matrix that is interactive and includes general statistical information, such as an R-squared value for each relationship. However, a few simple touches will make the view easier to read and enrich the insights gained from the scatterplot matrix.
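
To make the R-squared value concrete, the sketch below computes it by hand for every pair of measures, assuming a straight-line (linear) trend with an intercept. In that case R-squared equals the squared Pearson correlation between the two fields. The handful of rows and the column names are illustrative assumptions, and `r_squared` is a hypothetical helper, not anything Tableau exposes.

```python
from itertools import combinations

# A handful of illustrative rows with assumed column names; swap in your full data.
measures = {
    "sepal_length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.2, 3.1, 3.3, 2.7, 3.0],
    "petal_length": [1.4, 1.4, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9],
    "petal_width": [0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 2.5, 1.9, 2.1],
}


def r_squared(xs, ys):
    """R-squared of a least-squares line fit of ys on xs (squared Pearson r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("R-squared is undefined when a field is constant")
    return (sxy * sxy) / (sxx * syy)


# One value per off-diagonal cell of the matrix (each pair counted once).
results = {
    (a, b): r_squared(measures[a], measures[b])
    for a, b in combinations(measures, 2)
}
for (a, b), value in results.items():
    print(f"{a} vs {b}: R^2 = {value:.3f}")
```

Values near 1 indicate that a straight line explains most of the variation between the two fields; values near 0 indicate little linear relationship.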

Step 3: Format the Quick Scatterplot Matrix

Gif of scatterplot matrix

  • In the “Marks” pane click on the “All” tab.
  • Adjust the data point size by clicking “Size” in the “Marks” pane and moving the slider until the points are the size you want.
  • Adjust the point shape by clicking “Shape” in the “Marks” pane and adjust the shape to a solid circle.  Use a reduced opacity to reveal overlapping points if needed.

Step 4: Add a Dimension to Add Context

This step is not always necessary, but by introducing information such as “Species” in the iris data set you can get a better understanding of possible clusters within groups.

  • Drag any applicable dimension from the “Data” pane to the “Color” tab in the “Marks” pane.
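
Conceptually, putting a dimension on “Color” partitions the marks by that dimension's values so each group can be read separately. The following Python sketch shows the same idea on a few illustrative rows: it groups points by species and prints a per-group summary. The rows, column names, and the `group_by` helper are assumptions for illustration only.

```python
from collections import defaultdict

# Illustrative rows with assumed column names.
rows = [
    {"petal_length": 1.4, "petal_width": 0.2, "species": "setosa"},
    {"petal_length": 1.3, "petal_width": 0.2, "species": "setosa"},
    {"petal_length": 4.7, "petal_width": 1.4, "species": "versicolor"},
    {"petal_length": 4.5, "petal_width": 1.5, "species": "versicolor"},
    {"petal_length": 6.0, "petal_width": 2.5, "species": "virginica"},
    {"petal_length": 5.1, "petal_width": 1.9, "species": "virginica"},
]


def group_by(rows, dimension):
    """Partition rows by the value of one dimension (what a color legend encodes)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row)
    return dict(groups)


groups = group_by(rows, "species")
for name, members in groups.items():
    avg = sum(r["petal_length"] for r in members) / len(members)
    print(f"{name}: {len(members)} points, mean petal_length = {avg:.2f}")
```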

 

Closing: All the Benefits in 1/10th the Time

Well done! You have reached the goal of revealing how the flower species relate to one another across the different numeric measurements. You can see that there are generally positive relationships between the anatomical sizes, and that one species (setosa, in blue) is clearly different from the other two!

By using Tableau to do your exploratory data analysis (EDA), you will free up more time to focus on other important tasks, and stand out as a highly efficient team member.

Below are examples of the code and output needed to replicate in Python and R what was done in Tableau. For me, it took less than one minute to create this scatterplot matrix in Tableau, whereas it took over 30 minutes in each of Python and R to code the equivalent output (including a few Google searches to refresh my memory on syntax).

Files

Click for R Code

Click for Python Code
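
The linked files are not reproduced here. As a rough point of comparison, the following is a minimal sketch of one common way to draw a scatterplot matrix in Python, using `pandas.plotting.scatter_matrix` with matplotlib. It is not the author's original code. The few sample rows, the column names, the color choices, and the output file name are all assumptions; in practice you would load the full file, for example with `pd.read_csv("iris.csv")`.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the script also runs without a display

import pandas as pd
from pandas.plotting import scatter_matrix

# A few illustrative rows with assumed column names; load your full file in practice.
df = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
        "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
        "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
        "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
        "species": [
            "setosa", "setosa",
            "versicolor", "versicolor",
            "virginica", "virginica",
        ],
    }
)

# One color per species, similar to dragging a dimension onto "Color" in Tableau.
palette = {"setosa": "tab:blue", "versicolor": "tab:orange", "virginica": "tab:green"}
colors = df["species"].map(palette).tolist()

# Plot every numeric column against every other; histograms go on the diagonal.
axes = scatter_matrix(
    df.drop(columns="species"),
    c=colors,
    alpha=0.7,
    diagonal="hist",
    figsize=(8, 8),
)

# Save the figure; the file name is arbitrary.
axes[0, 0].figure.savefig("iris_scatter_matrix.png", dpi=150)
```

Unlike the Tableau view, this sketch does not add trend lines or interactive tooltips; those would need additional code.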

 

