3 min read
Michael Thompson: Jul 22, 2020
The fundamental first step in any data project is understanding the data model and the relationships between data points, a process often referred to as “Exploratory Data Analysis,” or EDA for short. A scatterplot matrix is a great tool for quickly understanding the numeric relationships in your data and informing the next actions toward your project goal. While EDA is often done in Python or R, I’ll show you why I think Tableau can help you get it done in a fraction of the time.
Tableau is an often overlooked tool for exploratory data analysis (EDA), which is commonly performed in Python or R. It is easy to understand why: in many projects you are already using those tools, they are free, and they have wide community support when you have questions. However, creating a scatterplot matrix in these tools can be cumbersome and time-consuming. For those who do not use Python or R daily, many tasks require significant Google searching to find a way to visualize a particular data set as intended. In my case, it took over 30 minutes in each language to develop a scatterplot matrix equivalent to what you can create in Tableau in under a minute (see below).
Tableau can be a very useful tool for data scientists, data analysts, or really any professional interested in understanding their data to make informed business decisions. Using Tableau for EDA of any kind, especially scatterplot matrices, can be faster than coding, lets you interact with key statistical information, and connects easily to most data sources. This frees up more of your time to understand the relationships themselves and to get to your more important tasks.
The following sections provide a comprehensive walkthrough for creating the above visual in Tableau. If you are already familiar with Tableau, feel free to skip “Step 1: Setup”. The code for creating similar scatterplots in both Python and R is at the bottom of this blog.
In the example below, the goal is to establish what relationships exist between the flower species across different numeric measurements of a flower’s anatomy. Feel free to follow along with the example or with whatever data set you are currently interested in.
All visuals use the standard “iris” data set, which is available in both Python and R and can also be obtained from the University of California, Irvine (download it here). Click on the link, right-click anywhere on the web page, and then click “Save as”.
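Once the file is saved, it can also be read outside Tableau. The following is a minimal Python sketch, assuming the download is the raw UCI `iris.data` file (comma-separated, no header row, species labels prefixed with `Iris-`). The column names and the `load_iris_csv` helper are hypothetical, chosen for illustration only.

```python
import pandas as pd

# Assumed column names: the raw file has no header row, so these labels are illustrative.
IRIS_COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]


def load_iris_csv(path_or_buffer):
    """Read a headerless, comma-separated iris file into a DataFrame."""
    # Blank lines are skipped by read_csv's default settings.
    df = pd.read_csv(path_or_buffer, header=None, names=IRIS_COLUMNS)
    # Shorten labels such as "Iris-setosa" to "setosa".
    df["species"] = df["species"].str.replace("Iris-", "", regex=False)
    return df
```

Calling `load_iris_csv("iris.data")` would then return a table with four numeric columns and one species column. If your copy of the file already has a header row, drop the `header=None, names=...` arguments.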
First, connect to your data file:
Congratulations! By this point, you have a fully functional, interactive scatterplot matrix that includes general statistical information, such as an R-squared value for each relationship. However, a few simple touches will ease the viewing experience for your users and enrich the insights gained from the matrix.
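If you want to cross-check the R-squared values outside Tableau, the sketch below computes them for every pair of numeric columns in Python. It assumes each relationship is fitted with a simple straight-line trend, in which case R-squared equals the squared Pearson correlation. The `pairwise_r_squared` helper is a hypothetical name used for illustration.

```python
import pandas as pd


def pairwise_r_squared(df):
    """Return a table of R-squared values for every pair of numeric columns."""
    # Ignore text columns such as the species label.
    numeric = df.select_dtypes(include="number")
    # For a straight-line fit of one variable on another,
    # R-squared is the square of the Pearson correlation coefficient.
    return numeric.corr(method="pearson") ** 2
```

The result is a square table indexed by column name, so `pairwise_r_squared(df).loc["petal_length", "petal_width"]` would look up a single pair, assuming those column names exist in your data.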
This step is not always necessary, but by introducing a categorical field such as “Species” in the iris data set, you can get a better understanding of possible clusters within groups.
Well done! You have reached the goal of revealing the relationships between the flower species across the numeric measures. You can see that anatomical sizes are generally positively related, and that the setosa species (in blue) is clearly different from the other two!
By using Tableau to do your exploratory data analysis (EDA), you will free up more time to focus on other important tasks, and stand out as a highly efficient team member.
Below are examples of the necessary code and output to replicate what was done in Tableau in both Python and R. For me, it took less than one minute to create this scatterplot matrix in Tableau, whereas it took over 30 minutes in Python or R to code the equivalent output (which might include a few Google searches to refresh my memory on syntax).
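As a rough Python counterpart, here is a minimal sketch of a scatterplot matrix colored by a category column. It is not the code originally timed for this post. It assumes pandas and matplotlib are installed, and the `scatterplot_matrix` helper and the output file name are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; remove this line to view plots interactively
import pandas as pd


def scatterplot_matrix(df, category):
    """Draw a scatterplot matrix of the numeric columns, colored by `category`."""
    numeric_cols = list(df.select_dtypes(include="number").columns)
    # Map each category value to an integer code so the points can be colored.
    codes = df[category].astype("category").cat.codes.to_numpy()
    # Extra keyword arguments (c, cmap) are passed through to the scatter calls.
    axes = pd.plotting.scatter_matrix(
        df[numeric_cols],
        c=codes,
        cmap="viridis",
        diagonal="hist",
        figsize=(8, 8),
    )
    return axes
```

With the iris data in a DataFrame `df` that has a `species` column, `axes = scatterplot_matrix(df, "species")` followed by `axes[0, 0].get_figure().savefig("iris_scatter_matrix.png")` would write the figure to disk. Unlike the Tableau version, this output is a static image, and trend lines and R-squared values would have to be added separately.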