With the proliferation of data analysis tools out there, and the growing ability to use them in conjunction with one another, it’s not uncommon to see organizations adopt a variety of tools in their data analysis toolbox. That makes this a great time to learn how we could use the widely popular programming language tools Python & R, in conjunction with Tableau Prep, the powerful data prep tool created by Tableau.
Tableau Prep allows users to clean, aggregate, merge and otherwise “prep” their data for analysis in Tableau. Although Tableau Prep is full of features, there will be times when you may require a python or R based approach to get what you need.
In this blog post, I’ll be providing a brief introduction on how to get started using Tableau Prep with TabPy and RServe. The objective of this post is to help facilitate the ideas and possibilities on how we can leverage data science capabilities by combining both tools. Specifically, we will be loading a dataset into Tab Prep, running a python/R script and having just the model details as our output.
Let's get to it!
Once download and installed, open Tableau prep. You should be in a screen which looks like this.
If you’re thinking about Tableau/Tableau Prep and how it can fit with you or your organization, a trial version of tableau is available for 14 days for you to try out. Similarly you can get a trial run of Tableau Prep Here.
This step can be skipped if you have Python and TabPy set up.
If you’re relatively familiar with programming, you’ve probably heard of conda! You can download anaconda here.
Once you are set up with Conda, open up your command prompt/terminal and type in
After tabpy is installed, all you have to do is type ‘tabpy’ and you should get an instance running. It should look somewhat similar to:
Awesome! Now we’re ready to dive into our workflow.
(This step can be skipped if you have R and Rserve set up or if you are using Python and TabPy.)
You can download R from here. If you prefer to use an IDE after installing R, I suggest downloading R Studio. Once downloaded, you can open up your command prompt and type “R” to start your instance. After starting R, you want to install the latest version of Rserve. This is version 1.8.7 at the current publishing of this blog post.
***I recommend installing Rserve from RForge to get this version. The previous versions have errors notably the “Error: long vectors not supported yet: qap_encode.c:36 Fatal error: unable to initialize the JIT”. This is especially common on Macs. If you receive this error, please check your version of Rserve by typing “packageVersion("Rserve")”. If this version is not 1.8.7 proceed with the following
Make sure you have 1.8.7 as the version of your Rserve
Once installed, typing library(Rserve) followed by an Rserve() command should start your Rserve instance.
A full transcript of the code can be found below:
Click to Expand the Install Code
Let’s begin by opening up Tableau Prep and loading our data!
Tableau Prep has a ton of options to connect to different data sources. However, for this tutorial we’re going to be using a flat file of a Kaggle dataset, which can be found here.
After downloading the CSV file, connect to it by clicking the ‘+’ button, navigating to ‘Text File’ and then choosing the relevant file to bring it into prep.
Now that we have our data loaded, let’s connect to our TabPy instance. On the top of your screen browse to ‘Help > Settings and Performance > Manage Analytics Connection’. After that you want to make sure your Analytics extension is set to TabPy, with your server being ‘local host’ and your port set to 9004. Note that if you’re running an Rserve instance, you want to connect to port 6311.
Awesome! Now we’re ready to build a script.
For this demo, I want to build a script that is applicable to common data science practices. This includes preprocessing some of our data and building simple linear models. I want my script to simply output a table of evaluation metrics from the models I build after preprocessing the data.
Python Script:
View Code for Imports & Table
View Code for Functions
For our flow, we simply defined outputs to the evaluation table we created previously.
View the Output code
And that wraps our script. Let’s give this script a name and save it.
R Script:
Our R script will be following the same fundamentals. The encode function does all the required steps which includes preprocessing and running a simple linear regression. We then call our metrics and save it accordingly to a dataframe. The dataframe is then called at the end using the getOutputSchema function.
View the Output code
With that, we now have an R script that does a similar task to our python script!
To load our script in Tableau, all we have to do is click is the ‘+’ button beside our dataset in Tableau Prep and select ‘Script’. For our settings tab for our script step, select the instance(TabPy or Rserve). Browse to the respective script we created and select that for our file name.
Finally, we declare the function name which would be our 'encode’ function. We can then press the “+” button again and then add the output step.
Running this flow will output our table from which we can observe the linear regression model that was built using our script. This table can now be downloaded as a csv, or written to tables directly in our database tables.
And with that, we now have an output that spits out our model name, the details of the model, an RMSE metric and an R squared value! This is very useful if you have multiple models that you want to switch through in your dashboard using a parameter. Tableau Prep is an efficient method to combine multiple sources and shape data in a variety of ways. With the ability to add scripts through Rserve and TabPy integration, we now have a capability to utilize whole new worlds of advanced functionality that Tableau Prep does not currently have.