Skip to the main content.
Contact Us
Contact Us

5 min read

Getting Started with TabPy and Rserve in Tableau Prep

With the proliferation of data analysis tools out there, and the growing ability to use them in conjunction with one another, it’s not uncommon to see organizations adopt a variety of tools in their data analysis toolbox.  That makes this a great time to learn how we could use the widely popular programming language tools Python & R, in conjunction with Tableau Prep, the powerful data prep tool created by Tableau.

Tableau Prep allows users to clean, aggregate, merge and otherwise “prep” their data for analysis in Tableau. Although Tableau Prep is full of features, there will be times when you may require a python or R based approach to get what you need.

In this blog post, I’ll be providing a brief introduction on how to get started using Tableau Prep with TabPy and RServe. The objective of this post is to help facilitate the ideas and possibilities on how we can leverage data science capabilities by combining both tools. Specifically, we will be loading a dataset into Tab Prep, running a python/R script and having just the model details as our output.

Let's get to it!

Installing Tableau Prep, Python/R & TabPy/Rserve

Getting Tableau Prep

First screen Tableau Prep

Once download and installed, open Tableau prep. You should be in a screen which looks like this.

If you’re thinking about Tableau/Tableau Prep and how it can fit with you or your organization, a trial version of tableau is available for 14 days for you to try out. Similarly you can get a trial run of Tableau Prep Here.

Getting Anaconda and TabPy Started

This step can be skipped if you have Python and TabPy set up.

If you’re relatively familiar with programming, you’ve probably heard of conda! You can download anaconda here.

Once you are set up with Conda, open up your command prompt/terminal and type in

Installation directions for tabpy

After tabpy is installed, all you have to do is type ‘tabpy’ and you should get an instance running. It should look somewhat similar to:

Tabpy has been installed

Awesome! Now we’re ready to dive into our workflow.

Getting R and RServe Started

(This step can be skipped if you have R and Rserve set up or if you are using Python and TabPy.)

You can download R from here. If you prefer to use an IDE after installing R, I suggest downloading R Studio.  Once downloaded, you can open up your command prompt and type “R” to start your instance.  After starting R, you want to install the latest version of Rserve. This is version 1.8.7 at the current publishing of this blog post. 

***I recommend installing Rserve from RForge to get this version. The previous versions have errors notably the “Error: long vectors not supported yet: qap_encode.c:36 Fatal error: unable to initialize the JIT”. This is especially common on Macs. If you receive this error, please check your version of Rserve by typing “packageVersion("Rserve")”. If this version is not 1.8.7 proceed with the following

Correct version is running

Make sure you have 1.8.7 as the version of your Rserve

Once installed, typing library(Rserve) followed by an Rserve() command should start your Rserve instance.

Screenshot of starting Rserve instance.

A full transcript of the code can be found below:

Click to Expand the Install Code

 

Installing Tableau Prep, Python/R & TabPy/Rserve

Let’s begin by opening up Tableau Prep and loading our data!

Tableau Prep has a ton of options to connect to different data sources. However, for this tutorial we’re going to be using a flat file of a Kaggle dataset, which can be found here.

After downloading the CSV file, connect to it by clicking the ‘+’ button, navigating to ‘Text File’ and then choosing the relevant file to bring it into prep.                               

Now that we have our data loaded, let’s connect to our TabPy instance. On the top of your screen browse to ‘Help > Settings and Performance > Manage Analytics Connection’. After that you want to make sure your Analytics extension is set to TabPy, with your server being ‘local host’ and your port set to 9004. Note that if you’re running an Rserve instance, you want to connect to port 6311.

Screenshot of dashboard connected to TabPy

Awesome! Now we’re ready to build a script.

 

Building a Script in Python or R

For this demo, I want to build a script that is applicable to common data science practices. This includes preprocessing some of our data and building simple linear models. I want my script to simply output a table of evaluation metrics from the models I build after preprocessing the data.

Python Script:

  1. Open an empty script in your favorite IDE. I’ll be using Visual Studio Code.  I kept my instance of TabPy running in my terminal iDE Hyper.
  2.   As my goal of this script is to have the output of my model metrics, I wanted to build a temporary empty table that could hold those relevant details. We’ll name this table ‘evaluation’. In addition, I will import all libraries we may need to preprocess and build models with our data.

View Code for Imports & Table

  1. We are going to have a two-part function – one that preprocesses our data, and the other that builds the model. It’s important that when we create our script we include a function that specifies a pandas data table (pd.DataFrame) as an argument of the function. This will call your data from Tableau Prep Builder. We will also need to return the results as a table (pd.DataFrame) using supported data types. Our main function is called ‘encode’.
  2. The preprocess function takes the argument “dataset” and then performs necessary processing tasks. For this demo, I’ve removed rows with missing values present for our dependent variable: price. I’ve also log transformed one variable, “sqft_above” and saved it as a new variable “log_sqft_above”. I’ve then constructed a pandas dataframe, picking the variables I want to work with, while label encoding them:
  3.   Building Simple Linear Models: The function “model” takes in “dataset” as an argument as well. However as this function resides inside of our wrapper function, the dataset is the output from the preprocess function! This chunk of code performs typical and simple data science tasks such as splitting the dataset into train and test sets, running a linear regression and predicting the house price on the test set. It then stores the evaluation metrics in the “evaluation” table we created in the previous steps.

View Code for Functions

  1. The get_output_schema_function: If we want to return different fields than what we have used as inputs, we need to define a get_output_schema function in our script. This script should define our outputs along with supported variable types.

For our flow, we simply defined outputs to the evaluation table we created previously.

View the Output code

And that wraps our script. Let’s give this script a name and save it.

R Script:

Our R script will be following the same fundamentals. The encode function does all the required steps which includes preprocessing and running a simple linear regression. We then call our metrics and save it accordingly to a dataframe. The dataframe is then called at the end using the getOutputSchema function.

View the Output code

With that, we now have an R script that does a similar task to our python script!

 

Running the Script in Tableau Prep

To load our script in Tableau, all we have to do is click is the ‘+’ button beside our dataset in Tableau Prep and select ‘Script’.  For our settings tab for our script step, select the instance(TabPy or Rserve). Browse to the respective script we created and select that for our file name. 

Finally, we declare the function name which would be our 'encode’ function. We can then press the “+” button again and then add the output step. 

Running this flow will output our  table from which we can observe the linear regression model that was built using our script. This table can now be downloaded as a csv, or written to tables directly in our database tables.

Gif showing running flow.

And with that, we now have an output that spits out our model name, the details of the model, an RMSE metric and an R squared value! This is very useful if you have multiple models that you want to switch through in your dashboard using a parameter. Tableau Prep is an efficient method to combine multiple sources and shape data in a variety of ways. With the ability to add scripts through Rserve and TabPy integration, we now have a capability to utilize whole new worlds of advanced functionality that Tableau Prep does not currently have.

 


Explore More

4 min read

8 Reasons Managed Data Analytics Services Enhance Internal Processes

Every business leader’s dream is to reach broader markets, boost conversion rates, and level up their brand reputations. By now, most decision-makers...

Read More

4 min read

Building Your Own Data Team vs. Managed Data Analytics

By now, almost every business leader recognizes the vital importance of becoming more data-driven. Despite this, experts estimate that over 80% of...

Read More

3 min read

Augmenting Teams with Automated Decision-Making

Modern data analytics platforms make extensive use of machine learning models, and they’ve transformed the way we work by taking much of the legwork...

Read More