4 min read

Regression, RServe & Tableau/Prep : Visualizing Model Performance

Regression, RServe & Tableau/Prep : Visualizing Model Performance

So you acquired all this of this awesome data and you’re asked to build a model to predict an outcome. You love Tableau, so you decide you want to use it to visualize your data and your model performance. However, you also want all the data wrangling and model building you’ll be doing to be a nice, repeatable, and easily changeable process. How can you combine all of these things to make your life easier?

In this blog post, I’ll be diving into an example of how this can be accomplished. First, I’ll show how we can build a regression model in R, integrate that into a Tableau Prep workflow using RServe, and finally, visualize a few metrics using Tableau Desktop.

If you aren’t familiar with setting up R, RServe or Tableau Prep, check out my previous blog post.

 

Building the RServe Script for Regression

We’re going to be using the Kings County Housing dataset, which you can download here.

For this script we want to be able to run a regression model on all of our data. Our script should run a model, and output row-wise level fitted values from our regression. We also want to output row level errors (the difference from our predicted and actual values) and a couple of other things to visualize metrics, such as R squared, adjusted R-Squared, MSE, RMSE, MAE, and MAPE.

Our objective of the script is to output a dataframe consisting of metrics for each row present in our dataset. Our workflow will then join this script to our original dataset and bring it together for us to use in Tableau Desktop.

RServe Script

1. Open an empty script in your favorite IDE. I’ll be building the script using R Studio.

2. Let’s start by defining what libraries we’d like to use. For this demo, I’ll only load two: tidyr and dplyr. Once you’ve installed these libraries (using the packages feature of RStudio, or via the “Install.packages” function), we can begin to build our encode function.

3. Our encode function houses most of the meat of the code. We’ll begin by removing any rows with blank entries that exist in this dataset. We’ll then take our dependent variable price, along with a few variables of interest, and an identifier column ID. We will call this new reduced dataset “dataset_reduced”. We then set our preferred data types within the script for the variables we selected.

4. We can now move on to building a matrix. Our matrix “data_model” transforms any variable that we set as factors into indicator variables. The model.matrix function does this for us. We will then convert the matrix back to a dataframe for ease of use later on.

5. The use of the model.matrix function creates another identifier variable. We don’t actually require an identifier variable for our regression, so we’ll drop both the ID and the identifier variable.

6. Alright! Let’s run our regression. We’ll be using price as our independent variable, and running it against everything else in the matrix. After we run our regression, we can grab the values we need to start building out calculations that demonstrate how well our model performs.

7. The following code calculates the error (original values - predicted values) at the row level. We then calculate the Residual Sum of Squares (RSS), which is simply our error squared. We then take the mean value of our dependent variable subtracted from each of our predicted values. We square this value to get our Explained Sum of Squares (ESS). Finally, we calculate Total Sum of Squares(TSS), which is simply the addition of RSS & ESS.

8. Our final dataset will contain our ID, fitted values, RSS, ESS & TSS. This is defined by our getOutputScheme function.

 

Building a Quick Tableau Prep Workflow

We can now build a flow which takes our dataset, integrates a script to output row-level calculations that we had built, and join it back to the original dataset. We can then output this file into a table or other data sources. We can also set our flows’ to run on a schedule, eliminating manual refreshes and allowing our tables to populate without interference. This in turn will automatically keep our dashboard up to date. For our demo, we’ll be outputting this file back into a csv that we’ll then open in Tableau Desktop.

  1. After opening Tableau Prep, navigate to the top of the screen. We want to navigate to Help > Settings & Performance > Manage Analytics Extension Connection. Change the default to Rserve, with server name being localhost and port 6311.

  2. We’re now ready to start our workflow: Load in our data by clicking the ‘+’ button, then clicking the “txt file” option and locating your newly created file.

  3. After the data appears in our workflow, we want to start by adding a script (for help here, see my blog post on using Rserve with Tableau Prep) . Once the script is added, make sure you have the script settings to run the function you defined in the script.

  4. We also want to add a join clause to the original dataset. On the join clause, make sure you’re joining the ID column from the results of our script to the ID column from the dataset. This should be an inner join.

 
Join Clause for ID columns
 

5. We can finally add an output statement and modify it to either output our results into a csv file or even write it to a table in our database. For this example, we’re going to output this into a csv file.

 

Visualizing Model Performance in Tableau Desktop

We can now open our output file inside Tableau Desktop. Connect to the file, and navigate to a new sheet. We’ll start by building a couple of metrics off of the columns we have from our workflow.

For the purpose of this demo, we will be building a calculation for R Squared, Adjusted R Squared and Root Mean Squared Error.

Because of the way Tableau calculates these functions, they can be used in conjunction with any filters you might add, allowing you to create a sliceable, interactive dashboard for your model users that can react their individual use cases! I’ve prebuilt a dashboard with some of these metrics calculated for demonstration that you can check out below. The map shows us the different zip codes, and clicking on a specific zip code filters the model performance metrics. Similarly filtering a grade updates the dashboard as well:

 
gif of final dashboard
 

We now have a dashboard that visualizes model performance! Similar to what we have done above, we could also calculate over/under predictions, MAE, MAPE etc.

Our script could also be updated to output performance metrics based on multiple models and/or different train/test splits.

With Tableau’s flexibility, and the right data setup, almost anything is possible!

  

Explore More

What is Tableau+? | Is Tableau Plus Worth It?

What is Tableau+? | Is Tableau Plus Worth It?

Tableau Plus is the new premium offering from Tableau, a leading data visualization and business intelligence platform. It builds upon the...

Read More
Cannot Mix Aggregate and Non-Aggregate | Tableau Error Message

Cannot Mix Aggregate and Non-Aggregate | Tableau Error Message

If you've spent any time working with Tableau, you've likely encountered the dreaded "Cannot Mix Aggregate and Non-Aggregate Arguments" error. It's a...

Read More
Blue vs. Green Fields in Tableau | Discrete vs. Continuous

Blue vs. Green Fields in Tableau | Discrete vs. Continuous

What's with the blue and green pills in Tableau? This question frequently arises during the first month of learning Tableau, and truly understanding...

Read More