
Regression, Rserve & Tableau/Prep: Visualizing Model Performance


So you've acquired all of this awesome data, and you're asked to build a model to predict an outcome. You love Tableau, so you decide you want to use it to visualize your data and your model performance. However, you also want all the data wrangling and model building you'll be doing to be a nice, repeatable, and easily changeable process. How can you combine all of these things to make your life easier?

In this blog post, I'll dive into an example of how this can be accomplished. I'll show how we can build a regression model in R, integrate it into a Tableau Prep workflow using Rserve, and finally visualize a few metrics using Tableau Desktop.

If you aren't familiar with setting up R, Rserve, or Tableau Prep, check out my previous blog post.

 

Building the RServe Script for Regression

We’re going to be using the Kings County Housing dataset, which you can download here.

For this script, we want to run a regression model on all of our data. The script should fit the model and output the row-level fitted values from the regression. We also want to output the row-level errors (the differences between our predicted and actual values) and a couple of other columns we'll use to visualize metrics such as R-squared, adjusted R-squared, MSE, RMSE, MAE, and MAPE.

The objective of the script is to output a dataframe of metrics for each row in our dataset. Our workflow will then join the script's output back to the original dataset, bringing everything together for us to use in Tableau Desktop.

RServe Script

1. Open an empty script in your favorite IDE. I'll be building the script in RStudio.

2. Let's start by defining which libraries we'd like to use. For this demo, I'll only load two: tidyr and dplyr. Once you've installed these libraries (using the packages feature of RStudio, or via the install.packages() function), we can begin to build our encode function.
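The top of the script simply loads both libraries:

    # Load the libraries used throughout the script; install them first
    # if needed with install.packages(c("tidyr", "dplyr"))
    library(tidyr)
    library(dplyr)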

3. Our encode function houses most of the meat of the code. We'll begin by removing any rows with blank entries from the dataset. We'll then keep our dependent variable (price), a few predictors of interest, and an identifier column (id), and call this new reduced dataset "dataset_reduced". Finally, we set our preferred data types for the variables we selected.
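A sketch of this step, assuming df is the data frame Tableau Prep passes into encode (the exact predictor set below is my own choice; the King County file includes columns such as bedrooms, bathrooms, sqft_living, grade, and zipcode):

    # Inside encode(df): drop blank rows, keep the variables of interest,
    # and set the preferred data types
    dataset_reduced <- df %>%
      drop_na() %>%
      select(id, price, bedrooms, bathrooms, sqft_living, grade, zipcode) %>%
      mutate(price   = as.numeric(price),
             grade   = as.factor(grade),     # factor columns get expanded
             zipcode = as.factor(zipcode))   # into indicators in step 4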

4. We can now move on to building a matrix. Our matrix, "data_model", encodes every variable we set as a factor into indicator variables; the model.matrix function does this for us. We'll then convert the matrix back to a dataframe for ease of use later on.
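Continuing the sketch:

    # Expand every factor column into indicator (dummy) variables
    data_model <- model.matrix(~ ., data = dataset_reduced)

    # Convert the matrix back to a dataframe for easier handling later on
    data_model <- as.data.frame(data_model)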

5. The model.matrix function also creates an extra column (the intercept term). We don't actually require an identifier variable for our regression either, so we'll drop both the id column and the intercept column.
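Dropping both might look like this (lm adds its own intercept during the fit, so nothing is lost):

    # Drop the intercept column (named "(Intercept)", or "X.Intercept."
    # after the dataframe conversion) along with the row id
    data_model <- data_model %>%
      select(-matches("Intercept"), -id)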

6. Alright! Let's run our regression. We'll be using price as our dependent variable, and running it against everything else in the matrix. After we run our regression, we can grab the values we need to start building out calculations that demonstrate how well our model performs.
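For example:

    # Regress price on every remaining column in the matrix
    fit <- lm(price ~ ., data = data_model)

    # Grab the row-level fitted (predicted) values
    fitted_values <- fit$fitted.values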

7. The following code calculates the error (original value minus predicted value) at the row level. We then calculate each row's Residual Sum of Squares (RSS) contribution, which is simply its error squared. Next, we subtract the mean of our dependent variable from each predicted value and square the result to get each row's Explained Sum of Squares (ESS) contribution. Finally, we calculate the Total Sum of Squares (TSS), which is simply RSS plus ESS.
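A sketch of those calculations:

    # Row-level error: actual value minus predicted value
    error <- data_model$price - fitted_values

    # Squared error per row; summing this column gives the RSS
    RSS <- error^2

    # Squared distance of each prediction from the mean of price;
    # summing this column gives the ESS
    ESS <- (fitted_values - mean(data_model$price))^2

    # Row-level total; sums to the TSS for OLS with an intercept
    TSS <- RSS + ESS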

8. Our final dataset will contain our id, fitted values, RSS, ESS, and TSS. The schema of this output is defined by our getOutputSchema function.
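The end of the script might look like the sketch below. The prep_* type helpers come from Tableau Prep's Rserve integration; whether id fits in an integer depends on your data, so treat the exact types as assumptions:

    # The dataframe encode() returns: one row of metrics per input row
    # (this should be the last expression inside encode)
    output <- data.frame(id            = dataset_reduced$id,
                         fitted_values = fitted_values,
                         RSS           = RSS,
                         ESS           = ESS,
                         TSS           = TSS)

    # Tells Tableau Prep the names and types of the columns encode() returns
    getOutputSchema <- function() {
      data.frame(id            = prep_int(),
                 fitted_values = prep_decimal(),
                 RSS           = prep_decimal(),
                 ESS           = prep_decimal(),
                 TSS           = prep_decimal())
    }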

 

Building a Quick Tableau Prep Workflow

We can now build a flow that takes our dataset, integrates the script to produce the row-level calculations we built, and joins the results back to the original dataset. We can then output the results to a file, a database table, or another data source. We can also set our flow to run on a schedule, eliminating manual refreshes and allowing our tables to populate without interference. This, in turn, will automatically keep our dashboard up to date. For our demo, we'll output the results to a csv file that we'll then open in Tableau Desktop.

  1. After opening Tableau Prep, navigate to the top of the screen. We want to go to Help > Settings & Performance > Manage Analytics Extension Connection. Select Rserve as the connection type, with localhost as the server name and 6311 as the port.

  2. We're now ready to start our workflow. Load in our data by clicking the '+' button, then clicking the "Text file" option and locating the dataset file you downloaded earlier.

  3. After the data appears in our workflow, we want to start by adding a script (for help here, see my blog post on using Rserve with Tableau Prep). Once the script is added, make sure the script settings point to the function you defined in the script.

  4. We also want to add a join back to the original dataset. In the join clause, make sure you're joining the id column from the results of our script to the id column from the dataset. This should be an inner join.

 
[Image: join clause matching the id columns]
 

  5. We can finally add an output step and configure it to either write our results to a csv file or to a table in our database. For this example, we'll output to a csv file.

 

Visualizing Model Performance in Tableau Desktop

We can now open our output file inside Tableau Desktop. Connect to the file, and navigate to a new sheet. We’ll start by building a couple of metrics off of the columns we have from our workflow.

For the purposes of this demo, we'll build calculations for R Squared, Adjusted R Squared, and Root Mean Squared Error.
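As a sketch, those calculated fields might look something like this (the field names follow the script's output; the classic [Number of Records] field and the predictor count of 20 in the Adjusted R Squared formula are placeholder assumptions you'd adapt to your workbook and model):

    // R Squared: one minus unexplained variation over total variation
    1 - SUM([RSS]) / SUM([TSS])

    // Adjusted R Squared: penalizes R Squared by the predictor count (20 here)
    1 - (1 - [R Squared]) * (SUM([Number of Records]) - 1)
        / (SUM([Number of Records]) - 20 - 1)

    // Root Mean Squared Error: row-level RSS is the squared error per row
    SQRT(AVG([RSS]))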

Because of the way Tableau calculates these functions, they can be used in conjunction with any filters you might add, allowing you to create a sliceable, interactive dashboard for your model users that can react to their individual use cases! I've prebuilt a dashboard with some of these metrics calculated for demonstration, which you can check out below. The map shows us the different zip codes, and clicking on a specific zip code filters the model performance metrics. Similarly, filtering on a grade updates the dashboard as well:

 
[GIF: final dashboard]
 

We now have a dashboard that visualizes model performance! Similar to what we have done above, we could also calculate over/under predictions, MAE, MAPE, etc.

Our script could also be updated to output performance metrics based on multiple models and/or different train/test splits.

With Tableau’s flexibility, and the right data setup, almost anything is possible!

  
