Tableau Prep, from its launch as the component of the Tableau package, has been supporting data transformation and modelling. It can be described as the ETL (Extract Transform Load) programme and has the advantage of being not too complex. It’s not an analytical powerhouse, but a smart tool designed for data modelling. Tableau defines Prep as “self-service data preparation” – a tool that offers the advanced approach to data modelling. It allows simple and fast combining, transforming and cleaning data for easier analysis in Tableau at a later stage. In addition, it uses a very practical presentation of data flows. Prep is consistent with the Tableau philosophy, which in addition to the visual aspect of presenting the data itself also focuses on the data transformation. In this series, I am going to discuss how to use Prep as the alternative to Excel in terms of basic but time-consuming data cleaning and transformation processes. We will start with the essentials in Prep, including description of the basics of data cleaning, then we will move to transforming and combining the data from various sources, and finally, we will export data for further analysis.
Connecting data source and first steps
The first step in Prep is to connect the data source. Similarly, as in Tableau, there are many options here: through Excel files, PDFs, or text files, servers like SQL, or cloud solutions such as Snowflake or Azure:
Let’s start with connecting a sample Excel file as the source. After selecting the relevant file, you will see the following view. Prep interface consists of:
1 – Data connection pane (Connections)
2 – Data flow visualization pane (Flows)
3 – Details pane of the transformation step – different for specific options (steps)
Once the data is connected to our data flow, the first step appears: Input. It provides the details of the connection, its settings, data sample (in case of large datasets you work with data snips). You will also find here a list of fields from your source, including the type identified by Prep with a preview of values. There is also a list of changes that can be made in this step – for example, changing the type of data (from text to a figure) or removing unnecessary fields:
To start the actual data transformation, add a step with the icon (+), by selecting one of the available options in the Clean Step (other types of steps will be used at a later stage):
Cleaning data – deriving numbers from text
Adding the Clean Step will allow you to view the data and transform it at a later stage. Let’s start with what you can see in the window:
1 – A present step displayed in the flow (currently Clean 1 – you can edit the step name and colour)
2 – Changes pane, in which all subsequent data modifications will appear
3 – Data summary pane – depending on the type of data, you will see the values available in your database with mini-charts depicting the distribution of data
4 – Data preview pane – you can view the data from your source
5 – Tab with option: filtering, changing names, creating a calculated field, and a list of recommendations and viewing options
6 – Map of your flow, which is useful when you navigate through large data flows
Looking at this data you can clearly see that it needs to be refined. The dataset contains details of the cars for sale. Note that such fields as Mileage or Price should contain numerical values instead of text. Let’s start with the Mileage field. Click on the field options to get several available commands for data transformation. From the Clean field select Remove Letters and then Remove Punctuation. As a result, you should get the numbers and change the type of field from String (Abc) to Number (#Whole). The last step is to sort the null values for new cars out. All you need to do is click the null value, select edit value and enter 0. As a result, you will get a numeral field instead of a text field. In addition, on the list of changes there are four steps of the data transformation performed – you can edit or remove them, or change their order at a later stage:
A similar transformation should be performed for Price – start with removing punctuation characters. “$” is a problem, since it’s not a letter, but a symbol. You need to remove it from the field – by using the Split function you will be able to keep the second element (the number). Next, remove the source field (Price) and change the name of the newly separated field to Price:
Once these transformations are performed, your database will contain ready-to-use numerical fields with the car mileage and prices for analysis.
Prep for data cleaning
Tableau Prep is an excellent tool for data cleaning – it has a range of options available in the menu, so there is no need to create the calculation field. The steps created here will be automatically reproduced when the data is being refreshed. Once the data is cleaned, you can move to the next step, which involves transforming the data, followed by exporting it to Tableau. These steps will be presented in the future posts of this series.
Mateusz Karmalski Tableau Author