By definition, self-service BI tools should give users the ability to work with data independently. This covers not only the analytical and visual-presentation layer, but also modeling the data before the analysis itself begins. After all, the old analytics adage says that analysts spend 80% of their time collecting, preparing and processing data, and only 20% on the actual analysis and visualization. Business Intelligence software vendors are obviously aware of this – hence they supplement their flagship BI products with ETL add-ons.
ETL stands for Extract, Transform and Load, and it is one of the core elements of Business Intelligence processes. After all, in order to analyze data, we first need to acquire it, then transform it into the appropriate form and load it into the target environment, so that we can finally analyze and visualize it.
The individual phases mean:
- Extract – extracting data from databases, warehouses, data lakes, clouds, files and tables
- Transform – processing the data so that it can be analyzed: cleaning, filtering, joining tables, etc.
- Load – loading it into the target environment: BI applications, but also, if necessary, back into databases, warehouses or tables
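Before we look at the graphical tools, it may help to see the three phases expressed in code. Below is a minimal sketch in Python with pandas, assuming a hypothetical sales_2023.csv file with order_id, order_date and amount columns and an SQLite database as the target – in practice the sources, columns and destination are whatever your environment dictates.

```python
import pandas as pd
from sqlalchemy import create_engine

# Extract: read raw sales data from a CSV export (hypothetical file and columns)
raw = pd.read_csv("sales_2023.csv", parse_dates=["order_date"])

# Transform: clean and reshape the data before analysis
clean = (
    raw.dropna(subset=["order_id"])                  # drop incomplete rows
       .query("amount > 0")                          # filter out refunds and errors
       .assign(month=lambda d: d["order_date"].dt.to_period("M").astype(str))
)
monthly = clean.groupby("month", as_index=False)["amount"].sum()

# Load: write the result to a reporting database for the BI layer
engine = create_engine("sqlite:///reporting.db")     # placeholder connection string
monthly.to_sql("monthly_sales", engine, if_exists="replace", index=False)
```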
ETL add-ons as part of a self-service BI package
The concept of ETL tools dates back to the 1970s and developed in parallel with relational databases. One of the first languages to serve an ETL role is SQL, the Structured Query Language for databases. Of course, a lot has changed over the years, and we no longer need to know a programming language to model data (although Python and R still have their place here). ETL applications let you transform data into the appropriate form through a graphical interface. Recent years have also seen intensive development of data visualization tools, and their vendors, wanting to provide users with the most comprehensive product possible, bundle ETL add-ons with them. For Tableau it is Tableau Prep, and for Power BI it is Power Query. Both programs are included with the main product license, so there is no additional cost. This fact, plus the integration with the visual tools, makes it worth looking into their functionality and the possibilities of applying them in our organization.
Tableau Prep – Visualize your data flow
Tableau Prep's interface is similar to Tableau's, so it will feel like home from the start. In terms of connection types, we have a wide range of options – from csv or xls files, through data warehouses and data lakes, to cloud solutions. The real magic of Prep is revealed when you plug in your data and create a data flow:
Each element in the flow diagram represents one component of the ETL process. Extract – data sources are added from the Connections panel and appear as icons on the diagram
Transform – here we have a number of options available when adding a new step (several of them are illustrated in the pandas sketch after this list):
- clean: filtering, replacing values, changing data types, calculated fields, deleting and adding columns, other operations on the data, etc.
- aggregate: grouping measures by chosen dimensions (e.g. having sales data by day, we can aggregate it by month)
- pivot: convert columns into rows and vice versa
- join: joining tables (left, inner, right, outer, etc.)
- union: joining tables one under another (e.g. consolidating files of the same structure)
- script: calling a Python script
- prediction: calling an Einstein Analytics prediction
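To make the step types more tangible, here is a pandas analogue of four of them (aggregate, pivot, join, union). This is not how Prep works internally – just the same operations expressed in code, with made-up tables and column names.

```python
import pandas as pd

# Hypothetical daily sales table
daily = pd.DataFrame({
    "date":   pd.to_datetime(["2023-01-05", "2023-01-20", "2023-02-03"]),
    "region": ["North", "North", "South"],
    "sales":  [120.0, 80.0, 200.0],
})

# aggregate: roll daily rows up to monthly totals per region
monthly = (
    daily.assign(month=daily["date"].dt.to_period("M").astype(str))
         .groupby(["month", "region"], as_index=False)["sales"].sum()
)

# pivot: turn region values into columns (rows <-> columns)
wide = monthly.pivot(index="month", columns="region", values="sales")

# join: enrich the data with a small dimension table (left join on region)
targets = pd.DataFrame({"region": ["North", "South"], "target": [250.0, 180.0]})
joined = monthly.merge(targets, on="region", how="left")

# union: stack two tables of identical structure one under another
jan = monthly[monthly["month"] == "2023-01"]
feb = monthly[monthly["month"] == "2023-02"]
stacked = pd.concat([jan, feb], ignore_index=True)
```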
As you can see, Prep is not only about data cleaning; thanks to the support for Python scripts and Einstein Analytics it also covers more advanced applications, which opens the door to machine learning as well.
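As an illustration of the script step, here is a sketch of what such a Python script can look like. Prep (via TabPy) calls a function of your choice that receives the flow's rows as a pandas DataFrame and must return a DataFrame; the scoring logic and column names below are purely hypothetical, and the prep_* type helpers used in get_output_schema() are supplied by Prep's Python environment at run time.

```python
import pandas as pd

def score_customers(df: pd.DataFrame) -> pd.DataFrame:
    """Function selected in the Prep script step: receives the flow's rows
    as a pandas DataFrame and must return a DataFrame."""
    # Hypothetical scoring: flag customers above the median order value
    median_value = df["order_value"].median()
    df["high_value"] = df["order_value"] > median_value
    return df

def get_output_schema():
    """Describes the returned columns when they differ from the input.
    The prep_* helpers are provided by Prep's Python environment,
    so this function only works when the script runs inside a flow."""
    return pd.DataFrame({
        "customer_id": prep_string(),
        "order_value": prep_decimal(),
        "high_value":  prep_bool(),
    })
```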
Flow in Tableau Prep vs ETL elements
Decomposing the sample flow, we see the individual components of the ETL process:
You can also see here that ETL is not a process with a rigid structure. The individual elements can overlap, and the output of one transformation (load) can become the input of the next (extract).
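A minimal sketch of that chaining in pandas (file names and columns are hypothetical): the load of the first stage immediately becomes the extract of the second.

```python
import pandas as pd

# Stage 1: extract raw orders, clean them, and load them to a staging file
raw = pd.read_csv("orders_raw.csv")                    # hypothetical input
staged = raw.dropna(subset=["order_id"])
staged.to_csv("orders_staged.csv", index=False)        # load ...

# Stage 2: ... which becomes the extract of the next transformation
staged = pd.read_csv("orders_staged.csv")
summary = staged.groupby("customer_id", as_index=False)["amount"].sum()
summary.to_csv("customer_totals.csv", index=False)
```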
While Tableau Prep’s interface is very functional, it may initially take some getting used to. Tableau relies on visual communication and at every step we can see exactly what is happening with our data.
Below you can see a detailed view of the 'clean' (data cleaning) step. At the top (1) we always have an overview of our flow and can see which step we are currently editing. At the bottom we see:
- the changes panel (2), listing the changes we made to the data (such as filtering, deleting columns or calculated fields)
- a summary of the columns in our data (3) – data types and summaries of values with graphs
- raw data (4) – a table with a preview of the processed data
Of course, each step has a slightly different interface – for example, the Join step looks like this:

All transformations end with the Output step, which corresponds to the Load phase of the ETL process. We can load data into a file (xls, csv, hyper), into a published data source on Tableau Server, or add it to a database/data warehouse:
In the next part of the post, we will take a closer look at Power Query and summarize both tools.
Mateusz Karmalski, Tableau Author