In the previous blog post, the question was raised as to how to test an ETL process? So today we will talk about the basic concept of ETL testing and Data Warehouse testing.
The answer lies in the understanding of an ETL process. An ETL process at its core reads data, applies a transformation on it and then loads the data. This can be represented with the following simplistic equation.
Input Data + Transformation = Output Data
The above equation can provide us the information for creating physical tests to validate the ETL processes. We will use a simple example below to explain the ETL testing mechanism.
A source table has an individual and corporate customer. The requirement is that an ETL process should take the corporate customers only and populate the data in a target table. The test cases required to validate the ETL process by reconciling the source (input) and target (Output) data. The transformation rule also specifies that output should only have corporate customers.
Physical Test 1: Count the corporate customers from source. Count the customers from the target table. If the ETL transformation is correct the count should be an exact match.
This is pretty good test however the count might misguide if the same record is loaded more than once as it cannot distinguish between each customer.
Physical Test 2: Compare each corporate customer from the source to the customer on target. This kind of reconciliation can be done at the row or attribute level. Hence, it will not only prove the validation of counts but will also prove that the customer is exactly same on both sides.
There is various permutations and combination of these types of rules including advancing complexity. Also, this example proves that the concepts for ETL testing are totally different from that of GUI based software testing.