ETL Testing - Unit Testing vs. Quality Assurance for Data Warehouse

Normally, the ETL developers as part of the development effort will do unit ETL testing of the ETL processes. These test include some spot tests and summary tests. A unit ETL test is a test written by the programmer to verify that a relatively small piece of ETL code is doing what it is intended to do. They are narrow in scope, they should be easy to write and execute, and their effectiveness depends on what the programmer considers to be useful.

unit testing vs quality assurance for data warehouse

Only if the unit ETL testing is done by developers, then the project will face many fundamental issues and limitations such as the following:

  1. It breaks the 4-eyes principle. The four eyes principle is a requirement that two individuals approve some action before it can be taken. The four eyes principle is sometimes called the two-man rule or the two-person rule. Two activities are bound in a way that the person performing the first of these activities is not allowed to perform the second one also.
  2. The tests are based on the technical specs of the ETL process developed for the Data Migration or Data Warehouse project. These specs only have transformation rules and not ETL testing or ETL auditing requirements.
  3. The ETL developer is the tester, hence biased by his development knowledge.
  4. The ETL developer has limited QA toolset for ETL testing of the process.
  5. A unit ETL test does not have dependencies on ETL code outside the unit tested. You decide what the unit is by looking for the smallest testable part.

Beyond the core issues there are many types of tests and processes that can be missed due to limited unit ETL testing mandate of the developer:

  1. The quality assurance phase has a holistic approach towards ETL testing by involving integration testing, regression testing, functional testing, and user acceptance testing.
  2. The integration tests involve dependencies on outside systems beyond an individual ETL process. As an example, unit ETL testing will involve shipment transaction checking. Which might be correct by itself, but an integration testing has an extended scope of reconciling shipment transaction with orders. However, are there shipments without an order? Without orders, there can be no transactions.
  3. An integration test is done to demonstrate that different pieces of the system work together. If more than one process is running in the orchestrated manner and if any process is implemented in an improper sequence, then the result will be wrong. The sequence of execution tests is also an integral part of the integration tests. Integration tests cover entire applications, and they require much more effort to put together.
  4. In reality “integration test” gets used for a wide variety of things, from full-on system tests against an environment made to resemble production. While unit ETL test is just a limited test an individual process.
  5. Regression tests are essential when a new change is introduced in the processing system. The new change can be minimally or substantially impactful. Therefore, the introduced change can cause an issue in the result set. For example, if a new commission calculation is introduced, management should ensure all other attributes such as gross cost, fees remain consistent, but the commission amount should reflect the transition.
  6. Functional tests check a particular feature for correctness by comparing the results for a given input against the specification. Functional tests don’t concern themselves with intermediate results. For example, the data might be loaded in multiple steps. But from the functional test point of view, only the final dataset is meaningful. Is the shipment quantity in the final table matches to my total orders?
  7. One of the key differentiators is also the involvement of business users. The acceptance test suite is run against the supplied input data and verifies the end results. Many of the process requirement is documented. During user acceptance testing the business users can check if the business specifications are interpreted correctly by the technical specification and the code by verifying the final data results.

Beyond these fundamental issues, QA teams are better equipped, trained towards finding the flaws, have better processes and methodologies, and are empowered with purpose built ETL testing software like iCEDQ for Data Migration or Data Warehouse systems/ projects. This allows them to conduct full volume ETL testing instead of sampling. They can automate ETL testing. The tests are better documented and transparent to management and release managers thus lowering the risk of a data-centric project.