ETL Development & ETL Testing – a Pipeline for Data Warehouse Testing

ETL is a process of extracting (E), transforming (T) and loading (L) the data into the data warehouse or any other data-centric system. The process involved in developing these ETL processes is time consuming and usually follows the steps below:

  • 1. Understand the Business Requirement
  • 2. Create Technical Specification/ Requirement
  • 3. Generate Mapping Document
  • 4. Develop ETL Code

Figure 1: Standard ETL Development Pipeline Process

Standard ETL Development Pipeline Process Image-iCEDQ

The Issue: The ETL development process above is typically a complete black box. The business users or business analysts don’t really have any insights into the quality or correctness of this ETL process. Hence it is not only important to test this ETL Process, but ensure that the – ETL testing is transparent to the user’s data warehouse testing needs. Also, it’s not enough to test based on the transformation rules alone, since the technical transformation requirements might be incorrect and may not represent true business requirements.

iCEDQ suggests the following approach.

In addition to the ETL development process pipeline as described in the above section, we recommend a parallel ETL testing/auditing pipeline:

  • 1. Understand the business requirements of an auditing and data reconciliation point of view.
  • 2. Derive the audit and ETL testing requirements from the same core business requirements.
  • 3. Based on the audit requirements proceed to generate test cases.
  • 4. Each test case generates multiple Physical rules to test the ETL and data migration process.

Figure 2: Parallel Audit and Testing Pipeline

Parallel Audit and Testing Pipeline-iCEDQ

The combined ETL development and ETL testing pipeline are represented in the drawing below.

Figure 3: ETL Development vs. ETL Testing

ETL Development vs. ETL Testing Process Image-iCEDQ

The new combined pipeline offers many advantages to data warehouse testing.

  1. The data audit requirements are officially captured in a centralized repository.
  2. The focus is not primarily on data transformation, but to also question the quality of the data warehouse testing.
  3. Both the ETL and business Audit requirements are captured. Hence the tests are now more comprehensive.
  4. The ETL testing requirements are not an afterthought. This prevents the last minute surprises during QA & UAT.
  5. Business users are actively involved from day one, ensuring the validity of the data quality.
  6. The system is transparent, thus expectations are set in advance with the developers.

Summary

Employing and agile software platform to support this parallel pipeline approach not only makes the process more very effective. Such platforms are capable of performing these functions at a highly efficient rate and can save companies a tremendous amount of money and time in the long run. In today’s hyper-competitive world a cutting-edge data migration testing platform that performs these tests at an efficient rate is needed. iCEDQ or Integrity Check Engine for Data Quality is a leader in ETL testing and data warehouse testing.

iCEDQ is a quality assurance and test automation platform used for ETL testing/ Data Warehouse testing and Data Quality Governance.