ETL Testing Platform for Data Lakes & Data Warehouses

Automate your ETL Testing process completely

ETL Mapping Process

Companies have thousands of data processing jobs, also known as ETL processes, running in their data centers. These ETL processes read data from external or internal data sources, transform the data, and then load it into a database or a data lake. With the advent of Big Data and the cloud, more and more ETL processes are developed using Spark and MapReduce.

As the number of processes and the data volume have increased exponentially, manual testing has become almost impossible. Organizations are extensively looking at DevOps, CI/CD, and QA automation solutions to test and certify these ETL processes.

What is ETL Testing and how to Test an ETL Process?

A developer might use a GUI-based tool or hand-written code to build an ETL process. However, once the ETL process is developed, it is deployed as a background job. Because background jobs have no GUI, QA automation products such as Selenium are useless for ETL testing. You can read more about the differences between Software Application Testing and ETL Testing.

An ETL process can be visualized with the help of a simple equation.

Input Data + Transformation = Output Data

The quality of the output data depends on:

  • Quality of the Input Data
  • Quality of the ETL Process (Transformation)
  • Proper Orchestration and Scheduling of the ETL Processes

It is obvious that the above three items must be properly tested to ensure successful implementation of the system.
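
To make the equation concrete, here is a minimal Python sketch, purely illustrative: the records, field names, and the transformation itself are invented. It shows a toy transformation together with checks that the output matches what the input implies.

```python
# A toy ETL step: Input Data + Transformation = Output Data.
# The records and field names below are hypothetical.
input_rows = [
    {"customer": "A", "amount": 100.0},
    {"customer": "A", "amount": 50.0},
    {"customer": "B", "amount": 75.0},
]

def transform(rows):
    """Transformation: total order amount per customer."""
    totals = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals

output = transform(input_rows)

# Output quality depends on input quality and transformation quality:
# verify the output against what the input implies.
assert output == {"A": 150.0, "B": 75.0}
assert sum(output.values()) == sum(r["amount"] for r in input_rows)  # no amounts lost
print("transformation verified:", output)
```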

Most of the tests involve proving that the data has been transformed correctly. Several patterns are used in ETL testing:

  • Source Data Validation (Ensure data arrived correctly)
  • Source-Target Data Reconciliation (Ensure the source data matches the target data; a minimal sketch follows this list)
  • ETL Data Transformation Reconciliation (Ensure the data is transformed correctly)
  • Business Data Validation (Ensure the data matches the business expectations)
  • Business Data Reconciliation (Ensure the data passes business reconciliation rules)
  • Orchestration script tests

For more details about all these tests, read our article – How to Test ETL Processes?
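
As an illustration of the source-target reconciliation pattern, here is a minimal, tool-agnostic Python sketch. It is not iCEDQ's actual API, and the datasets, keys, and column names are hypothetical.

```python
import hashlib

def row_checksum(row, columns):
    """Checksum over the selected columns, so full rows can be compared cheaply."""
    payload = "|".join(str(row[c]) for c in columns)
    return hashlib.md5(payload.encode()).hexdigest()

def reconcile(source, target, key, columns):
    """Source-target reconciliation: report missing, extra, and mismatched rows."""
    src = {r[key]: row_checksum(r, columns) for r in source}
    tgt = {r[key]: row_checksum(r, columns) for r in target}
    missing = src.keys() - tgt.keys()    # in source, never loaded
    extra = tgt.keys() - src.keys()      # in target, with no source row
    mismatched = {k for k in src.keys() & tgt.keys() if src[k] != tgt[k]}
    return missing, extra, mismatched

# Hypothetical source and target extracts.
source = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
target = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Rob"}]

missing, extra, mismatched = reconcile(source, target, "id", ["name"])
print(missing, extra, mismatched)  # set(), set(), {2} -> row 2 was transformed wrong
```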

Challenges in Agile Data Warehouse & ETL Testing

Testing Sample Data

When comparing data manually, users can only test a sample of the data, using Excel or by eyeballing it. Since the full dataset is never tested, test coverage is low and there is a high chance of encountering data issues in the production system.

Regression Testing

When an ETL process in an agile data warehouse is tested manually, regression testing is very difficult. As the number and complexity of ETL processes grow, more and more time and resources are required to test them.

Continuous Integration

Implementing continuous integration (CI) is impossible in data-centric systems as long as the processes are tested manually. To succeed, teams need to automate the testing of their ETL or data integration processes.
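
One common way to achieve this, shown here as a generic sketch rather than an iCEDQ-specific recipe, is to express data checks as ordinary unit tests that any CI server can run on every build. The functions and counts below are hypothetical stand-ins for real queries.

```python
# test_etl_counts.py - a data check written as a plain pytest test,
# so any CI server (Jenkins, Bamboo, etc.) can run it with `pytest` on every build.
# The load functions are hypothetical stand-ins for real database queries.

def load_source_count():
    return 1_000  # e.g. SELECT COUNT(*) FROM source_table

def load_target_count():
    return 1_000  # e.g. SELECT COUNT(*) FROM target_table

def test_no_rows_lost_in_load():
    # If the ETL process drops or duplicates rows, this build fails.
    assert load_target_count() == load_source_count()
```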

Disadvantages of Manual ETL Testing and the Need for Automation

Manual testing is bad, but manual ETL testing is even worse.

Below, each challenge of manual testing is contrasted with automated ETL testing in iCEDQ.

Productivity (Time, Money, Resources)

Hand-written SQL or shell scripts waste time and money, and they require highly skilled resources.

On the other hand, iCEDQ has a simple drag-and-drop GUI, and its rule-based approach is fast and agile. Automation with iCEDQ can save over 70% of the effort.

Big Data Volume

Manual testing can only sample a few hundred records.

iCEDQ can handle millions or even billions of records, using either its In-Memory Engine or its Big Data Engine running on a Hadoop cluster. Learn more about the Big Data Edition.

Heterogeneous Datasets

ETL (Extract, Transform, Load), by its nature, reads from one or more sources and writes to another system. To check whether the ETL process has loaded the data correctly, one must connect to both systems and apply rules across them. For example, the source could be an Oracle database and the target an HDFS file.

Such heterogeneous processes are practically impossible to test manually and otherwise require extensive programming.
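
The sketch below illustrates the idea of a cross-system check in Python, using SQLite and a CSV file as stand-ins for an Oracle database and an HDFS file. All table names, columns, and contents are invented for illustration.

```python
import csv, io, sqlite3

# Stand-in "source" database (imagine Oracle).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 100.0), (2, 50.0)])

# Stand-in "target" file (imagine an HDFS extract).
target_csv = io.StringIO("id,amount\n1,100.0\n2,50.0\n")

source_rows = {(r[0], r[1]) for r in con.execute("SELECT id, amount FROM orders")}
target_rows = {(int(r["id"]), float(r["amount"])) for r in csv.DictReader(target_csv)}

# The audit rule: every source row must appear in the target, and vice versa.
assert source_rows == target_rows, f"mismatch: {source_rows ^ target_rows}"
print("source and target reconcile across both systems")
```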

Regression Testing

iCEDQ supports the creation of regression packs by combining new and old rules to test changes in code. It can also repoint the same rules to different environments, such as DEV, UAT, and PROD, and execute them unchanged.

Compare this with manual testing, where once a test is done, both the test case and the code are lost forever.

DevOps (CI/CD pipeline integration)

DevOps pipelines cannot be implemented without QA automation.

iCEDQ integrates with continuous integration and deployment tools such as Jenkins and Bamboo, as well as with Git and JIRA, enabling DevOps in data-centric projects. Please refer to the DevOps with iCEDQ page.

Test Coverage

There are obvious limitations on the number of test cases that humans can implement and test. With iCEDQ, you can have over 200% more test coverage.

Reusability

In manual testing, rules are created on a desktop and are lost once the testing is over.

Meanwhile, in iCEDQ, rules are stored in a centralized repository and, hence, can be reused at any time in the future.

Transparency & Reporting

iCEDQ Offers Custom Dashboards & Reports

Data-centric projects are expensive and long. Getting an accurate development status is crucial for any project execution.

iCEDQ has a built-in reporting tool and dashboard that provide status and feedback on ETL development, since the test results directly reflect the project's progress.

ETL Testing Tool - iCEDQ

How does iCEDQ work?

At the core, iCEDQ is an Audit Rules Engine designed for Data Testing.

Once users create and execute rules, each rule:

  • Connects to one or more data sources
  • Brings the data into memory or onto the Hadoop cluster
  • Applies the expressions to the data elements
  • Identifies the mismatches, i.e., the rows that do not satisfy the audit rule
  • Reports the failing rows and data elements for further action
  • Provides summary reports for release sign-off by managers or management
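
The toy Python sketch below illustrates that flow. It is a simplified illustration of the audit-rule concept only, not iCEDQ's actual engine or API; every name in it is invented.

```python
# Toy audit rule engine: fetch from sources, evaluate an expression per row,
# and report failures. Illustrative only - not iCEDQ's implementation.
class AuditRule:
    def __init__(self, name, sources, expression):
        self.name = name              # rule name for the summary report
        self.sources = sources        # callables that fetch rows from each system
        self.expression = expression  # predicate applied to each combined row

    def execute(self):
        # 1. Connect to the data sources and bring the data into memory.
        datasets = [fetch() for fetch in self.sources]
        # 2. Apply the expression to each data element (here: zipped rows).
        failures = [rows for rows in zip(*datasets) if not self.expression(*rows)]
        # 3. Report a summary; the failing rows go back for further action.
        print(f"[{self.name}] checked={len(datasets[0])} failed={len(failures)}")
        return failures

# Hypothetical source/target fetchers and an equality audit rule.
source = lambda: [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}]
target = lambda: [{"id": 1, "amt": 10}, {"id": 2, "amt": 25}]
rule = AuditRule("amounts match", [source, target], lambda s, t: s["amt"] == t["amt"])
print(rule.execute())  # reports the row where source and target disagree
```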

iCEDQ ETL Testing Tool - Key Features

In-Memory Rules Engine

A unique engine that compares and validates data in memory, allowing users to compare millions of rows efficiently.

Collaboration

With its web-based GUI, global teams can work together and effectively carry out quality assurance for ETL or agile data warehouse projects.

Types of Rules

Users can create ETL Reconciliation, Data Validation, and Business Reconciliation & Validation rules. These rules can be used to test transformations, conversions, duplicates, formats, and other scenarios.

Advanced Scripting

Users can write PL/SQL-like scripting objects using Groovy script. Workflows can be built around the execution of rules using these scripting objects.

Integration

iCEDQ can be integrated with any continuous integration tool, scheduling tool, or ETL tool. It also has out-of-the-box integration with HP ALM / HP QC.

Reporting

Predefined charts and reports are available for users, and if required, custom dashboards can be built using your own reporting tools.

ETL test automation should be part of DevOps from the beginning. It helps uncover underlying data and data-processing issues at an early stage, define accurate delivery timelines, and provide extensive test coverage. Delaying QA automation in your ETL projects will result not only in delays but also in project failures.

iCEDQ is listed in the Top 4 ETL Testing Tools by Software Testing Help and the Best 5 ETL Automation Testing Tools by Guru99.

Use Cases