Overcome Data Testing Challenges

Today, organizations base nearly all of their decisions on the data available to them, so it has become critical to ensure that the data is free of issues and defects. The way to ensure that data has no issues and is fit for the business is to test, validate, and compare it regularly.

Some organizations test their data manually, and some do not test it at all. Manual testing of any kind has well-known problems, and data testing adds its own set of unique challenges on top of them. Below are some of the data testing challenges that organizations commonly encounter.

The challenges described below apply to data testing in general, which includes Big Data testing, ETL testing, data migration testing, and a few others.

Testing Across Data Platforms

The most distinctive challenge in manual ETL testing is validating or comparing data between different data sources and different formats. Developers and testers have to bring the data from each source into Excel and then eyeball it for data issues.

iceDQ Data Connectors

The iceDQ rules engine allows users to test and compare data across different databases and files. Out of the box, iceDQ provides data connectors for any relational database, flat files, Excel files, XML, Hadoop data lakes, and other data sources. Because of this, organizations can automate ETL testing across the board.
You can view the complete list of data connectors here.
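
To make the idea of cross-platform comparison concrete, here is a minimal sketch in Python, not iceDQ's own implementation. It assumes a flat file and a relational table that share a key column, normalizes both into dictionaries of strings, and reports missing and mismatched rows. The function names, the `customers` table, and the sample data are all hypothetical, and SQLite stands in for whatever database is in use.

```python
import csv
import io
import sqlite3


def load_csv_rows(text, key):
    """Read CSV text into a dict of rows, indexed by the key column."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key]: dict(row) for row in reader}


def load_table_rows(conn, table, key):
    """Read a table into a dict of rows, converting every value to a string
    so it can be compared with values read from a flat file."""
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    rows = {}
    for values in cur:
        row = {c: str(v) for c, v in zip(columns, values)}
        rows[row[key]] = row
    return rows


def compare_sources(source, target):
    """Report keys missing on either side and keys whose rows differ."""
    shared = source.keys() & target.keys()
    return {
        "missing_in_target": sorted(source.keys() - target.keys()),
        "missing_in_source": sorted(target.keys() - source.keys()),
        "mismatched": sorted(k for k in shared if source[k] != target[k]),
    }


# Hypothetical sample data: a flat file and a database table.
csv_text = "id,name,amount\n1,Ann,10\n2,Bob,20\n3,Cy,30\n"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ann", 10), (2, "Bob", 25), (4, "Dee", 40)],
)

report = compare_sources(
    load_csv_rows(csv_text, "id"),
    load_table_rows(conn, "customers", "id"),
)
print(report)
```

Converting everything to strings is the simplest way to compare a file with a table, but it is a deliberate simplification: a real comparison would need type-aware rules for dates, decimals, and nulls.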

Full Volume Testing

Enterprise data will grow 650% in the next five years. Also, through 2015, 85% of Fortune 500 organizations will be unable to exploit Big Data for competitive advantage – Gartner.

Testing data is a challenge in itself, and big data volumes add more complexity to it. In manual ETL testing, testers validate and compare only a sample of the data, because it is impossible to test a complete data set by eyeballing it. Since the full volume of data is not tested, data issues can go undetected and surface further down the line.

iceDQ Rules Engine

Our rules engine is built to do all the processing in memory. When executing a test, the rules engine reads the data into memory in chunks (e.g., 10k rows), then compares the rows to identify matching and missing rows. For all the matching rows, it evaluates the Groovy expression checks created for transformations, conversions, and other validations. All the data issues identified are captured in an exception report. The rules engine repeats these steps until the full volume of data has been tested.
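
The steps described above can be illustrated with a small Python sketch. This is not the iceDQ engine: it assumes both inputs are already sorted by a shared key, uses plain Python callables in place of Groovy expressions, and returns the exception report as a simple list. The names `read_in_chunks`, `compare_sorted`, and `amount_matches` are hypothetical.

```python
from itertools import islice


def read_in_chunks(rows, chunk_size=10_000):
    """Pull rows from the source one chunk at a time, so that only
    chunk_size rows per source are held in memory at once."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield from chunk


def compare_sorted(source, target, key, checks, chunk_size=10_000):
    """Walk two key-sorted inputs in step, recording missing rows and
    failed checks on matching rows as (issue, key) pairs."""
    src = read_in_chunks(source, chunk_size)
    tgt = read_in_chunks(target, chunk_size)
    exceptions = []
    s, t = next(src, None), next(tgt, None)
    while s is not None or t is not None:
        if t is None or (s is not None and s[key] < t[key]):
            exceptions.append(("missing_in_target", s[key]))
            s = next(src, None)
        elif s is None or t[key] < s[key]:
            exceptions.append(("missing_in_source", t[key]))
            t = next(tgt, None)
        else:
            # Keys match: evaluate every check on the pair of rows.
            for name, check in checks.items():
                if not check(s, t):
                    exceptions.append((name, s[key]))
            s, t = next(src, None), next(tgt, None)
    return exceptions


# Hypothetical sample data, sorted by "id".
source = [{"id": 1, "amount": 100}, {"id": 2, "amount": 200}, {"id": 3, "amount": 300}]
target = [{"id": 1, "amount": 100}, {"id": 3, "amount": 330}, {"id": 4, "amount": 400}]
checks = {"amount_matches": lambda s, t: s["amount"] == t["amount"]}

exceptions = compare_sorted(source, target, "id", checks, chunk_size=2)
print(exceptions)
```

Because each side is consumed chunk by chunk, memory use depends on the chunk size and not on the total volume, which is what makes testing every row feasible.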

ETL systems and data warehouses are typically set up with multiple environments, such as DEV, QA, UAT, and a few others, so it is important for organizations to test all of them. However, it is difficult to perform regression tests manually by reusing the rules across different environments.

Regression Testing

Setting aside the other issues with manual ETL testing, one thing that is especially difficult to implement manually is regression testing. In data warehouse projects, the requirements are continuously changing or being updated with new enhancements, which means continuous code changes in the ETL processes. If regression testing is not performed on these ETL processes, the changes can introduce data issues.

iceDQ Batch

In iceDQ, the ETL testing platform, users can combine all the old rules (tests) and new rules (tests) into a batch (test suite). This creates a regression suite that can be triggered from anywhere. When a batch is executed, it gives a summary of successes and failures, and users can also look at the status of each rule executed in the batch.

Below is a list of batch features that users can benefit from:

Sequencing

All the rules can be configured to execute in a specified sequence. The sequence can be changed at any time, and individual rules can be disabled for a specific run.

Dependency

Rule execution dependencies can be configured on the result of a rule's execution. For example, if rule A succeeds, execute rule B; if rule B fails, stop the execution of the batch.
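
As a rough illustration of sequencing and dependency, the Python sketch below runs rules in list order, skips disabled ones, and stops the batch when a rule marked as blocking fails. It is a simplified model under those assumptions, not iceDQ's batch implementation; the `Rule` class, its fields, and `run_batch` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    run: Callable[[], bool]        # returns True when the test passes
    enabled: bool = True           # disabled rules are skipped for this run
    stop_on_failure: bool = False  # a failure here halts the rest of the batch


def run_batch(rules):
    """Execute rules in list order and return (summary, per-rule status)."""
    results = {}
    for rule in rules:
        if not rule.enabled:
            results[rule.name] = "skipped"
            continue
        passed = rule.run()
        results[rule.name] = "success" if passed else "failure"
        if not passed and rule.stop_on_failure:
            break  # later rules depend on this one, so they never run
    statuses = list(results.values())
    summary = {s: statuses.count(s) for s in ("success", "failure", "skipped")}
    return summary, results


# Hypothetical batch: A passes, B fails and blocks, C is never reached.
batch = [
    Rule("rule_a", lambda: True),
    Rule("rule_b", lambda: False, stop_on_failure=True),
    Rule("rule_c", lambda: True),
]

summary, results = run_batch(batch)
print(summary)
print(results)
```

Reordering the list changes the sequence, and flipping `enabled` removes a rule from a single run without deleting it from the suite.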

Reusability

Users can reuse the same set of rules and execute them across different environments with the connection repointing feature of the batch.
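
One way to picture connection repointing is a rule written against a logical connection name, with each environment supplying its own physical connection for that name. The Python sketch below assumes this design; it is not a description of how iceDQ stores connections. In-memory SQLite databases stand in for the DEV and QA warehouses, and the `orders` table, the `warehouse` connection name, and the function names are hypothetical.

```python
import sqlite3


def make_env(rows):
    """Build a throwaway database that stands in for one environment."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


# Each environment maps the same logical name to a different connection.
ENVIRONMENTS = {
    "DEV": {"warehouse": make_env([(1, 10), (2, 20)])},
    "QA": {"warehouse": make_env([(1, 10), (2, None)])},
}


def no_null_amounts(connections):
    """Rule defined once against the logical 'warehouse' connection."""
    conn = connections["warehouse"]
    (nulls,) = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE amount IS NULL"
    ).fetchone()
    return nulls == 0


def run_everywhere(rule, environments):
    """Repoint the same rule at every environment and collect the results."""
    return {env: rule(conns) for env, conns in environments.items()}


outcome = run_everywhere(no_null_amounts, ENVIRONMENTS)
print(outcome)
```

The rule itself never mentions DEV or QA, so adding another environment such as UAT only requires one more entry in the mapping.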

iceDQ, Data Testing Software

iceDQ is a unique data testing software used by organizations to automate data warehouse testing, data migration testing, and other data-related projects. Its rules engine can test, validate, and compare data between various data sources, comparing every row and column in memory to identify data issues. Because of its architecture, the engine can compare the full volume of data effectively and efficiently, providing full coverage for data testing.

iceDQ can connect to any relational database, Hadoop data lake, flat files, Excel sheets, semi-structured files such as XML, JSON, and AVRO, or any other data source. By combining the in-memory rules engine with the ability to connect to any data source, it can compare data across databases, files, and systems out of the box.

Other Key Features