Data Warehouse Testing
Make sure the sanctity of your data is preserved and
Data Warehouse is a repository of historical data that is used by organizations across the world to make educated decisions. Typically, a data pipeline collects data from multiple databases, files, and other data sources, transforms the data, and loads it into a data warehouse on a regular interval.
Organizations use reporting tools on top of a data warehouse so that business users, analysts, data scientists, and others can analyze the data for reporting and decision making.
Therefore, to ensure that the business users are making accurate decisions, it is very critical to be sure that data in the data warehouse is correct. So, to achieve that, organizations have to test their data warehouse end to end.
Data Warehouse Testing Challenges
Below are some of the critical challenges in testing a data warehouse.
- A typical data warehouse will contain millions of rows, and testing a data warehouse with a high volume of Data is very challenging. Further, doing data sampling for testing is not at all useful as it will never cover all the scenarios, and users will see data issues in production.
- Comparing data across heterogeneous data sources is a critical challenge in Data Warehouse Testing. In the absence of a Data Warehouse testing tool, users have to bring the data into an excel file or eyeball the data to identify issues by comparing manually. And doing so is highly erring.
- Testing across multiple environments of the data warehouse like DEV, QA, UAT, and making sure Data is valid across all, is time-consuming and may cause duplication of work.
- The inability to identify data issues on time due to the lack of data warehouse test automation can cause problems in downstream applications.
- Testing a Business Intelligence (BI) Report with a data warehouse or data mart table is not straight forward. Because there is no standard connector for the BI reporting tools, users have to manually export the data from the BI report to validate it. This task again adds a manual effort to testing.
Type of Data Warehouse Testing Checks
Source Data Validation
This test is to ensure that the data in the source table or file does not have any Data Quality or Data Accuracy issues.
Example: The user wants to check the incoming customer feed file for the below tests before the ETL process picks it up for loading.
- The CUSTOMER_ID column cannot be null and should be numeric
- The DOB column should have valid dates
- The STATE column can only have a specified list of state codes
Based on the result of this type of Data Warehouse test, users can decide if the ETL process should be triggered or not. Ensuring that either ETL process does not load incorrect data or it does not fail because of Data Quality issues.
Source to target reconciliation
This Data Warehouse test is for Data Completeness between source and target when there is not transformation involved, and it is a one to one load.
Example: Ensure all the data in the Customer feed file is loaded into the staging table completely.
- Identify all the CUSTOMER_ID’s which are missing from the staging table
- Identify all the records where NAME, GENDER or DOB column data is not matching between the file and the table
By comparing the 100% of the data between the source and target, the user can identify any Data Completeness issue in the Data Warehouse and fix the ETL process accordingly.
Testing the Data Warehouse for any Data Transformation, Conversion, or Calculation issues is crucial to ensure Data Completeness. Users have to review the source to target mapping documents to identify different types of tests.
Example: Test if the Daily Sales Summary calculation ETL process has transformed the data correctly
- Sum of today’s transaction quality and the amount should match the daily sales summary data
Users can compare the full volume of data or do the delta for everyday test runs.
The purpose is to ensure that data is not breaking any business rules even if the ETL has loaded the table successfully.
Example: Make sure that NET_AMOUNT = GROSS_AMOUNT – (COMMISSIONS + TAXES + FEES) in the Sales table.
Testing for Business Rules is essential as they are independent of the ETL process itself.
Ensuring that data between two subject areas is consistent and is not breaking any business rule is one of the Data Warehouse testing check users perform.
Example: The Order and Shipment ETL process is loading the ORDER and the SHIPMENT table, respectively. The business rule states that a shipment can exits only if an order is successfully processed.
- Identify any shipment(s) which does not have an order or has an invalid order.
Five things to look for in Data Warehouse Testing tool
High data volume testing
A Data Warehouse is synonymous with a large volume of data, and therefore it essential for any Data Warehouse Testing tool to test 100% of the data. Testing 100% of the data will increase organizations’ trust in their data.
There are 100’s of Data Warehouse processes, and any small change in these processes can cause data issues in downstream systems. Therefore a tool to do Regression testing in a Data Warehouse will ensure that new code change will not break old functionality, and at the same time, the new feature is working as expected.
Test across data sources
Various upstream data sources like Oracle, Netezza, Redshift, Flat Files, and others are feeding into a Data Warehouse continuously. And as part of testing the Data Warehouse, a tool must connect to all these different data sources, be it on-prem on in the cloud.
Organizations have tools for test management, defect management, scheduling, and notifications. They want to integrate Data Warehouse test automation tool with these tools for end to end automation or a productive workflow. So the ability to integrate is vital in an enterprise landscape.
A Data Warehouse testing tool should enable its users to create any custom report as per requirement whenever needed. It should be able to integrate with external reporting tools like Tableau, Power BI, etc.
Use iCEDQ for Data Warehouse Testing
- Test 100% of the data in your Data Warehouse and identify any data issues using our in-memory engine or the Apache Spark engine. Read More
- Users can connect to different data sources, be it a database, file source, or a reporting tool, and validate and compare data across them. Read More
- Create test suites for regression testing, release management testing, or for reusing the same Data Warehouse tests across multiple environments.
- Integrate iCEDQ with your test management tool like HP ALM, ticketing systems like JIRA, or CICD tools like Jenkins using our Rest API or CLI utilities. Read More