The team is usually divided into development, QA, operations, and business users. In almost all data integration projects, development teams try to build and test ETL processes and reports as fast as possible, then throw the code over the wall to the operations teams and business users. However, when data issues start appearing in production, business users become unhappy. They point fingers at the operations people, who in turn point fingers at the QA people. The QA group then puts the blame on the development teams.
All About ETL Testing, Data Migration & Data Warehouse Testing
Migrating from one database platform to another is complex and time-consuming, and it presents many challenges: lack of strategy, incorrect assumptions, lack of tools, and the complexity of the environment, to name a few.
Choosing a new database platform is beyond the scope of this article, and we at iCEDQ are not involved in that decision. We do want to help you by providing a checklist for your data migration effort and a data migration testing strategy.
Reports and dashboards are extensively used to make business decisions. These decisions later form the basis for the company’s growth and success. If BI reports are wrong, then decisions taken based on those reports will also be wrong. Inaccurate reports damage the organization’s credibility, expose it to compliance and legal issues, and can lead to hefty fines. Organizations cannot afford to ignore BI testing.
“Information is the oil of the 21st century, and analytics is the combustion engine.” – Peter Sondergaard, Senior Vice President, Gartner
Big Data and Business Intelligence are becoming an increasingly important source of statistical information, used as a vital part of the critical decision-making process of all businesses. Bernard Marr, in his article titled “4 Ways Big Data Will Change Every Business,” reiterates the industry belief that “big data and its implications will affect every single business – from Fortune 500 enterprises to mom and pop companies – and change how we do business, inside and out.”
Today, decisions across an organization are made based on the data available to it. Hence it has become critical to ensure that the data is free of issues or defects. The way to ensure that there are no data issues and that the data is fit for the business is to test, validate, and compare it regularly.
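To make the idea of regular data validation concrete, here is a minimal sketch in Python. All rule names, fields, and thresholds are hypothetical examples, not any specific tool's API: rows are checked for missing required fields and for values outside an allowed domain.

```python
# Minimal data-validation sketch. Rows are dicts; rules are illustrative.

def validate_rows(rows, required_fields, domain):
    """Return a list of (row_index, issue) pairs for rows that fail checks."""
    issues = []
    for i, row in enumerate(rows):
        # Rule 1: required fields must be present and non-empty.
        for field in required_fields:
            if row.get(field) in (None, ""):
                issues.append((i, f"missing {field}"))
        # Rule 2: values must fall within an allowed domain.
        for field, allowed in domain.items():
            if field in row and row[field] not in allowed:
                issues.append((i, f"bad value in {field}: {row[field]!r}"))
    return issues

# Example: two defective rows out of three.
rows = [
    {"id": 1, "status": "ACTIVE"},
    {"id": None, "status": "ACTIVE"},   # missing id
    {"id": 3, "status": "UNKNOWN"},     # value outside the allowed domain
]
issues = validate_rows(rows, required_fields=["id"],
                       domain={"status": {"ACTIVE", "CLOSED"}})
```

Run on a schedule against each batch of incoming data, a check like this turns silent data defects into visible, actionable issues.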
Some organizations either test their data manually or do not test it at all. We are all aware of the issues and challenges of testing anything manually, and data testing has its own set of unique challenges on top of that. Below are some of the data testing challenges every organization encounters; they apply to Big Data testing, ETL testing, data migration testing, and a few others.
AML software is a downstream system that consumes data from multiple sources and analyzes it based on compliance models, producing suspicious activity reports. It also monitors data for regulations such as FATCA, trade restrictions, sanctions, and watch lists.
However, if the incoming data is of poor quality, these results cannot be relied upon. The AML system is essentially at the mercy of its upstream systems.
iCEDQ is an in-memory data audit rules engine. It sits between the data sources and downstream systems such as AML software. It can validate and reconcile data coming from multiple data sources, thus catching data issues before they affect the downstream system.
The Challenges: Today’s organizations have thousands of data integration (ETL) processes constantly moving data from various operational and external data sources (silos) to downstream applications.
Since the downstream system doesn’t have control over incoming data or the process, it can cause serious data issues due to:
The quality of the data depends on the upstream systems.
The ETL jobs may not process the data correctly.
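Both failure modes above can be caught with a source-to-target reconciliation check. The sketch below is a generic illustration (the function and field names are hypothetical, not iCEDQ's API): it compares row counts and a column total between the upstream source and what the downstream system actually received.

```python
# Hedged sketch of a source-to-target reconciliation audit rule.

def reconcile(source_rows, target_rows, amount_field="amount"):
    """Compare counts and sums; return a dict of discrepancies (empty = OK)."""
    diffs = {}
    # Check 1: every source row should arrive downstream.
    if len(source_rows) != len(target_rows):
        diffs["row_count"] = (len(source_rows), len(target_rows))
    # Check 2: column totals should match after the ETL job runs.
    src_sum = sum(r[amount_field] for r in source_rows)
    tgt_sum = sum(r[amount_field] for r in target_rows)
    if src_sum != tgt_sum:
        diffs["amount_total"] = (src_sum, tgt_sum)
    return diffs

source = [{"amount": 100}, {"amount": 250}, {"amount": 50}]
target = [{"amount": 100}, {"amount": 250}]  # one row dropped by a faulty ETL job
diffs = reconcile(source, target)
```

A non-empty result flags the ETL job for investigation before the bad data reaches, for example, an AML compliance model.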
DataOps is a set of practices and tools used by Big Data teams to increase the velocity, reliability, and quality of data analytics. It emphasizes communication, collaboration, integration, automation, measurement, and cooperation between data scientists, analysts, data/ETL (extract, transform, load) engineers, information technology (IT), and quality assurance/governance. It aims to help organizations rapidly produce insight, turn that insight into operational tools, and continuously improve analytic operations and performance.
Perhaps the most astonishing fact, however, is that IT has been blind for so long to the need for monitoring and metering (auditing) of data health, even though feedback-based control is a fundamental engineering concept. For instance, Figure 1 illustrates a centrifugal steam engine governor.
Quality Assurance (QA) is a very important component of any data-centric application project, and projects such as data warehouses, data migrations, ETL, data lakes, and MDM are no exception. The majority of these projects are multi-year, multi-million-dollar efforts due to the amount of work and the number of products required. Therefore, it is necessary to have proper QA planning in place to avoid late discovery of process and data errors.
While testing methodologies have evolved considerably over the years, the science of QA in data integration projects has not. In this article, we’ll focus on some of the key challenges of data warehouse testing, data migration testing, and ETL testing.