“Eliminate 99.99% of the Data issues before
they appear in Production.”
iceDQ Manifesto: Why We Built the iceDQ Platform
Everyone agrees that data is the new oil. Then why do many CIOs and CDOs wait for the data to fail in production? This fatalistic approach towards data integrity is beyond comprehension.
Many would argue that they have already implemented a data quality solution in their production environment. But if such solutions really worked, why did the CISQ (Consortium for Information & Software Quality) report that "For the year 2020, we determined the total Cost of Poor Software Quality (CPSQ) in the US is $2.08 trillion (T)"?
Why is production data bad in the first place?
Lack of Data Testing in Development: Unlike applications, data is rarely tested during development, so bad data seeps into production.
Lack of Data Pipeline Testing in Development: Bad data processes result in bad data. Companies often skip testing their data pipelines during development. They fail to recognize that the quality of the pipelines is as responsible for data quality as anything else, if not more.
Siloed Teams: The problem is compounded by separate data development and operations teams. We have observed that data quality checks are often implemented only after the data is live in production, with little data testing during development.
Lack of Data Monitoring in Production Environment: Companies don't proactively monitor their data in production. As a result, data issues are not discovered until users start complaining.
Organizations run thousands of data pipelines. They monitor whether each process succeeds or fails, but they don't check whether the process transformed the data correctly.
No wonder so many data warehouse, data migration, and big data projects fail to deliver on both quality and schedule. It is a classic case of too little, too late.
The net result of the lack of data testing and monitoring:
- Data issues are detected in the operational environment instead of in development.
- Data pipelines are often pulled back into development for re-engineering.
- Undoing layers of data processing built on bad data in production is difficult and expensive.
- When issues are found in operations, the damage to downstream users and systems is already done.
iceDQ's DataOps Approach to Data Quality:
Adopt DataOps to integrate data testing in development with data monitoring in operations, and shift as much work as possible to the left (earlier) in the data development life cycle.
Focus on data testing and auditing during the development phase; don't wait for the data to go live.
Fix The Process, Not The Problem
Bad data processes produce bad data. Testing the data processes themselves ensures that the pipelines do not introduce new data errors.
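Testing the process means running a known input through the transformation and comparing the result with the expected output, just as a unit test does for application code. The sketch below assumes a hypothetical step, `transform_customers`, that trims names and expands country codes. The names and the lookup table are illustrative only; they are not part of iceDQ.

```python
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical lookup used by the transformation under test.
COUNTRY_NAMES = {"US": "United States", "IN": "India"}


def transform_customers(rows: List[Row]) -> List[Row]:
    """Hypothetical pipeline step: trim names and expand country codes."""
    out = []
    for row in rows:
        out.append({
            "id": row["id"],
            "name": row["name"].strip(),
            # Unknown codes become "UNKNOWN" instead of being silently dropped.
            "country": COUNTRY_NAMES.get(row["country"], "UNKNOWN"),
        })
    return out


def test_transform_customers() -> None:
    """Feed a small fixture through the process and compare with expected rows."""
    fixture = [
        {"id": "1", "name": "  Alice ", "country": "US"},
        {"id": "2", "name": "Bob", "country": "XX"},
    ]
    expected = [
        {"id": "1", "name": "Alice", "country": "United States"},
        {"id": "2", "name": "Bob", "country": "UNKNOWN"},
    ]
    assert transform_customers(fixture) == expected


if __name__ == "__main__":
    test_transform_customers()
    print("transform_customers: OK")
```

The fixture deliberately includes an edge case (an unmapped country code). Edge cases like this are where a faulty process would otherwise introduce errors unnoticed.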
Automate Data Testing
It is impossible to test millions of records manually. Invest in a purpose-built automated data testing platform such as iceDQ.
TDD - Test Driven Development
As the data mapping and transformation requirements are collected, gather the data audit requirements as well. Development and testing can then proceed in parallel.
Some data issues cannot be detected without reconciling the data with its source, because the correctness of a data value is often relative to another database.
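A minimal sketch of this idea follows. It assumes a hypothetical mapping document in which each target column carries both its transformation and its audit rule, captured together from the analyst. The audit rules exist before the transformation is coded, so testers can run them as soon as the developers produce output. The column names and rules are illustrative.

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical mapping document: for each target column, the transformation
# AND the audit rule are recorded together, before any code is written.
MAPPING_SPEC = {
    "order_total": {
        "transform": "quantity * unit_price",
        "audit": lambda r: r["order_total"] == r["quantity"] * r["unit_price"],
    },
    "status": {
        "transform": "upper(status)",
        "audit": lambda r: r["status"] in {"OPEN", "SHIPPED", "CLOSED"},
    },
}


def audit_rows(rows: List[Row]) -> Dict[str, int]:
    """Count rule failures per target column using the audit rules from the spec."""
    failures = {column: 0 for column in MAPPING_SPEC}
    for row in rows:
        for column, spec in MAPPING_SPEC.items():
            if not spec["audit"](row):
                failures[column] += 1
    return failures
```

For example, a row with `quantity=1`, `unit_price=3`, `order_total=4` and `status="open"` counts as one failure for `order_total` and one for `status`.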
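Reconciliation compares the target with its source record by record. The sketch below assumes both sides have already been fetched into lists of dictionaries that share a key column. In practice they usually live in different databases and would be queried first. The function name `reconcile` is hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def reconcile(source: List[Row], target: List[Row], key: str) -> Dict[str, list]:
    """Compare two row sets by a key column and report differences.

    Returns keys missing from the target, keys that exist only in the
    target, and (key, column) pairs whose values differ.
    """
    src = {row[key]: row for row in source}
    tgt = {row[key]: row for row in target}
    missing = sorted(k for k in src if k not in tgt)
    extra = sorted(k for k in tgt if k not in src)
    mismatched: List[Tuple[object, str]] = []
    for k in sorted(src.keys() & tgt.keys()):
        for column, value in src[k].items():
            if tgt[k].get(column) != value:
                mismatched.append((k, column))
    return {"missing_in_target": missing, "extra_in_target": extra, "mismatched": mismatched}
```

Checked against its own rules, a target row with `amt=250` could look perfectly valid. Only the comparison with the source value of `200` reveals the error.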
Whitebox Data Monitoring
Integrating the DEV and OPS teams allows data checks to be built into the code, so that the same checks can be reused for production data monitoring.
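One way to picture this reuse: a check is defined once, and two thin wrappers consume it. One fails a development test run; the other emits a monitoring result in production. This is a sketch under assumptions. `DataCheck`, `assert_in_dev`, and `monitor_in_prod` are hypothetical names, not iceDQ APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass(frozen=True)
class DataCheck:
    """A hypothetical reusable check: a name plus a per-row predicate."""
    name: str
    predicate: Callable[[Row], bool]

    def failures(self, rows: List[Row]) -> List[Row]:
        return [row for row in rows if not self.predicate(row)]


# Written once by the development team alongside the pipeline code.
NON_NEGATIVE_BALANCE = DataCheck("non_negative_balance", lambda r: r["balance"] >= 0)


def assert_in_dev(check: DataCheck, rows: List[Row]) -> None:
    """Development use: fail the test run if any row violates the check."""
    bad = check.failures(rows)
    if bad:
        raise AssertionError(f"{check.name} failed for {len(bad)} row(s)")


def monitor_in_prod(check: DataCheck, rows: List[Row]) -> Dict[str, object]:
    """Production use: the same check yields a monitoring result, not an exception."""
    bad = check.failures(rows)
    return {"check": check.name, "status": "FAIL" if bad else "PASS", "failed_rows": len(bad)}
```

Because both environments call the same `DataCheck`, a rule certified in development is exactly the rule that guards production.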
Business Rules Based Data Audit
Monitor data based on auditing principles so that data errors are caught before users or the business are impacted.
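As an illustration of an auditing principle expressed as a data rule, the sketch below checks the double-entry bookkeeping rule: within each journal, debits must equal credits. This choice of rule is an assumption for illustration. The manifesto does not name specific rules, and the field names (`journal_id`, `side`, `amount`) are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, List

Entry = Dict[str, object]


def unbalanced_journals(entries: List[Entry]) -> Dict[str, Decimal]:
    """Return journals whose debits and credits do not net to zero.

    Hypothetical business rule: for every journal_id, sum(DR) == sum(CR).
    Decimal is used so currency amounts are compared exactly.
    """
    net: Dict[str, Decimal] = defaultdict(Decimal)
    for e in entries:
        journal = str(e["journal_id"])
        amount = Decimal(str(e["amount"]))
        if e["side"] == "DR":
            net[journal] += amount
        elif e["side"] == "CR":
            net[journal] -= amount
        else:
            raise ValueError(f"unknown side {e['side']!r} in journal {journal}")
    return {journal: diff for journal, diff in net.items() if diff != 0}
```

A journal with a 50.00 debit and a 45.50 credit is reported with a 4.50 imbalance, before it can reach any financial report.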
Monitor Data Pipelines
A key aspect is ensuring that the data processes do not introduce data errors. A data process has succeeded only if it both completed and transformed the data correctly.
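The two conditions can be combined in one status function. The sketch assumes a hypothetical run record that holds the process exit code plus row counts gathered from the data. It uses a simple row-count balance (read = written + rejected) as the correctness check. This is only one possible check, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PipelineRun:
    """Hypothetical run record combining scheduler status with data counts."""
    exit_code: int       # 0 means the process itself completed
    source_rows: int     # rows read from the source
    target_rows: int     # rows written to the target
    rejected_rows: int   # rows deliberately routed to a reject table


def evaluate_run(run: PipelineRun) -> str:
    """A run succeeds only if it completed AND every source row is accounted for."""
    if run.exit_code != 0:
        return "FAILED: process did not complete"
    accounted = run.target_rows + run.rejected_rows
    if run.source_rows != accounted:
        return f"FAILED: {run.source_rows} rows read, {accounted} accounted for"
    return "SUCCEEDED"
```

A scheduler would show `PipelineRun(0, 100, 90, 2)` as green because the process exited cleanly. Checking the data exposes the eight rows that went missing.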
Establish Checks And Controls
When a process introduces data errors and is not stopped immediately, the errors cascade and compound downstream, sometimes irreversibly. Hence it is necessary to establish checks and controls within process execution.
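A control can be placed after each pipeline step so that a failed check halts the run before bad data flows further downstream. The sketch below assumes each step is a hypothetical `(name, transform, control)` triple. A real orchestrator would express the same idea through its own task and dependency mechanisms.

```python
from typing import Callable, List, Tuple

Rows = List[dict]
# Hypothetical step shape: (step name, transformation, control check on its output).
Step = Tuple[str, Callable[[Rows], Rows], Callable[[Rows], bool]]


class ControlCheckFailed(Exception):
    """Raised to stop the pipeline before bad data reaches downstream steps."""


def run_with_controls(rows: Rows, steps: List[Step]) -> Rows:
    """Run each step, then its control check; halt on the first failure."""
    for name, transform, control in steps:
        rows = transform(rows)
        if not control(rows):
            raise ControlCheckFailed(f"control after step '{name}' failed; downstream steps skipped")
    return rows
```

Stopping at the first failed control keeps the damage confined to one step. That is far cheaper than unwinding several layers of processing later.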
Link Business Processes To Data Audits
Simply knowing about data issues is not enough; link each issue to the business process it affects to gauge its actual impact on that process.
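A simple way to make this link is a catalog that maps each check to the business process it protects. Failures can then be rolled up by process. The check names and processes below are hypothetical examples.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical catalog linking each data check to the business process it protects.
CHECK_TO_PROCESS: Dict[str, str] = {
    "invoice_amount_not_null": "Billing",
    "invoice_customer_exists": "Billing",
    "shipment_date_valid": "Order Fulfillment",
}


def impact_by_process(failed_checks: List[str]) -> Dict[str, int]:
    """Roll failed check names up into a failure count per business process.

    Checks missing from the catalog are grouped under "Unmapped" so they
    are still visible rather than silently ignored.
    """
    return dict(Counter(CHECK_TO_PROCESS.get(name, "Unmapped") for name in failed_checks))
```

With this roll-up, a report reads "Billing: 2 failed checks" instead of a flat list of rule names, which is the view a business owner needs.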
Pinpoint Data Exceptions
A data audit must pinpoint the exact location of each data issue (record, column, and value) so that root cause analysis can be conducted.
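Pinpointing means reporting each failing cell rather than a bare failure count. The sketch below returns, for every violation, the record key, the column, the offending value, and the rule name. The rule structure (column, then rule name, then predicate) and all names are hypothetical choices for illustration.

```python
from typing import Callable, Dict, List, NamedTuple

Row = Dict[str, object]


class DataException(NamedTuple):
    """One pinpointed issue: which record, which column, which value, which rule."""
    key: object
    column: str
    value: object
    rule: str


def find_exceptions(rows: List[Row], key: str,
                    rules: Dict[str, Dict[str, Callable[[object], bool]]]) -> List[DataException]:
    """Apply per-column rules and report every failing cell individually."""
    found = []
    for row in rows:
        for column, column_rules in rules.items():
            value = row.get(column)
            for rule_name, predicate in column_rules.items():
                if not predicate(value):
                    found.append(DataException(row[key], column, value, rule_name))
    return found
```

An output such as `DataException(key=102, column='age', value=-5, rule='non_negative')` points directly at the record to investigate.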
Why must you act now?
Has your company invested in data testing? If application development requires testing, so do data-centric projects. The Consortium for Information & Software Quality's CPSQ-2020 Report further states that "defects that need to be corrected would be $1.31 Trillion".
It is no longer enough to simply develop data pipelines and dump them into production; Data Architects and Data Engineers must fully incorporate data testing into their development practices. CDOs, Data Stewards, and Compliance Officers, along with their business users, should not wait for the data to reach production. Instead, they should get actively involved during development to ensure that data processes are audited and certified before they are deployed into production.
“We believe unified data testing and monitoring
not only reduces the development cost
but also eliminates data defects in production,
and that’s why we built iceDQ,
a fully integrated data testing and monitoring platform.”
Integrity Check Engine For Data Quality
Headquartered in Stamford, CT, Torana Inc. was established in 2005 by a team of Data Architects to solve the challenges organizations face with data-centric projects and systems. In 2008, Torana established a software R&D center in Nagpur, India.
We have a team of 120 developers, architects, analysts, and consultants in the USA and India.
We are deeply committed to, and invested in, the success of our customers and partners. Torana has a deep understanding of the inner workings of data-centric systems such as Data Warehouses, ETL, MDM (Master Data Management), RDM (Reference Data Management), CRM (Customer Relationship Management), CDP (Customer Data Platforms), MDW (Marketing Data Warehouse), and RPA (Robotic Process Automation).