Netezza to Snowflake Migration Accelerated by 4.8X

Case Study


Summary


When our client chose to migrate from a Netezza data warehouse to Snowflake, their previous experience of migrating from Oracle 9 to Oracle 12 weighed heavily. That migration took a year and a half to get right, and they did not want to go through the same mess again.

This case study compares two similar data migration projects, their contrasting experiences and outcomes. For the Snowflake migration, our client chose to automate data testing with iceDQ, a DataOps Platform for Testing and Monitoring.


Past Data Migration Experience

Organizational memory of past projects is important, and even Gartner will tell you that re-platforming and moving large volumes of data is a precarious proposition, especially if you do not conduct adequate testing.


The Previous Manual Data Testing Process

The client’s previous data migration testing process was entirely manual. They wrote VB / Python scripts to export data and then used Microsoft Excel to compare it. Test creation, test execution, and the post-execution inspection and consolidation of results were all manual as well.


Lessons from the Previous Manual Data Testing Process

  • The manual effort to create the test cases was too much.
  • The project went woefully long, and even after going live, roughly 50 severe defects were discovered in production, and the team had to scramble to get the data warehouse available for business use.
  • The aggregate checks performed by the client were not adequate because they did not identify the exact problem at the record level.
  • Our client unnecessarily faced many data problems even though the migration involved only a few minor changes.
  • The manual test execution for each release was slow, tedious, and very resource-intensive.
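The gap between aggregate checks and record-level testing can be sketched as follows (a hypothetical illustration with made-up sample data, not the client's actual scripts):

```python
# Hypothetical illustration: why aggregate checks miss record-level defects.
source = [("A001", 100.00), ("A002", 250.50), ("A003", 75.25)]
target = [("A001", 100.00), ("A002", 75.25), ("A003", 250.50)]  # two values swapped

# Aggregate check: row counts and totals still match, so the defect goes unnoticed.
agg_ok = (len(source) == len(target)
          and round(sum(v for _, v in source), 2) == round(sum(v for _, v in target), 2))

# Record-level check: compare each key's value; the swap is caught immediately.
mismatches = [
    (key, src_val, tgt_val)
    for (key, src_val), (_, tgt_val) in zip(sorted(source), sorted(target))
    if src_val != tgt_val
]
print(agg_ok)       # True  -> the aggregate test passes despite the defect
print(mismatches)   # the record-level test flags A002 and A003
```

This is why record-level validation catches defects that counts and sums alone never surface.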

It was clear that, for the next project, the client needed an automated data testing strategy.

Netezza to Snowflake Migration Project

The client’s Netezza data warehouse statistics are listed below:

  • Database size: 17 terabytes overall
  • Number of entities: 5,000 objects in the warehouse (4,000 tables and 1,000 views)
  • Largest table: about 100 million records
  • Number of subject areas: 20 (Sales, Marketing, Financials, Customer Support, Commissions, etc.)


Netezza Data Warehouse Architecture

The client’s current Netezza data warehouse follows an Inmon-style architecture with three areas: staging, a 3rd normal form (3NF) data model, and dimensional data marts. For performance reasons, the database views on top of the 3NF layer are physically loaded into the data marts.


Snowflake Specific Considerations

Even though it was a straight lift and shift, the client realized that simply copying data to Snowflake was not possible, as there are a few minor differences between the Netezza and Snowflake databases, including:

  • Datatypes are slightly different in Snowflake.
  • Snowflake handles decimal precision differently than Netezza does.
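
Decimal precision differences like these cause false mismatches during migration testing unless both sides are normalized to a common scale first. A minimal sketch using Python's decimal module (the column types in the comments are illustrative assumptions, not the client's actual schema):

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical illustration: the same value exported from two databases can
# differ only in scale (e.g. a NUMERIC(18,4) column vs a NUMBER(38,2) column).
netezza_value = Decimal("1234.5600")    # scale 4
snowflake_value = Decimal("1234.56")    # scale 2

# A naive string comparison flags a spurious difference caused only by scale.
string_match = str(netezza_value) == str(snowflake_value)

# Quantizing both sides to a common scale before comparing removes the false positive.
scale = Decimal("0.01")
normalized_match = (netezza_value.quantize(scale, ROUND_HALF_UP)
                    == snowflake_value.quantize(scale, ROUND_HALF_UP))
print(string_match, normalized_match)   # False True
```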

Once the client mapped out the differences between the Snowflake and Netezza databases and finalized the migration methodology, they purchased a new ETL tool for the Snowflake migration and brought in iceDQ for data testing.

Why Automate Migration Testing with iceDQ?

For Snowflake data migration testing, the client chose the iceDQ DataOps Platform as it was a great fit for the client’s data test automation requirements due to the following features:


  • Real-time In-memory Engine: Unlike some other data testing tools, iceDQ does not stage data in a temporary database such as Postgres before comparing it. iceDQ compares data in memory, eliminating the landing step and, with it, the possibility of landing errors.
  • Cloud Connectivity: Our client not only used Salesforce but also built many apps on it. The client needed a data testing tool with SaaS connectivity and pre-built connectors to Salesforce, and iceDQ satisfied both these requirements.
  • Big Data: iceDQ could handle big data and Snowflake. Also, iceDQ is already a Snowflake Technology Partner.
  • RuleGen Utility: The migration involved 5,000 objects, which would require roughly 15,000 rules / test cases to test. Creating these 15K rules manually would have been very time-consuming.

Snowflake Migration Testing with iceDQ


“MY DATA QUALITY TEAM WAS ACTUALLY ABLE TO KEEP UP WITH MY ETL TEAM, WHICH WAS FIRST TIME EVER.”

  • DDL Creation: The client built all the new DDLs (5000 objects) in Snowflake with required modifications.
  • Schema Compare with iceDQ: The client used iceDQ to ensure that every object / datatype was copied from Netezza to Snowflake.
  • Baseline Data Load: The client loaded the baseline data captured at a point in time from Netezza into the Snowflake database.
  • Auto-generate Tests with RuleGen: The client used iceDQ’s RuleGen utility to automatically generate the 15K database migration testing rules.
  • Test Baseline Data: The client executed the test cases / rules created in iceDQ to validate every single object at the record and attribute level.
  • Re-write ETL: The client rewrote their ETL jobs to leverage the Snowflake connector.
  • Delta / Incremental Load: The client executed the new ETL jobs for incremental loading of the Snowflake database.
  • Delta / Incremental Test with iceDQ: Since the iceDQ rules already existed, it was very productive to re-use and run the regression packs as and when required for the incremental load.
  • UAT and Defect Analysis: About 7% of differences in data were found between Netezza and Snowflake. iceDQ easily pointed out the defects in data exception reports, helping identify issues such as data type differences, timestamp differences, and code conversion errors.
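
The rule auto-generation step above can be sketched as follows. This is a hypothetical illustration of generating a few standard migration-test rules per object; the rule names, SQL templates, and table names are invented for this sketch and are not iceDQ's actual RuleGen output:

```python
# Hypothetical sketch: generate a standard set of migration-test rules per object.
def generate_rules(table):
    """Return three illustrative test rules (count, schema, record-level) for a table."""
    return [
        {"name": f"{table}_row_count",
         "source_sql": f"SELECT COUNT(*) FROM {table}",       # run against Netezza
         "target_sql": f"SELECT COUNT(*) FROM {table}"},      # run against Snowflake
        {"name": f"{table}_schema_match",
         "source_sql": ("SELECT column_name, data_type FROM information_schema.columns "
                        f"WHERE table_name = '{table}'"),
         "target_sql": ("SELECT column_name, data_type FROM information_schema.columns "
                        f"WHERE table_name = '{table}'")},
        {"name": f"{table}_record_compare",
         "source_sql": f"SELECT * FROM {table}",
         "target_sql": f"SELECT * FROM {table}"},
    ]

# Invented example table names; the real warehouse had 5,000 objects.
tables = ["sales_fact", "customer_dim", "commission_detail"]
rules = [rule for t in tables for rule in generate_rules(t)]
print(len(rules))  # 3 rules per object; at 5,000 objects this yields ~15,000 rules
```

Generating rules programmatically like this is what makes 15K test cases feasible where manual creation was not.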


Future with iceDQ

Tactically, once Snowflake is live, the client will make enhancements and changes to the warehouse. Because the tests are already written in iceDQ, the client will be able to re-use most of the existing test cases.

Strategically, our client is looking forward to implementing DataOps principles with the iceDQ platform.

  • DataOps: Many business and application teams rely on the data provided by the warehouse team for making business decisions. The client is planning to start an iceDQ DataOps COE (Centre of Excellence) to support other teams.
  • Test-driven Development: After acquiring Compuware, the client is integrating their data (contracts, LPs, bookings, commissions). They plan to introduce iceDQ into this project to perform data testing upstream at the application level, reducing the need for downstream validation in the data warehouse.
  • White Box Monitoring for Operations: For the operations team, the client aims to implement white box monitoring as part of their operations, data quality, and compliance strategy. This will include daily batch data monitoring, real-time data quality checks, and automated notifications.

About iceDQ

iceDQ empowers organizations to ensure data trust and reliability throughout the data life cycle.

Our comprehensive platform combines data testing, data monitoring, and data observability into a single solution, enabling data engineers to proactively manage data quality and eliminate data issues before they impact business decisions.

Leading companies across industries, including prominent players in banking, insurance, and healthcare, rely on iceDQ to continuously test, monitor, and observe their data-driven systems. This ensures trustworthy data that fuels informed decision-making and drives business success.

iceDQ Use Cases

  • Data Testing
  • ETL & Data Warehouse Testing
  • Cloud Data Migration Testing
  • BI Report Testing
  • Big Data Lake Testing
  • System Migration Testing
  • Data Monitoring
  • Data Observability


About the author

Sandesh Gawande

Sandesh Gawande is the Founder and CEO of iceDQ, a unified Data Reliability Platform for automated data testing, monitoring, and observability. With over 25 years of experience in data engineering and architecture, Sandesh has led large-scale data initiatives for Fortune 500 companies across banking, insurance, and healthcare, including Deutsche Bank, JPMorgan Chase, and MetLife.
