Netezza to Snowflake Migration Accelerated by 4.8X
Case Study
Summary
When our client chose to migrate from Netezza to the Snowflake data warehouse, their previous experience of migrating from Oracle 9 to Oracle 12 weighed heavily on their minds. That migration took a year and a half to get right, and they did not want to go through the same mess again.
This case study compares two similar data migration projects and their contrasting experiences and outcomes. For the Snowflake migration, our client chose to automate data testing with iceDQ, a DataOps Platform for Testing and Monitoring.
Past Data Migration Experience
Organizational memory of past projects is important, and even Gartner will tell you that re-platforming and moving large volumes of data is a precarious proposition, especially if you do not conduct adequate testing.
The Previous Manual Data Testing Process
The client's previous data migration testing process was entirely manual. They wrote VB and Python scripts to export data and then used Microsoft Excel to compare it. Test creation, test execution, and the post-execution inspection and consolidation of results were all manual as well.
Lessons from the Previous Manual Data Testing Process
- The manual effort required to create the test cases was excessive.
- The project ran woefully over schedule, and even after go-live, roughly 50 severe defects were discovered in production; the team had to scramble to make the data warehouse available for business use.
- The aggregate checks performed by the client were not adequate because they did not help identify the exact problem at the record level.
- The client faced many avoidable data problems even though the migration involved only a few minor changes.
- Manual test execution for each release was slow, tedious, and very resource-intensive.
It was clear that, for the next project, the client needed an automated data testing strategy.
Netezza to Snowflake Migration Project
The client's Netezza data warehouse statistics are summarized below:

| Metric | Value |
| --- | --- |
| Database Size | 17 terabytes overall |
| Number of Entities | 5,000 objects in the warehouse (4,000 tables, 1,000 views) |
| Largest Table | About 100 million records |
| Number of Subject Areas | 20 (Sales, Marketing, Financials, Customer Support, Commissions, etc.) |
Netezza Data Warehouse Architecture
The client's current Netezza data warehouse follows an Inmon-style architecture with three areas: Staging, a 3rd Normal Form (3NF) data model, and dimensional data marts. For performance reasons, the database views on top of the 3NF layer are physically loaded into the data marts.
Snowflake Specific Considerations
Even though it was a straight lift and shift, the client realized that simply copying data to Snowflake was not possible, as there are a few differences between the Netezza and Snowflake databases, including:
- Datatypes are slightly different in Snowflake.
- Snowflake handles decimal precision differently than Netezza.
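To give a feel for what such a mapping exercise involves (the specific type names, target types, and helper functions below are illustrative assumptions, not the client's actual mapping), a short Python sketch can translate Netezza column types to Snowflake equivalents and normalize decimal values before comparing them:

```python
from decimal import Decimal, ROUND_HALF_UP

# Illustrative Netezza -> Snowflake type mapping (an assumption for this
# sketch, not the client's actual mapping). Snowflake stores all integer
# types as NUMBER(38,0) internally, so exact-width types need review.
NETEZZA_TO_SNOWFLAKE = {
    "BYTEINT":          "NUMBER(3,0)",
    "SMALLINT":         "NUMBER(5,0)",
    "INTEGER":          "NUMBER(10,0)",
    "BIGINT":           "NUMBER(19,0)",
    "DOUBLE PRECISION": "FLOAT",
    "NVARCHAR":         "VARCHAR",  # Snowflake VARCHAR is always Unicode
}

def map_type(netezza_type: str) -> str:
    """Return a Snowflake type for a Netezza type, defaulting to pass-through."""
    return NETEZZA_TO_SNOWFLAKE.get(netezza_type.upper(), netezza_type.upper())

# Decimal handling also differs between the two engines; quantizing both
# sides to the declared scale before comparing avoids false differences.
def normalize(value: str, scale: int) -> Decimal:
    q = Decimal(1).scaleb(-scale)  # e.g. scale=2 -> Decimal('0.01')
    return Decimal(value).quantize(q, rounding=ROUND_HALF_UP)
```

For example, `map_type("byteint")` returns `"NUMBER(3,0)"`, and `normalize("12.3450", 2)` equals `normalize("12.345", 2)` even though the raw strings differ.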
Once the client mapped out the differences between Snowflake and Netezza databases and finalized the migration methodology, they purchased a new ETL tool for Snowflake Migration, and for data testing, they brought in iceDQ.
Why Automate Migration Testing with iceDQ?
For Snowflake data migration testing, the client chose the iceDQ DataOps Platform as it was a great fit for the client’s data test automation requirements due to the following features:
- Real-time In-memory Engine: Unlike some other data testing tools, iceDQ does not stage data in a temporary database such as Postgres before comparing it. iceDQ eliminates the landing step entirely, and with it the possibility of landing errors.
- Cloud Connectivity: Our client not only used Salesforce but also built many apps on it. The client needed a data testing tool with SaaS connectivity and pre-built connectors to Salesforce, and iceDQ satisfied both these requirements.
- Big Data: iceDQ could handle both big data volumes and Snowflake. iceDQ is also a Snowflake Technology Partner.
- RuleGen Utility: The migration involved 5,000 objects. Testing them would require 15,000 rules / test cases, and creating these 15K rules manually would have been very time-consuming.
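The arithmetic behind that 15K figure implies roughly three rules per object. iceDQ's RuleGen internals are not described here, so the sketch below only illustrates the scale argument with hypothetical rule templates (the template names and SQL are assumptions):

```python
# Sketch of bulk test-rule generation (hypothetical templates; the actual
# RuleGen utility works from metadata inside the iceDQ platform).
RULE_TEMPLATES = [
    ("row_count", "SELECT COUNT(*) FROM {obj}"),
    ("checksum",  "SELECT SUM(HASH(*)) FROM {obj}"),
    ("null_keys", "SELECT COUNT(*) FROM {obj} WHERE pk IS NULL"),
]

def generate_rules(objects):
    """Yield one (rule_name, source_sql) pair per template per object."""
    for obj in objects:
        for name, sql in RULE_TEMPLATES:
            yield (f"{obj}.{name}", sql.format(obj=obj))

# Stand-in names for the 5,000 warehouse objects.
objects = [f"warehouse.table_{i}" for i in range(5000)]
rules = list(generate_rules(objects))
print(len(rules))  # 3 templates x 5,000 objects = 15,000 rules
```

Generating rules from templates rather than by hand is what turns a months-long test-authoring effort into a batch job, which is the core of the productivity claim above.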
Snowflake Migration Testing with iceDQ
“MY DATA QUALITY TEAM WAS ACTUALLY ABLE TO KEEP UP WITH MY ETL TEAM, WHICH WAS FIRST TIME EVER.”
- DDL Creation: The client built all the new DDLs (5000 objects) in Snowflake with required modifications.
- Schema Compare with iceDQ: The client used iceDQ to ensure that every object / datatype was copied from Netezza to Snowflake.
- Baseline Data Load: The client loaded the baseline data captured at a point in time from Netezza into the Snowflake database.
- Auto-generate Tests with RuleGen: The client used iceDQ's RuleGen utility to automatically generate the 15,000 database migration testing rules.
- Test Baseline Data: The client executed the test cases / rules created in iceDQ to validate every single object at the record and attribute level.
- Re-write ETL: The client wrote and changed their ETL jobs to leverage the Snowflake connector.
- Delta / Incremental Load: The client executed the new ETL jobs for incremental loading of the Snowflake database.
- Delta / Incremental Test with iceDQ: Since the iceDQ rules already existed, it was very productive to re-use and run the regression packs as and when required for the incremental load.
- UAT and Defect Analysis: Differences were found in about 7% of the data between Netezza and Snowflake. iceDQ's data exception reports pinpointed the defects, helping identify issues such as data type differences, timestamp differences, and code conversion errors.
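The record- and attribute-level validation described in the steps above can be pictured as a keyed diff between source and target rows. The minimal sketch below uses in-memory dictionaries standing in for Netezza and Snowflake result sets (the column names and sample values are invented):

```python
def diff_rows(source, target, key="id"):
    """Compare two row sets by key; report rows missing from the target
    and attribute-level value mismatches."""
    src = {r[key]: r for r in source}
    tgt = {r[key]: r for r in target}
    issues = []
    for k, s_row in src.items():
        t_row = tgt.get(k)
        if t_row is None:
            issues.append((k, "missing_in_target", None, None))
            continue
        for col, s_val in s_row.items():
            if t_row.get(col) != s_val:
                issues.append((k, col, s_val, t_row.get(col)))
    return issues

# Invented sample rows standing in for Netezza (source) and Snowflake (target).
netezza   = [{"id": 1, "amount": "10.50"}, {"id": 2, "amount": "7.25"}]
snowflake = [{"id": 1, "amount": "10.50"}, {"id": 2, "amount": "7.3"}]
print(diff_rows(netezza, snowflake))
# [(2, 'amount', '7.25', '7.3')]
```

A diff at this granularity is what distinguishes record-level testing from the aggregate checks that failed the client in the earlier Oracle migration: it names the exact key and attribute that disagree, not just a mismatched total.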
Future with iceDQ
Tactically, after Snowflake goes live, the client will make enhancements and changes to the warehouse. Since the tests are already written in iceDQ, the client will re-use most of the existing test cases.
Strategically, our client is looking forward to implementing DataOps principles with the iceDQ platform.
- DataOps: Many business and application teams rely on the data provided by the warehouse team for making business decisions. The client is planning to start an iceDQ DataOps COE (Centre of Excellence) to support other teams.
- Test-driven Development: After acquiring Compuware, the client is integrating their data (contracts, LPs, bookings, commissions). They plan to introduce iceDQ into this project to perform data testing upstream at the application level, reducing the need for downstream validation in the data warehouse.
- White Box Monitoring for Operations: For the operations team, the client aims to implement white box monitoring as part of their operations, data quality, and compliance strategy. This will include daily batch data monitoring, real-time data quality checks, and automated notifications.
About iceDQ
iceDQ empowers organizations to ensure data trust and reliability throughout the data life cycle.
Our comprehensive platform combines data testing, data monitoring, and data observability into a single solution, enabling data engineers to proactively manage data quality and eliminate data issues before they impact business decisions.
Leading companies across industries, including prominent players in banking, insurance, and healthcare, rely on iceDQ to continuously test, monitor, and observe their data-driven systems. This ensures trustworthy data that fuels informed decision-making and drives business success.
iceDQ Use Cases
- Data Testing
- ETL & Data Warehouse Testing
- Cloud Data Migration Testing
- BI Report Testing
- Big Data Lake Testing
- System Migration Testing
- Data Monitoring
- Data Observability
Share this case study