Migrating Database to Redshift, Snowflake, Azure DW and Test with iCEDQ
Many companies are migrating database actively for multiple reasons
- End of life (Ex. Netezza)
- Database cannot handle big data
- Companies moving the database to the cloud (ex. Redshift, Vertica, Snowflake, Azure, etc.)
- In-house database maintenance is becoming expensive
- License Cost for on-premise install
The complexity of migration from a database to a new platform and time for implementing it has many challenges; Lack of strategy, incorrect assumptions, lack of tools, and complexity of the environment to name a few.
We at iCEDQ are not involved in choosing your new database platform. Choosing a database platform is beyond the scope of this article. We do want to help you by providing a checklist for your data migration effort and a data migration testing strategy.
What is Involved in a Data Migration Project?
According to a Gartner, a staggering 83% of data migration projects either fail or exceed their budgets and schedules. Why is that? We believe the word data migration is misleading. People end up only focusing on pure data migration, which puts them in a false sense of security. Since people really don’t understand the scope of migration project the data migration testing part is also missed until it’s too late. There two basic components of data migration project like Netezza
- Actual Data Migration and
- Testing Data Migration
A. Data Migration Scope
Data migration is not just data but must be treated as an application migration. Because, data is not standalone, there are many other applications and systems that talk to the database and are reading or writing data to the database. Even if the database is a data warehouse, it will not only receive data but also have a big list of downstream applications reading data from the data warehouse.
Treat data migration as an application migration and database migration both.
Sandesh Gawande, CTO – iCEDQ
Data Migration Steps
- Migrating physical data structures
- Migration of stored procedures and functions
- Resetting of sequences (inside database /or outside the database)
- Initial data migration
- Incremental updates of data after the initial migration
- Downstream Application
- Upstream Application
- Replacing bulk data uploaders
- Replacing bulk data downloaders
- Backup and recovery procedures
B. Testing Data Migration
In data migration, a customer must move tables/ views/ sequences, underlying data, stored procedures/ functions and change the input of downstream data pipeline/ ETL processes as well as BI tools to the new data platform.
Step A – For the data migration to be successful, the first step is creating the database object in the new system similar to the legacy system.
Step B – Once the objects are created the next step is to ensure that the schema in the new system is consistent with the legacy system. This verification of the new schema can be done with iCEDQ.
Step C – If the schemas between legacy and new systems are synced up then the initial data upload is started, if not the schema creation process will be repeated to the point they are synced up.
Step D -After the systems are synced up and initial data is uploaded the next step is to reconcile the data between legacy and new system. If the data doesn’t match repeat the step where the initial data is uploaded. If the data matches between the system, then move ahead to the next steps.
Step E – Now the legacy and new systems are synced up the system needs to prove that given the same input the new systems produce the same result as the legacy.
Step F – This is checked, by running both systems in parallel and executing the same transactions in both systems. At this point, iCEDQ can reconcile data between both the systems to certify that the output matches the old system.
Step G – If the output matches after reconciling legacy and new system, the data migration is successful. If not, then delete the data and reinitiate the data upload and perform the tests again.
Appendix A: Database Migration Check List
|1.||Migrating physical data structures|
The process of migrating physical structure involves the following steps.
|a.||Map Legacy database datatypes with new database||[Legacy Datatypes?] == [New Datatypes]|
|b.||# of columns per table||Maximum columns per table: [?]|
|c.||# of columns per view||Maximum columns per view: [?]|
|d.||Column Name Restrictions|
Size [?] bytes
Reserved names [?]
Case sensitive [?]
|e.||Max row Size||[?] bytes|
|f.||Max number of characters in CHAR/VARCHAR||[?]|
|g.||Max length of the database name||[?] bytes|
|h.||SQL date function and other Netezza Built-in functions replacement||List [?]|
|i.||Stored Procedures, Procedural Language (PLSQL)||Rewrite the stored procedures written in legacy PLSQ to new database format if it is supported.|
|2.||Migration of stored procedures and functions|
Once the data structures are migrated the next step is to create equivalent procedural objects. The procedural objects need the data structures to be present otherwise they won’t compile
|3.||Resetting of sequences (inside database /or outside the database)|
Many times, primary keys and/or unique keys are assigned to the rows based on sequence generators. These sequences objects store the max key value internally in the database or external to the system. If the sequence is not taken care, then the application will not be able to enter new records on inserts.
|4.||Initial migration of data|
In this step, the frozen copy of data is migrated from source to target database.
|5.||Incremental updates after the initial migration|
The updates in the database can for two reasons
The downstream to the application could be reporting tools, Application, API calls, ETL tool that reads data and sends to other platforms
|7.||Upstream Application –|
Upstream applications can be ETL tools, Application feeding data directly via SQL, etc.
|8.||Replacing bulk data uploaders (Import)||Database have their native utilities to import data (ex. Oracle has SQLLDR) from the database. With the change in database vendor, the utilities must be replaced and/or written.|
|9.||Replacing bulk data downloaders (export)||Database has its native utilities to extract data (ex. SQL Server, SYBASE has BCP) from the database. With the change in database vendor, the utilities must be replaced and/or written.|
|10.||Backup and recovery procedures||This is mostly a DBA activity but doesn’t ignore.|
|11.||Setup data migration testing strategy||This is mostly a DBA activity but doesn’t ignore.|
- A Practical Guide for Data Centric Testing: Automated ETL Testing
- Overcome Data Testing Challenges
- Agile Data Warehouse Testing & Data Migration Testing
- Migrating Database to Redshift, Snowflake, Azure DW and Test with iCEDQ
- Data Migration Testing Techniques to Migrate Data Successfully
- The Data Migration Process & the Potential Risks
- ETL Development & ETL Testing – a Pipeline for Data Warehouse Testing
- ETL Testing and Data Quality Governance Software - The Missing Link
- DataOps Implementation Guide
- AML Software Implementation & Production Monitoring with iCEDQ DataOps Platform
- What Are The Challenges Of A Data Factory
- 3 Reasons Why You Need to Perform ETL Testing
- ETL Testing - Unit Testing vs. Quality Assurance for Data Warehouse
- ETL Testing Vs. Application Testing - The Fundamental Difference