Resource / Migrating Database to Redshift, Snowflake, Azure DW and Test with iceDQ

Migrating Database to Redshift, Snowflake, Azure DW and Test with iceDQ

Table of Contents

How to Migrate to Redshift, Snowflake, Azure Data Warehouse and Test with iceDQ

Many companies are migrating database actively for multiple reasons

End of life (Ex. Netezza)
Database cannot handle
Companies moving the database to the cloud (ex. Redshift, Vertica, Snowflake, Azure, etc.)
In-house database maintenance is becoming expensive
License Cost for on-premise install

The complexity of migration from a database to a new platform and time for implementing it has many challenges; Lack of strategy, incorrect assumptions, lack of tools, and complexity of the environment to name a few.
We at iceDQ are not involved in choosing your new database platform. Choosing a database platform is beyond the scope of this article. We do want to help you by providing a checklist for your data migration effort and a data migration testing strategy.

We at iceDQ are not involved in choosing your new database platform. Choosing a database platform is beyond the scope of this article. We do want to help you by providing a checklist for your data migration effort and a data migration testing strategy.

For a comprehensive understanding on migrating and testing your data warehouse to Snowflake, visit our dedicated resource: Snowflake Migration Testing

What is Involved in a Data Migration Project?

According to Gartner, a staggering 83% of data migration projects either fail or exceed their budgets and schedules. Why is that? We believe the word data migration is misleading. People end up only focusing on pure data migration, which puts them in a false sense of security. Since people really don’t understand the scope of migration project the data migration testing part is also missed until it’s too late. There two basic components of data migration project like Netezza.

Actual Data Migration and
Testing Data Migration

A. Data Migration Scope

Data migration is not just data but must be treated as an application migration. Because, data is not standalone, there are many other applications and systems that talk to the database and are reading or writing data to the database. Even if the database is a data warehouse, it will not only receive data but also have a big list of downstream applications reading data from the data warehouse.

Treat data migration as an application migration and database migration both.

Sandesh Gawande, CEO – iceDQ

Data Migration Steps

Migrating physical data structures
Migration of stored procedures and functions
Resetting of sequences (inside database /or outside the database)
Initial data migration
Incremental updates of data after the initial migration
Downstream Application
Upstream Application
Replacing bulk data uploaders
Replacing bulk data downloaders
Backup and recovery procedures

B. Testing Data Migration

In data migration, a customer must move tables/ views/ sequences, underlying data, stored procedures/ functions and change the input of downstream data pipeline/ ETL processes as well as BI tools to the new data platform.

Step A – For the data migration to be successful, the first step is creating the database object in the new system similar to the legacy system.

Step B – Once the objects are created the next step is to ensure that the schema in the new system is consistent with the legacy system. This verification of the new schema can be done with iceDQ.

Step C – If the schemas between legacy and new systems are synced up then the initial data upload is started, if not the schema creation process will be repeated to the point they are synced up.

Step D – After the systems are synced up and initial data is uploaded the next step is to reconcile the data between legacy and new system. If the data doesn’t match repeat the step where the initial data is uploaded. If the data matches between the system, then move ahead to the next steps.

Step E – Now the legacy and new systems are synced up the system needs to prove that given the same input the new systems produce the same result as the legacy.

Step F – This is checked, by running both systems in parallel and executing the same transactions in both systems. At this point, iceDQ can reconcile data between both the systems to certify that the output matches the old system.

Step G – If the output matches after reconciling legacy and new system, the data migration is successful. If not, then delete the data and reinitiate the data upload and perform the tests again.

Appendix A: Database Migration Check List

#No	Item	Comment
1.	Migrating physical data structures	The process of migrating physical structure involves the following steps. Create a list of physical objects with the type of objects. Create a mapping document for equivalent object names and data types. Create database scripts (DDL) to remove existing objects and create new. Execute the script. Create different environments by executing the scripts ex. Development, QA, UAT, Pre-Prod, System Integration, Production, etc. Create iceDQ rules to verify the objects that are created against golden copy. [Refer B ]
a.	Map Legacy database datatypes with new database	[Legacy Datatypes?] == [New Datatypes]
b.	# of columns per table	Maximum columns per table: [?]
c.	# of columns per view	Maximum columns per view: [?]
d.	Column Name Restrictions	Size [?] bytes Reserved names [?] Case sensitive [?]
e.	Max row Size	[?] bytes
f.	Max number of characters in CHAR/VARCHAR	[?]
g.	Max length of the database name	[?] bytes
h.	SQL date function and other Netezza Built-in functions replacement	List [?]
i.	Stored Procedures, Procedural Language (PLSQL)	Rewrite the stored procedures written in legacy PLSQ to a new database format if it is supported.
2.	Migration of stored procedures and functions	Once the data structures are migrated the next step is to create equivalent procedural objects. The procedural objects need the data structures to be present otherwise they won’t compile. Equivalent stored procedures. New Function in the support format. Create iceDQ rules to verify the objects that are created against source data. [Refer B]
3.	Resetting of sequences (inside database /or outside the database)	Many times, primary keys and/or unique keys are assigned to the rows based on sequence generators. These sequence objects store the max key value internally in the database or external to the system. If the sequence is not taken care of, then the application will not be able to enter new records on inserts. Make sure the max value is set properly that is equal to the legacy database or bigger than that. Create iceDQ rules to verify the sequence is equal or greater than the legacy database. [Refer D ]
4.	Initial migration of data	In this step, the frozen copy of data is migrated from source to target database. Stop any kind of processing in the source database. Or use redo logs and roll it over to a new instance of the legacy database. Find the exact date and time after which delta will be applied. This step is important as without this you will need to restore the complete database every time. Create extracts data at each table level and import it into the new database. Or write ETL process with prebuilt migration utility or industry standard ETL tool. Use iceDQ to reconcile source and target database to ensure the initial data migration is as expected. [Refer D]
5.	Incremental updates after the initial migration	The updates in the database can for two reasons Updates because the source data has changed, and the data needs to be updated. Or the upstream application feeding into the database is switched on and it has updated the new database. In either case, we need to verify that the database is still consistent with the legacy database. Use iceDQ to reconcile source and target database to ensure the initial data migration is as expected. [Refer F ]
6.	Downstream Application	The downstream to the application could be reporting tools, Application, API calls, ETL tool that reads data and sends to other platforms. Configuring timeout and connection pools Connectivity to database SQL Conversion
7.	Upstream Application –	Upstream applications can be ETL tools, Application feeding data directly via SQL, etc. Configuring timeout and connection pools Connectivity to database SQL Conversion
8.	Replacing bulk data uploaders (Import)	Database have their native utilities to import data (ex. Oracle has SQLLDR) from the database. With the change in database vendor, the utilities must be replaced and/or written.
9.	Replacing bulk data downloaders (export)	Database has its native utilities to extract data (ex. SQL Server, SYBASE has BCP) from the database. With the change in database vendor, the utilities must be replaced and/or written.
10.	Backup and recovery procedures	This is mostly a DBA activity but doesn’t ignore.
11.	Setup data migration testing strategy	This is mostly a DBA activity but doesn’t ignore.