Database Migrations to public cloud: How easy?

Frank Zomerdijk

Migrating your online transaction processing (OLTP) relational database to a public cloud solution sounds rather straightforward, but is it? Before you begin, you should prepare yourself and answer some questions about the data to be transferred, how to transfer it, and the destination. I hope to give you some guidelines in this article.

Before you even think about moving a database to the public cloud, you should ask yourself what the best solution in the public cloud of choice would be. Or better: understand your (client's) needs and requirements for this data in the public cloud.

Of course, you can go for the easy way: find a similar solution at the cloud provider of choice, for instance move from a MySQL database to the MySQL-compatible edition of AWS Aurora. But always ask yourself whether there is not a better solution: can we make the application more cloud native and make use of the other services the public cloud provider offers, such as a NoSQL database or another storage solution? This can save you data migration time (and headaches) and make full use of the solutions of your cloud provider (better, faster, cheaper, etc.).

But I assume you have considered this and chosen your destination. Now, how do we migrate this data?

Source database

It is important that you know what you're moving: what kind of data is stored in your database, and whether any "special" data is stored. Some considerations:

Character sets

Which character sets are used for the data stored, and does the destination have the same or a compatible character set? Most of the time it will, but if not, you are facing the challenge of converting the data while migrating or updating the data after migration. Some tooling supports character substitutions while migrating the data, but be aware that this can take a considerable amount of time to implement and test.
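As an illustration, you can take a quick inventory of the character sets actually in use before deciding on any conversion. The sketch below assumes a MySQL source reachable via the PyMySQL driver; the connection details and the schema name appdb are placeholders.

    # A minimal sketch, assuming a MySQL source and PyMySQL; host, credentials
    # and the schema name "appdb" are placeholders for illustration.
    import pymysql

    conn = pymysql.connect(host="source-db.example.com", user="migrator",
                           password="secret", database="information_schema")

    with conn.cursor() as cur:
        # List the character sets and collations actually used per column,
        # so they can be compared against what the target supports.
        cur.execute("""
            SELECT table_name, column_name, character_set_name, collation_name
            FROM columns
            WHERE table_schema = %s AND character_set_name IS NOT NULL
            ORDER BY character_set_name, table_name
        """, ("appdb",))
        for table, column, charset, collation in cur.fetchall():
            print(f"{table}.{column}: {charset} / {collation}")

    conn.close()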

BLOB sizes of data stored

What are the (maximum) sizes of the individual values being stored? I am not talking about the total amount of data (database size), but the real data stored. What kind of data is this and what are the challenges here? Do not be surprised if images, PDFs or videos are stored in a relational database. This binary data can easily be 10 MB to 100 MB or even bigger! Both the destination and the data migration tools should be prepared for this: some in-memory processing will fail for large values, and the destination should be sized for this data (column and disk sizes). If you are facing big chunks of data being stored, ask yourself whether this data should not be stored differently in the public cloud. Now is the time to change it.
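To get a feel for the largest values before sizing the target and the migration tooling, something like the following can help. Again a minimal sketch assuming a MySQL source and PyMySQL; the connection details and the schema name appdb are placeholders.

    # A minimal sketch, assuming a MySQL source and PyMySQL; connection details
    # and the schema name "appdb" are placeholders for illustration.
    import pymysql

    conn = pymysql.connect(host="source-db.example.com", user="migrator",
                           password="secret", database="appdb")

    with conn.cursor() as cur:
        # Find the BLOB columns in the schema ...
        cur.execute("""
            SELECT table_name, column_name
            FROM information_schema.columns
            WHERE table_schema = %s
              AND data_type IN ('blob', 'mediumblob', 'longblob')
        """, ("appdb",))
        blob_columns = cur.fetchall()

        # ... and report the largest value stored in each of them.
        for table, column in blob_columns:
            cur.execute(f"SELECT MAX(LENGTH(`{column}`)) FROM `{table}`")
            (max_bytes,) = cur.fetchone()
            print(f"{table}.{column}: largest value {max_bytes or 0} bytes")

    conn.close()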

Time zone differences

When you select a region to store your data, be aware that this destination can be in a different time zone than your source database. Set the time zone of your destination database equal to that of your source database, or set it explicitly on your source and target connections, so that the data being transferred is handled consistently during the data migration.
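For example, pinning both connections to the same time zone could look like this. A minimal sketch assuming MySQL on both sides and PyMySQL; the hosts, credentials and the choice of UTC are placeholders.

    # A minimal sketch, assuming MySQL-compatible source and target and PyMySQL;
    # hosts, credentials and the UTC setting are placeholders for illustration.
    import pymysql

    source = pymysql.connect(host="source-db.example.com", user="migrator",
                             password="secret", database="appdb")
    target = pymysql.connect(host="target-db.example.com", user="migrator",
                             password="secret", database="appdb")

    # Pin both sessions to the same time zone so TIMESTAMP values are
    # interpreted identically on both ends of the transfer.
    for conn in (source, target):
        with conn.cursor() as cur:
            cur.execute("SET time_zone = '+00:00'")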

Clean data

Some databases have a long history of data being stored, migrated and updated, or have even gone through some upgrades in their lives. This can cause data integrity issues or even data that is "corrupted" in the sense that it is no longer valid. An example is a date/time field containing zeros for the day or month (like '1986-00-00'). Be aware that this can occur. Hopefully you find these cases before or during migration via the validation process. These problems should be fixed in the source database by the data owner, preferably before the database migration is started.
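A simple pre-migration check can already flag such values for the data owner. The sketch below assumes a MySQL source and PyMySQL; the customers table and birth_date column are made-up examples.

    # A minimal sketch, assuming a MySQL source and PyMySQL; the table and
    # column names are placeholders for illustration.
    import pymysql

    conn = pymysql.connect(host="source-db.example.com", user="migrator",
                           password="secret", database="appdb")

    with conn.cursor() as cur:
        # MySQL can store "zero" date parts such as '1986-00-00'; most targets
        # will reject these rows, so flag them for the data owner up front.
        cur.execute("""
            SELECT id, birth_date
            FROM customers
            WHERE CAST(birth_date AS CHAR) LIKE '%%-00-%%'
               OR CAST(birth_date AS CHAR) LIKE '%%-00'
        """)
        for row_id, value in cur.fetchall():
            print(f"invalid date in customers.birth_date, id={row_id}: {value}")

    conn.close()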

Target database

Supported data types

If you have chosen a different database as your target, or a compatible cloud solution, double-check the target data types. It may be that not all data types of your source are available in the target database, so some translation needs to be done during the data migration. This can lead to some (small) differences, and the applications using this data should be prepared for this.

Secondly, the tools that you are using to migrate the data can use a different data type in the background for transfer or validation. This should not cause any data to be changed, but it can lead to (false) data validation problems.
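If you end up translating types yourself, keeping the mapping explicit and reviewable helps. The sketch below is purely illustrative, assuming a MySQL source and a PostgreSQL-compatible target; verify each mapping against your actual target before relying on it.

    # A minimal sketch of a hand-maintained type mapping, assuming a MySQL
    # source and a PostgreSQL-compatible target; the mapping shown is only
    # illustrative and must be checked against the actual target database.
    MYSQL_TO_POSTGRES = {
        "tinyint":  "smallint",          # no 1-byte integer type in PostgreSQL
        "datetime": "timestamp",
        "double":   "double precision",
        "longblob": "bytea",
        "enum":     "text",              # or a CHECK constraint / domain
    }

    def translate(mysql_type: str) -> str:
        """Return the target type, falling back to the source type name."""
        return MYSQL_TO_POSTGRES.get(mysql_type.lower(), mysql_type)

    for source_type in ("TINYINT", "DATETIME", "VARCHAR"):
        print(f"{source_type} -> {translate(source_type)}")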

Foreign keys or cross references

If the data being transferred has foreign keys or cross references between tables, these tables should be migrated in a particular order. If your migration tool does not provide such an order, you should disable all foreign keys on the target until the database migration is complete.
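On a MySQL-compatible target, disabling foreign key checks for the loading session could look like the sketch below; the load_all_tables() helper is only a placeholder for whatever actually copies the data.

    # A minimal sketch, assuming a MySQL-compatible target and PyMySQL; the
    # load_all_tables() helper is a placeholder for the real copy step.
    import pymysql

    def load_all_tables(conn):
        """Placeholder for the actual data copy step."""
        pass

    target = pymysql.connect(host="target-db.example.com", user="migrator",
                             password="secret", database="appdb")

    with target.cursor() as cur:
        # Disable foreign key checks for this session so tables can be loaded
        # in any order, then re-enable them once the load is complete.
        cur.execute("SET FOREIGN_KEY_CHECKS = 0")
        try:
            load_all_tables(target)
        finally:
            cur.execute("SET FOREIGN_KEY_CHECKS = 1")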

Migration path

For a successful database migration there are some factors to be taken into account. Take your time to pick the right approach; this can save time during the database migration process.

Tooling

The tooling you use is the key factor in a successful database migration. Consider not only whether these are proven tools, but also what support is available during the migration. What should these tools do? Think of:

Database Schema Conversion - translation of tables, columns, data types, indexes, triggers, stored procedures, functions, etc.

Data transfer – a one-time (big bang) copy of the data, or also ongoing replication of changes?

Data validation – how do you check whether the transferred data is still in line with what was in the source? A row count alone will not be sufficient; the data itself needs to be checked and validated. A minimal sketch of such a check follows this list.
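As a sketch of what "more than a row count" can mean, the example below hashes every row of a table on both sides and compares the digests. It assumes a MySQL-compatible source and target via PyMySQL, with made-up table and key names; note that the data type translations mentioned earlier can make such a naive comparison report false mismatches.

    # A minimal sketch of row-level validation, assuming MySQL-compatible
    # source and target via PyMySQL; table and key names are placeholders.
    import hashlib
    import pymysql

    def table_digest(conn, table: str, key: str) -> str:
        """Hash every row of a table in a deterministic (primary key) order."""
        digest = hashlib.sha256()
        with conn.cursor() as cur:
            cur.execute(f"SELECT * FROM `{table}` ORDER BY `{key}`")
            for row in cur:
                digest.update(repr(row).encode("utf-8"))
        return digest.hexdigest()

    source = pymysql.connect(host="source-db.example.com", user="migrator",
                             password="secret", database="appdb")
    target = pymysql.connect(host="target-db.example.com", user="migrator",
                             password="secret", database="appdb")

    for table, key in [("customers", "id"), ("orders", "id")]:
        ok = table_digest(source, table, key) == table_digest(target, table, key)
        print(f"{table}: {'OK' if ok else 'MISMATCH'}")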

How to transfer

The amount of data to be transferred is the most important factor in how to transfer it. Can it be done via a normal VPN connection, or do we need more bandwidth? How long will it take to do a full data copy, and do we want a data freeze or ongoing replication? Most of the time a VPN connection will do, but double-check the source data, calculate, and test. And think of a plan B.
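The calculation itself is simple back-of-envelope arithmetic; the sketch below assumes a 500 GB database and 200 Mbit/s of usable VPN bandwidth purely as examples.

    # A minimal back-of-envelope sketch; the 500 GB database size and the
    # 200 Mbit/s of usable VPN bandwidth are illustrative assumptions.
    database_size_gb = 500
    usable_bandwidth_mbit_per_s = 200

    size_bits = database_size_gb * 8 * 1000**3            # GB -> bits
    seconds = size_bits / (usable_bandwidth_mbit_per_s * 1000**2)
    print(f"Full copy takes roughly {seconds / 3600:.1f} hours")   # ~5.6 hours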

Data freeze

Whether you're doing a one-time copy or ongoing replication, there will always be a (short) period where the switchover from source to destination requires a data freeze on the source.

If you're using ongoing replication for a period, take care that the destination data is not changed until the destination becomes the master. This means: do not run any code that changes data in the destination database. No functions, no stored procedures, no triggers, no batch jobs, until the destination becomes the master.
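One way to enforce this on a MySQL-compatible target is sketched below, assuming PyMySQL and an administrative account; native replication threads bypass read_only, but check how your own migration tool connects before relying on this.

    # A minimal sketch, assuming a MySQL-compatible target, PyMySQL and an
    # administrative account with the privilege to change global variables.
    import pymysql

    target = pymysql.connect(host="target-db.example.com", user="admin",
                             password="secret")

    with target.cursor() as cur:
        cur.execute("SET GLOBAL read_only = 1")          # block ordinary writes
        cur.execute("SET GLOBAL event_scheduler = OFF")  # stop scheduled jobs

    # At cutover, once the destination becomes the master, reverse both settings.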

Performance and speed

Data migration tools can migrate data in chunks; test whether your data suits the default chunk size and adjust accordingly. Also check the performance of the source and destination databases, and of the data migration instance (if used), during the process. Think of CPU, network, disk I/O, memory usage and the number of concurrent connections to your database.
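A minimal sketch of what chunked copying looks like under the hood is shown below, assuming a MySQL-compatible source and target via PyMySQL; the customers table, its columns and the 10,000-row chunk size are illustrative and should be tuned against real row sizes (think of the BLOBs mentioned earlier).

    # A minimal sketch of chunked copying, assuming MySQL-compatible source and
    # target via PyMySQL; table, columns and chunk size are placeholders.
    import pymysql

    CHUNK_SIZE = 10_000

    source = pymysql.connect(host="source-db.example.com", user="migrator",
                             password="secret", database="appdb")
    target = pymysql.connect(host="target-db.example.com", user="migrator",
                             password="secret", database="appdb",
                             autocommit=False)

    last_id = 0
    while True:
        with source.cursor() as cur:
            # Key-set pagination: cheaper than OFFSET on large tables.
            cur.execute("SELECT id, name, email FROM customers "
                        "WHERE id > %s ORDER BY id LIMIT %s",
                        (last_id, CHUNK_SIZE))
            rows = cur.fetchall()
        if not rows:
            break
        with target.cursor() as cur:
            cur.executemany("INSERT INTO customers (id, name, email) "
                            "VALUES (%s, %s, %s)", rows)
        target.commit()
        last_id = rows[-1][0]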

When migrating large amounts of data, it can be wise to create indexes at a later stage (when all data has been transferred). This can speed up the data migration. But if validation of the data is done at the same time or directly after the data is migrated, having indexes on the tables will have a positive effect on the performance of that validation. And most of the time there are tables where you can skip validation entirely; think of tables that are re-created overnight or only store temporary data (during imports or batch jobs).

I hope I have given you plenty of things to think about when you're doing a database migration. If you need any help, please feel free to contact us. Sentia has more than six years of experience in moving data to the public cloud. We are most happy to assist you.