When you’re moving an on-premises online transaction processing (OLTP) relational database to AWS, there are a couple of “nice to knows”. This article takes you through a use case and the challenges we faced and solved, so you can benefit from our experience.
In this use case the customer wanted to migrate an on-premises MySQL 5.7 database to AWS. They did not want to change their application too much, so the database schema had to be kept intact. Since AWS Aurora Serverless did not support MySQL 5.7 (only 5.6) at that time, we decided not to use Serverless but the regular AWS Aurora MySQL 5.7-compatible version.
Source database
Some background on the source database and the data to migrate:
- The database has 20 schemas. Some schemas contain only a handful of tables, others contain more than 400.
- Total data is around 250 GB, so it can easily be migrated over a VPN connection.
- Minimal downtime and no “big-bang” approach; so change data capture (CDC) is needed while all applications are moved to AWS (starting with the read-only applications).
- Some schemas contain functions, stored procedures, triggers and foreign keys.
- Some tables contain large object (LOB) data, with a maximum size of around 20 MB. Most of the LOB data is smaller than 10 MB.
AWS tooling
Migrating your data to Aurora is one thing; it can even be done with a bulk copy from an S3 bucket into Aurora. But ongoing replication, validation of the data, and making sure that functions, stored procedures and triggers are migrated is a bigger challenge. Luckily AWS takes away the ‘heavy lifting’, as they say, by supplying us with some great tools: the AWS Database Migration Service and the AWS Schema Conversion Tool.
These tools are well documented by AWS, and if needed you can get help from AWS Business Support. Tip: use the chat option when creating an AWS support ticket; for small questions or basic guidance you get an answer directly via chat.
AWS Database Migration Service (DMS)
AWS Database Migration Service (DMS) can migrate your on-premises OLTP database to AWS Aurora, including change data capture (CDC) support. It is accessible via the AWS Console, but even better, use Infrastructure as Code (IaC) and CloudFormation to create your database migration tasks and instances. For our customer we used the AWS Cloud Development Kit (CDK) in Python to automate this and reuse the same templates in all environments.
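To give an idea, below is a minimal CDK (v2, Python) sketch of a replication instance and a full-load-and-CDC task. All identifiers, ARNs and sizes are placeholders; your own stack will of course look different.

# Minimal sketch of a DMS replication instance and task with the AWS CDK (v2, Python).
# All identifiers, ARNs and sizes are placeholders.
import json
from aws_cdk import Stack, aws_dms as dms
from constructs import Construct

class DmsMigrationStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Replication instance that will run one or more migration tasks
        instance = dms.CfnReplicationInstance(
            self, "ReplicationInstance",
            replication_instance_identifier="mysql-migration-instance",
            replication_instance_class="dms.r5.large",
            allocated_storage=100,
        )

        # Full load plus ongoing replication (CDC) for one schema
        dms.CfnReplicationTask(
            self, "SchemaTask",
            migration_type="full-load-and-cdc",
            replication_instance_arn=instance.ref,  # Ref resolves to the instance ARN
            source_endpoint_arn="arn:aws:dms:eu-west-1:111111111111:endpoint:SOURCE",  # placeholder
            target_endpoint_arn="arn:aws:dms:eu-west-1:111111111111:endpoint:TARGET",  # placeholder
            table_mappings=json.dumps({
                "rules": [{
                    "rule-type": "selection",
                    "rule-id": "1",
                    "rule-name": "1",
                    "object-locator": {"schema-name": "db_schema_name", "table-name": "%"},
                    "rule-action": "include",
                }]
            }),
        )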
In DMS, multiple database migration tasks can be created, and each task can migrate several tables or whole database schemas. Some highlights of these migration tasks:
- Mapping rules to include or exclude tables by name - we used this to exclude some temporary tables and tables that were rebuilt overnight.
- Validation can be switched on or off per task as a whole.
- A task has a source and a target endpoint. These endpoints can have optional settings, for instance to set a time zone or timeouts.
- A task always runs on a single replication instance. To spread the load, create multiple tasks and run several of them per instance, depending on the size and load of the tasks.
- A task can drop the tables and re-create them when restarted. We did not use this option, since we use the Schema Conversion Tool; instead we used the “TRUNCATE_BEFORE_LOAD” target table preparation mode.
- A task has lots and lots of settings. Play with these, but note that some settings cannot be combined (for example, LOBs are not supported when the ‘ApplyErrorInsertPolicy’ or ‘ApplyErrorUpdatePolicy’ options are set). There are also some hidden (read: not documented) settings; if you encounter a specific error, AWS support may ask you to try these… A sketch of such task settings is shown below.
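To give an impression, here is a sketch of the kind of task settings we tweaked most often, written as a Python dict that can be serialized with json.dumps() into the task’s replication task settings. The values are examples only, not recommendations.

# Sketch of commonly tweaked DMS task settings; values are examples, not recommendations.
# Serialize with json.dumps() and pass as the task's replication task settings.
task_settings = {
    "FullLoadSettings": {
        # Truncate existing target tables instead of dropping and re-creating them
        "TargetTablePrepMode": "TRUNCATE_BEFORE_LOAD",
    },
    "ValidationSettings": {
        # Validation is enabled or disabled per task
        "EnableValidation": True,
    },
    "TargetMetadata": {
        # Limited LOB mode with a size cap in KB (our largest LOBs were around 20 MB)
        "LimitedSizeLobMode": True,
        "LobMaxSize": 32768,
    },
    "Logging": {
        # Write task logs to CloudWatch; the log level can be raised per component
        "EnableLogging": True,
        "LogComponents": [
            {"Id": "TARGET_APPLY", "Severity": "LOGGER_SEVERITY_DEFAULT"},
        ],
    },
}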
AWS Database Migration Service (DMS) limitations
- DROP and CREATE TABLE are not supported: our client application creates and drops temporary tables. This is normally done in memory, but in our case the tables were created and dropped when needed. To overcome this limitation of DMS, we had to prefix these tables so we could exclude them in the table mappings (tmp_%).
In the task’s “Mapping rules (JSON)” this looks like:
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "1",
      "object-locator": {
        "schema-name": "db_schema_name",
        "table-name": "tmp_%"
      },
      "rule-action": "exclude"
    },
    {
      "rule-type": "selection",
      "rule-id": "2",
      "rule-name": "2",
      "object-locator": {
        "schema-name": "db_schema_name",
        "table-name": "%"
      },
      "rule-action": "include"
    }
  ]
}
Note that the first rule (#1) excludes all tables that start with the prefix tmp_, and the second rule (#2) includes all other tables.
- The data validation takes a lot longer than migrating the data itself. So it makes sense to skip validation on tables that do not need it, for instance because of the nature of the data in those tables (log or repeatable data). Or better: maybe you can skip tables that are re-created by the application entirely (nightly imported/recreated tables).
- Disabling indexes on the target database can speed up the data migration. But as already said, in our case the validation takes a lot longer than the data migration itself, so we kept the indexes on the target database while migrating the data so the validation process could benefit from them.
- If you split a schema into multiple tasks, ensure that all tables that participate in common transactions end up in the same task, since AWS maintains transactional consistency within a task, but not between multiple tasks.
- Data should not be changed in the target database until you have completely switched your application over to AWS Aurora. So disable all batch jobs, functions, stored procedures and, last but not least, all triggers on the target database. You can (re)create these after all database migration tasks are stopped and Aurora is the master database.
- Foreign keys: during the full load, AWS DMS does not load tables in any particular order, so it might load child table data before parent table data. As a result, foreign key constraints might be violated if they are enabled. You should disable foreign keys on the Aurora MySQL target database and enable them again after the migration is complete. To do so, we set a target extra connection attribute:
initstmt=SET FOREIGN_KEY_CHECKS=0
- After migrating the data to AWS, the validation process starts. Sometimes this process fails on specific tables with the error MISSING_SOURCE or MISSING_TARGET. Most of the time this is due to some delay between the full load and the validation. For us, a reload of these tables always solved the issue (see the sketch after this list).
- For other best practices, see the AWS Best Practices for DMS.
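As a small illustration of the last two points, here is a boto3 sketch that sets the foreign-key extra connection attribute on the target endpoint and reloads a table after a validation failure. All ARNs, host names and credentials are placeholders.

# Sketch using boto3; all ARNs, identifiers and credentials are placeholders.
import boto3

dms = boto3.client("dms")

# Target (Aurora MySQL) endpoint with foreign key checks disabled during the migration
dms.create_endpoint(
    EndpointIdentifier="aurora-target",
    EndpointType="target",
    EngineName="aurora",
    ServerName="my-cluster.cluster-xxxx.eu-west-1.rds.amazonaws.com",  # placeholder
    Port=3306,
    Username="dms_user",   # placeholder; better to use a secret
    Password="change-me",  # placeholder
    ExtraConnectionAttributes="initstmt=SET FOREIGN_KEY_CHECKS=0",
)

# Reload a table that failed validation with MISSING_SOURCE / MISSING_TARGET
dms.reload_tables(
    ReplicationTaskArn="arn:aws:dms:eu-west-1:111111111111:task:EXAMPLE",  # placeholder
    TablesToReload=[{"SchemaName": "db_schema_name", "TableName": "some_table"}],
    ReloadOption="data-reload",  # or "validate-only" to only re-run the validation
)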
AWS Schema Conversion Tool (SCT)
The AWS Schema Conversion Tool (SCT) is a free program that migrates your schemas, tables, views, indexes, functions, stored procedures, triggers and foreign keys to AWS Aurora. It creates a nice report of what has been migrated, of any errors that need attention, and of functions or stored procedures that are not compatible with the target. It has a nice UI and is thereby easy to use; it can save you a lot of “hard work”.
Nice to know…
Some things you wish you had known before you started. For us this was the creating and dropping of temporarily used tables, but also some features of the Database Migration Service.
AWS DMS control and validation tables
The Database Migration Service instances cannot be accessed directly, but the log files of the tasks are written to CloudWatch Logs. Per task you can set a specific log level (info, debug, etc.) for specific components of the task. This makes it easy to debug what a task is doing and why it is failing.
If there are SQL exceptions during the migration, these errors are written to the table awsdms_control.awsdms_apply_exceptions in the target database. So always check this table for errors:
select * from awsdms_control.awsdms_apply_exceptions order by ERROR_TIME desc;
And if you encounter any validation errors, check the table awsdms_control.awsdms_validation_failures_v1. It contains the schema, table and primary key of the row that encountered an error, the error type itself, and the source and target values.
select * from awsdms_control.awsdms_validation_failures_v1 order by FAILURE_TIME desc;
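Besides these control tables, you can also query the per-table validation state from DMS itself. A small boto3 sketch (the task ARN is a placeholder):

# Sketch: list tables of a task whose validation is not (yet) successful; the ARN is a placeholder.
import boto3

dms = boto3.client("dms")
stats = dms.describe_table_statistics(
    ReplicationTaskArn="arn:aws:dms:eu-west-1:111111111111:task:EXAMPLE",
)
for table in stats["TableStatistics"]:
    if table.get("ValidationState") not in ("Validated", "Not enabled"):
        print(table["SchemaName"], table["TableName"],
              table.get("ValidationState"), table.get("ValidationFailedRecords"))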
Fine grain validation
We encountered an error on timestamp columns: the data was transferred correctly, but the data validation failed (an hour or two difference). AWS support could not fix this problem in time. We set a lot of different values on the source and target endpoints and finally asked support if we could disable the validation on a specific column, table or data type. The answer was “no”… “we are working on that feature request, can’t say when it will be released”. But when we got to second-line support, they told us that it is possible! Here is how:
In the task “Mapping rules (JSON)” you can add an additional rule to override the default validation. Note that there are two functions, the source-function and the target-function. Both functions need to contain the variable ${column-name}.
Fine grain validation - example 1
To change the validation of date/time values so that only the day is compared, for all datetime columns (date in MySQL) in all schemas and tables, you can do something like:
{
  "rule-type": "validation",
  "rule-id": "5",
  "rule-name": "5",
  "rule-target": "column",
  "object-locator": {
    "schema-name": "%",
    "table-name": "%",
    "column-name": "%",
    "data-type": "datetime"
  },
  "rule-action": "override-validation-function",
  "source-function": "date_format(${column-name}, '%Y-%m-%d')",
  "target-function": "date_format(${column-name}, '%Y-%m-%d')"
}
Fine grain validation - example 2
To skip all validation on datetime columns of a specific table, you can do something like:
{
  "rule-type": "validation",
  "rule-id": "7",
  "rule-name": "7",
  "rule-target": "column",
  "object-locator": {
    "schema-name": "specific_schema",
    "table-name": "specific_table",
    "column-name": "%",
    "data-type": "datetime"
  },
  "rule-action": "override-validation-function",
  "source-function": "case when ${column-name} is NULL then NULL else 0 end",
  "target-function": "case when ${column-name} is NULL then NULL else 0 end"
}
I hope I gave you some insights for a successful database migration into the AWS cloud. If you need any help, please feel free to contact us. Sentia Consulting has more than six years of experience in moving data to the public cloud. We are more than happy to assist you.