Best Practices for Migrating to Aurora MySQL

Data Modernization & Analytics
Migrations

Aurora MySQL is a high-performance, fully managed database with the benefits of Amazon RDS, simplifying infrastructure management so you can focus on your business. Learn migration best practices and the essential components of a successful journey to Aurora MySQL that can lead to increased scalability, resiliency, and cost-effectiveness.

The number of data modernization initiatives continues to accelerate as customers look to take advantage of the benefits that AWS has to offer, specifically around database hosting on services like Amazon Aurora. The opportunity to host your data platforms on managed services like RDS with improved scalability, resilience, cost savings, and security is drawing more and more customers to explore this hosting option so they can focus on what actually matters: adding value to their business. In this blog we outline some of the best practices that we as Cayliens can share for your migration journey to Aurora MySQL, and walk you through the components required for a successful migration plan.

Aurora MySQL is a fully managed, MySQL-compatible database engine that combines the speed and reliability of high-end commercial databases with the simplicity and cost-effectiveness of open-source databases. It’s a direct replacement for MySQL with the built-in benefits of Amazon RDS (and more) for handling infrastructure management, so you can focus on your business and application.

A typical migration breaks down into four phases: discovery, design, implementation, and cutover. The discovery phase focuses on assessing your current environment and helping you plan for your future one by identifying suitable target infrastructure and any compatibility issues for your data and workloads. The design and implementation phases are typically iterative as you build and test your migration plan and make adjustments based upon your findings. Once you have a successful migration process in place, you are ready to execute the actual cutover to your new Aurora MySQL solution by creating the new infrastructure and migrating the data and application workloads over. AWS provides a number of tools to help you simplify the process, but ultimately the success of this initiative will depend on how comprehensive your migration plan is.

When planning a migration, one of the first requirements is clarifying the location of the source database. If you are already on AWS, the migration can be simpler as there are fewer compatibility issues to address and there are AWS tools available to streamline the process. However, if your source database is on premises or in another cloud environment, this generally adds complexity and often involves more planning and testing, as you need to ensure the environments are compatible. Another major complexity when not already in an AWS environment is data transfer. If the data set is large, the transfer will be time consuming, and if it needs to move over a VPN for security reasons, you will need to set up the network infrastructure to migrate the data to AWS.

Regardless of your scenario, there are a number of best practices that you should be following to help guide you during this migration process, such as leveraging the built-in availability and durability of Aurora to choose the disaster recovery option that best serves your use cases, or optimizing large query joins by enabling hash joins to improve query performance. AWS provides extensive content on how to work with Aurora MySQL as well as a number of services such as AWS Database Migration Service (DMS) and AWS Schema Conversion Tool (SCT) to help you prepare for and accelerate your modernization initiative. 

Why migrate to Aurora MySQL?

Generally, when thinking about what to do with your legacy, expensive MSSQL or Oracle workloads, the options come down to PostgreSQL or MySQL. While PostgreSQL is the “enterprise” and often complex answer, MySQL offers a great solution that has lower complexity and is more widely adopted, so finding DBAs who know how to manage it is going to be easier. Furthermore, MySQL is tunable at the storage engine level and has been optimized for low-complexity, read-heavy workloads.

If you are running on AWS, the Caylent recommended approach would be either Aurora or RDS depending on your use case, but there are some edge cases that may require Amazon RDS Custom or EC2-hosted database workloads so you have all your bases covered. Aurora has matured to the point where you can have either Aurora Serverless or global databases at the click of a button, so it’s already feature rich, but if you need more flexibility, RDS or EC2 options should be considered.

There are a few other options out there for open source relational database targets; however, if you are looking to blend the best of both worlds (low Opex without licensing fees) for a low-complexity, read-heavy workload, Aurora MySQL is the way to go.

Types of migrations

Where your source database is located and the type of engine are both key factors for your migration plan. If it’s already on AWS, then you will have a number of additional options available that can help streamline the migration process, which we dive into in more detail below. Alternatively, when the source database is on premises or in another cloud environment, there is an added layer of complexity involved, which means additional time to plan and execute a successful migration to AWS. Some of the things that may impact this timeline are connectivity challenges, data sovereignty complications, or data movement at scale. Some of these challenges also exist when migrating from one AWS Region to another, but they usually present themselves more frequently and at a higher complexity cost when migrating from somewhere else to AWS.

When discussing types of database engines targeted for migration, there are homogeneous and heterogeneous migrations.  

Homogeneous data migrations refer to the process of transferring data between the same or similar database platforms, such as migrating from MySQL to Aurora MySQL. Although compatibility issues may arise during any migration, the number of schema conversion steps required is often significantly reduced in homogeneous migrations due to the similarities between the platforms.

Heterogeneous migrations differ in that they involve transferring data to a different database platform, such as migrating from SQL Server or Oracle to Aurora MySQL. This type of migration is gaining popularity as customers seek to avoid vendor lock-in, reduce operating costs, and take advantage of AWS offerings. The caveat here is that heterogeneous migration introduces increased complexity, as it involves schema transformation tasks and application changes to support the transition. AWS offers tools such as AWS Database Migration Service (DMS) and the AWS Schema Conversion Tool (SCT) to assist with these tasks, but manual intervention may still be required for complex database and application logic.

Best practices for planning your migration

Planning the migration is the most critical aspect of this process; it will determine whether your migration into your new environment will be a success. You begin with a thorough assessment of your source infrastructure and data: analyzing application workloads and benchmarks, defining the scope of what data needs to be migrated so you can identify the data volume and object mapping needs, and completing a security assessment of your access controls, encryption, and compliance requirements. All of this detail provides the information you need to identify a suitable migration method, timeline, and target infrastructure so you can move your data accurately, securely, and with minimal disruption to operations.

Perform a data assessment

Data profiling is the process of examining and analyzing your data so you can gain an understanding of what needs to be included as part of the migration. Are there any reformatting or restructuring requirements that need to be considered? Are there data quality or content concerns that you would like to address? Or is there a large amount of unused or legacy data that should be archived, since doing so can save you cost and time during the migration process? All of these are important for understanding what data is going to be needed so you can define a clear scope of what should be migrated.
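For example, a quick way to scope data volume is to query information_schema on the source for approximate table sizes and row counts. The sketch below uses Python and PyMySQL; the connection details and the 100 GB review threshold are placeholder assumptions, and note that table_rows is only an estimate for InnoDB tables.

```python
# Sketch: profile table sizes in the source MySQL database to scope the migration.
# Connection details and the review threshold below are placeholders for illustration.
import pymysql

conn = pymysql.connect(host="source-db.example.com", user="admin",
                       password="...", database="information_schema")

query = """
    SELECT table_schema, table_name, table_rows,
           ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) AS size_gb
    FROM information_schema.tables
    WHERE table_schema NOT IN ('mysql', 'sys', 'performance_schema', 'information_schema')
    ORDER BY (data_length + index_length) DESC
"""

with conn.cursor() as cur:
    cur.execute(query)
    for schema, table, rows, size_gb in cur.fetchall():
        # Flag large tables as candidates for archiving or a phased migration.
        flag = "  <-- review for archiving" if (size_gb or 0) > 100 else ""
        print(f"{schema}.{table}: ~{rows} rows, {size_gb} GB{flag}")
```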

One of the huge benefits of Aurora MySQL is its high availability, as it’s fault tolerant by design. Aurora automatically copies your data across multiple Availability Zones for redundancy and lets you host multiple DB instances within a Region, or even across Regions with global databases, which helps ensure that one or more of your DB instances will be ready to handle requests from your application should a failure occur. Aurora allows you to create up to 15 read-only replicas (called reader instances), one of which can be automatically promoted to the primary (writer) instance if it becomes unavailable for any reason. These read replicas can also be leveraged to reduce traffic to your writer instance by offloading read-intensive workloads that require SELECT query access only.
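As a rough illustration, adding a reader instance to an existing cluster and pointing read-only workloads at the cluster's reader endpoint could look like the boto3 sketch below; the cluster name, instance class, and Region are placeholder assumptions.

```python
# Sketch: add a reader instance to an existing Aurora MySQL cluster and look up
# the reader endpoint for read-only (SELECT) workloads. Identifiers are placeholders.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Add a reader instance to the cluster; Aurora can promote a reader automatically on failover.
rds.create_db_instance(
    DBInstanceIdentifier="my-aurora-reader-1",
    DBClusterIdentifier="my-aurora-cluster",
    DBInstanceClass="db.r6g.large",
    Engine="aurora-mysql",
)

# The cluster's reader endpoint load-balances connections across the reader instances.
cluster = rds.describe_db_clusters(DBClusterIdentifier="my-aurora-cluster")["DBClusters"][0]
print("Send SELECT-only traffic to:", cluster["ReaderEndpoint"])
```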

Select Aurora hosting type

Once you have an understanding of your data, you can start looking at the infrastructure requirements for your new environment. Analyzing your source system’s current infrastructure and workloads is an ideal starting point, as there are multiple options for hosting Aurora that can save you costs, including serverless configurations for unpredictable or variable workloads and global databases for improved disaster recovery capabilities. Aurora offers numerous benefits around performance and scalability, along with simplicity and cost optimizations, so use your current workloads as a blueprint to help identify the potential gains you can achieve when moving to Aurora.

Assess costs

As part of a data migration, you will likely have to put together a TCO (total cost of ownership) and an ROI (return on investment) for the migration project. The overall costs of your target solution will be determined based on which hosting option you choose.  

Aurora MySQL offers three options for hosting on AWS: provisioned on-demand, provisioned reserved instances, and serverless, so you can pick a hosting option that meets your cost objectives. Provisioned on-demand lets you pay hourly with no long-term commitments; it is only recommended for development, test, or short-lived workloads since you are paying a much higher rate per hour. Provisioned reserved instances are more suitable for steady-state workloads and offer significant savings over the on-demand option, as you commit to hosting the instance for a period of time. Finally, the serverless configuration is an on-demand, auto-scaling configuration that adjusts database capacity based on your application’s needs. It is ideal for variable workloads and can help you save money during periods of inactivity, as database capacity scales down when it’s not needed.
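To illustrate the serverless option, the sketch below provisions an Aurora Serverless v2 cluster with a capacity range using boto3. The identifiers, credentials, engine version, and capacity values are placeholder assumptions; check which versions and capacity limits are available in your Region.

```python
# Sketch: provision an Aurora MySQL cluster with Serverless v2 scaling.
# All identifiers, credentials, and capacity values are placeholders.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_cluster(
    DBClusterIdentifier="my-aurora-serverless",
    Engine="aurora-mysql",
    EngineVersion="8.0.mysql_aurora.3.05.2",      # assumed version; verify availability
    MasterUsername="admin",
    MasterUserPassword="change-me",               # use Secrets Manager in practice
    ServerlessV2ScalingConfiguration={"MinCapacity": 0.5, "MaxCapacity": 16},
)

# Serverless v2 instances use the special "db.serverless" instance class.
rds.create_db_instance(
    DBInstanceIdentifier="my-aurora-serverless-writer",
    DBClusterIdentifier="my-aurora-serverless",
    DBInstanceClass="db.serverless",
    Engine="aurora-mysql",
)
```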

In addition to instance configurations, there are database storage and I/O charges that apply regardless of your configuration. These are calculated based on your storage capacity per GB-month and your I/O rate per million requests, so it’s important to estimate them based on your current and forecasted workloads in your source environment.
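As a back-of-the-envelope illustration, you can estimate the storage and I/O portion of the bill from your current workload numbers. The rates in the sketch below are placeholders only, not real prices; always pull current pricing for your Region from the Aurora pricing page or the AWS Pricing Calculator.

```python
# Sketch: rough monthly storage and I/O estimate for Aurora (standard storage).
# The per-GB and per-million-request rates below are PLACEHOLDERS, not real prices.
STORAGE_RATE_PER_GB_MONTH = 0.10       # assumed $/GB-month
IO_RATE_PER_MILLION = 0.20             # assumed $/1M I/O requests

storage_gb = 500                       # forecast storage footprint
io_requests_per_month = 2_000_000_000  # forecast I/O requests per month

storage_cost = storage_gb * STORAGE_RATE_PER_GB_MONTH
io_cost = (io_requests_per_month / 1_000_000) * IO_RATE_PER_MILLION

print(f"Estimated storage: ${storage_cost:,.2f}/month")
print(f"Estimated I/O:     ${io_cost:,.2f}/month")
print(f"Total (excl. instances/features): ${storage_cost + io_cost:,.2f}/month")
```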

Aurora also offers additional features that can add to your ongoing costs, so make sure you factor those into your assessment. Features such as global databases for disaster recovery from Region-wide outages, snapshot or cluster exports to S3, and data transfer in and out of Aurora all incur additional costs and should be factored in.

Assess security posture

This is a great opportunity to strengthen your security posture for your new Aurora environment. Aurora runs in a VPC to help you isolate your database in your own network, and you can completely customize your security group settings to control access to your instances. It also supports security features such as encryption at rest using AWS KMS (AES-256) and encryption in transit using SSL/TLS, so you can be confident your data is secure at all times.

A really common Aurora scenario is serving your application traffic from a public-facing subnet while keeping your DB cluster and database in a private subnet for enhanced security. All Aurora instances are created with a private IP address that can serve traffic from within the VPC, but you can control whether the DB cluster has public access and whether it is assigned a public IP address. With public access disabled, you can customize your security group inbound rules and deploy your Aurora cluster in a private subnet without an internet gateway, which ensures your instance data is hidden from the internet within your VPC.
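A minimal boto3 sketch of that pattern, assuming a pre-created DB subnet group of private subnets, a locked-down security group, and a KMS key; all identifiers below are placeholders.

```python
# Sketch: create an Aurora MySQL cluster in private subnets with encryption at rest,
# and keep its instances off the public internet. All identifiers are placeholders.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_cluster(
    DBClusterIdentifier="my-private-aurora",
    Engine="aurora-mysql",
    MasterUsername="admin",
    MasterUserPassword="change-me",                    # use Secrets Manager in practice
    DBSubnetGroupName="private-db-subnet-group",       # subnets with no internet gateway route
    VpcSecurityGroupIds=["sg-0123456789abcdef0"],      # inbound 3306 from the app tier only
    StorageEncrypted=True,
    KmsKeyId="alias/aurora-at-rest",                   # assumed KMS key alias
)

rds.create_db_instance(
    DBInstanceIdentifier="my-private-aurora-writer",
    DBClusterIdentifier="my-private-aurora",
    DBInstanceClass="db.r6g.large",
    Engine="aurora-mysql",
    PubliclyAccessible=False,                          # no public IP address
)
```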

You should also understand authentication and user permissions beyond database-level users and roles in your new environment: Aurora integrates with IAM so you can control who has access to your instances, and offers monitoring options using CloudWatch and GuardDuty for auditing and threat detection.
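For example, with IAM database authentication enabled on the cluster and a matching database user created, an application can connect with a short-lived token instead of a stored password. The sketch below uses boto3 and PyMySQL; the hostname, database user, and CA bundle path are placeholder assumptions.

```python
# Sketch: connect to Aurora MySQL using an IAM authentication token instead of a password.
# Requires IAM DB authentication enabled on the cluster and a DB user set up for it.
import boto3
import pymysql

HOST = "my-aurora-cluster.cluster-abc123.us-east-1.rds.amazonaws.com"  # placeholder
USER = "app_user"                                                      # placeholder IAM-auth DB user

rds = boto3.client("rds", region_name="us-east-1")
token = rds.generate_db_auth_token(DBHostname=HOST, Port=3306, DBUsername=USER)

conn = pymysql.connect(
    host=HOST,
    user=USER,
    password=token,                              # token is only valid for a short time
    database="appdb",
    ssl={"ca": "/opt/certs/global-bundle.pem"},  # assumed path to the RDS CA bundle
)
with conn.cursor() as cur:
    cur.execute("SELECT CURRENT_USER()")
    print(cur.fetchone())
```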

Define integration with other AWS services

There’s also an opportunity to enhance your capabilities beyond a standard database platform, since Amazon Aurora integrates seamlessly with other AWS services such as S3 and Lambda. Importing and exporting data to and from S3 is straightforward with the built-in S3 integration, whether you’re importing data into Aurora for enrichment or archiving historical data when it’s no longer needed. AWS Lambda functions can also be used to create real-time responses to events, such as automatically loading a file from S3 when it’s uploaded, or triggering a notification when certain events occur within your database.
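As an illustration of the S3 integration, Aurora MySQL can load data directly from S3 with a LOAD DATA FROM S3 statement, provided the cluster has an associated IAM role granting read access to the bucket. The sketch below issues such a statement through PyMySQL; the bucket, table, and file layout are placeholder assumptions.

```python
# Sketch: load a CSV file from S3 directly into an Aurora MySQL table.
# Assumes the cluster has an associated IAM role granting read access to the bucket.
import pymysql

conn = pymysql.connect(host="my-aurora-cluster.cluster-abc123.us-east-1.rds.amazonaws.com",
                       user="admin", password="...", database="appdb")

load_sql = """
    LOAD DATA FROM S3 's3://my-data-bucket/exports/customers.csv'
    INTO TABLE customers
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\\n'
    IGNORE 1 LINES
"""

with conn.cursor() as cur:
    cur.execute(load_sql)   # placeholder bucket/table; adjust delimiters to your export format
conn.commit()
```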

Define cutover strategy

Lastly, you need to determine your cutover strategy; this is going to be driven by a few key factors. You always want to minimize disruption to your business operations, but zero disruption often comes at a higher cost around planning and resource allocation. Do you have a cutover window that can be leveraged on weekends or after hours, or when application traffic to your database is lower? If you do, then you can look into utilizing that window with a big bang migration strategy to move all of the data over at once; otherwise you might need to investigate something closer to a blue/green strategy, which involves replicating data between the two instances until you are ready to switch over, to minimize downtime. This option is especially useful if you are migrating from a MySQL instance already running on AWS, as you can configure your Aurora cluster as a read replica until you are ready to cut over.

There are numerous other strategies that haven’t been mentioned here, including hybrid approaches of the above, but regardless of which strategy you select, you will want to use canonical name (CNAME) records for your database endpoints, as they make it easier to switch traffic between the source and target databases when you are ready to cut over.
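For example, if your applications connect through a CNAME in a Route 53 hosted zone, cutover can be as simple as repointing that record at the new Aurora cluster endpoint. A boto3 sketch follows; the hosted zone ID, record name, and endpoint are placeholders.

```python
# Sketch: repoint an application-facing CNAME at the new Aurora cluster endpoint at cutover.
# Hosted zone ID, record name, and endpoint values are placeholders.
import boto3

route53 = boto3.client("route53")

route53.change_resource_record_sets(
    HostedZoneId="Z0123456789ABCDEFGHIJ",
    ChangeBatch={
        "Comment": "Cutover to Aurora MySQL",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "db.internal.example.com",
                "Type": "CNAME",
                "TTL": 60,   # keep the TTL low ahead of cutover so the switch propagates quickly
                "ResourceRecords": [
                    {"Value": "my-aurora-cluster.cluster-abc123.us-east-1.rds.amazonaws.com"}
                ],
            },
        }],
    },
)
```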

Best practices for migrating data to Aurora MySQL 

There are numerous tools available that can be leveraged for migrating the data from your source into Aurora MySQL. AWS offers a managed service called AWS Database Migration Service (DMS), which is intended to simplify the process of migrating data into Aurora MySQL. It has a user-friendly interface allowing easy configuration and validation, supports multiple data sources and continuous data replication, and pairs with the AWS Schema Conversion Tool (SCT) to help automate and minimize the effort involved in the migration process. AWS also provides a series of easy-to-follow workshops that you can reference to understand the process and the features that DMS can offer.
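To give a feel for what a DMS setup looks like in code, the boto3 sketch below creates a full-load-plus-CDC replication task with a simple table-selection mapping. The endpoint and replication instance ARNs, schema name, and task identifier are all placeholders, and many teams configure the same thing through the DMS console instead.

```python
# Sketch: create a DMS task that does a full load followed by ongoing replication (CDC).
# All ARNs and names are placeholders; the endpoints and replication instance must exist first.
import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")

table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-app-schema",
        "object-locator": {"schema-name": "appdb", "table-name": "%"},
        "rule-action": "include",
    }]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="mysql-to-aurora-task",
    SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:SOURCE",
    TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:TARGET",
    ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:INSTANCE",
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps(table_mappings),
)
```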

If you are already hosting your MySQL instance in AWS, another option to consider is leveraging a read replica approach which we touched on briefly already. In this strategy, you can configure Aurora as a replica of your MySQL source instance, which will utilize the binary log replication feature to asynchronously replicate any updates from the source to your Aurora instance. When you are ready to cutover, you wait until the replica lag reaches 0, disable the source replication and promote your Aurora cluster to become the primary DB instance. Finish the process by redirecting your client applications to point to the new Aurora instance for your application workloads.
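A rough sketch of the cutover check in this approach: poll replication lag on the Aurora replica and stop replication once it reaches zero. The RDS/Aurora stored procedure and the exact status fields vary by engine version, so treat this as illustrative rather than definitive.

```python
# Sketch: wait for replication lag to reach zero on the Aurora replica, then stop replication
# ahead of promotion. Status column names vary by MySQL version (SHOW SLAVE/REPLICA STATUS).
import time
import pymysql

conn = pymysql.connect(host="my-aurora-cluster.cluster-abc123.us-east-1.rds.amazonaws.com",
                       user="admin", password="...",
                       cursorclass=pymysql.cursors.DictCursor)

def replica_lag(connection):
    with connection.cursor() as cur:
        cur.execute("SHOW SLAVE STATUS")           # "SHOW REPLICA STATUS" on newer versions
        row = cur.fetchone() or {}
        lag = row.get("Seconds_Behind_Master")     # "Seconds_Behind_Source" on newer versions
        return int(lag) if lag is not None else None

lag = replica_lag(conn)
while lag is None or lag > 0:
    print(f"Replica lag: {lag}s, waiting...")
    time.sleep(10)
    lag = replica_lag(conn)

# Stop replication before promoting the Aurora cluster and repointing applications.
with conn.cursor() as cur:
    cur.execute("CALL mysql.rds_stop_replication")
```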

There’s also native options like the MySQL Data Export Tool, which can be leveraged to extract the data from your source platform in a variety of supported formats. This enables you to backup the data, upload it to a storage service like S3, and use the Aurora restore from S3 feature to import your data. 

Finally, there are a number of third party tools such as Pentaho Kettle, Qlik Replicate and Alooma which can be utilized and are typically much more advanced than native tools. These tools typically require additional licensing and can add complexities when using them since they offer more customizations and options for the migration. 

Best practices for developing and testing workloads

The key to validating workloads is to be able to set up a lower environment that mimics your production deployment as closely as possible. For your testing phase you will need to have some production benchmarks for all key workloads as a baseline. In your lower environment, you typically want to start with a pruned data set so you can validate your workloads are functioning correctly before you start your comparison. Keep in mind that the workload comparison may require a more complete data set from your source, so make sure you factor that in with your test results. The goal here is to identify items such as version compatibility or performance bottlenecks that will need to be addressed. 

In addition to testing workloads, you need to monitor system performance by defining a monitoring plan and testing failover scenarios. A robust monitoring plan should encompass baseline metrics, thresholds, and actions. This should include common metrics such as CPU, memory, disk space, client connections, network throughput, queue depth and IOPS for CRUD operations. For each metric, you want to determine what actions should be taken when these metrics fall outside of the acceptable range. This acceptable range will be based on your specific application tolerances.
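As a small example of turning a metric, threshold, and action into something concrete, the boto3 sketch below creates a CloudWatch alarm on cluster CPU that notifies an SNS topic. The cluster identifier, threshold, and topic ARN are placeholders chosen for illustration; tune them to your own baselines.

```python
# Sketch: alarm on sustained high CPU for an Aurora cluster and notify an SNS topic.
# Cluster identifier, threshold, and SNS topic ARN are placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="aurora-cpu-high",
    Namespace="AWS/RDS",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "DBClusterIdentifier", "Value": "my-aurora-cluster"}],
    Statistic="Average",
    Period=300,                 # 5-minute periods
    EvaluationPeriods=3,        # sustained for 15 minutes
    Threshold=80.0,             # assumed acceptable-range ceiling; tune to your baseline
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:db-alerts"],
)
```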

A common problem we see when monitoring performance is connection churn, which can slow response times from your database. A solution for this is to enable connection pooling, which Aurora supports through RDS Proxy or any MySQL-compatible pooler. With this enabled, the connection pooler provides a cache of ready-to-use connections for clients to improve performance.
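Setting up RDS Proxy in front of the cluster might look roughly like the boto3 sketch below, assuming the database credentials already live in Secrets Manager and an IAM role allows the proxy to read them; all names, ARNs, and subnet IDs are placeholders.

```python
# Sketch: put RDS Proxy in front of an Aurora MySQL cluster for connection pooling.
# Assumes credentials live in Secrets Manager and the role can read them. Names are placeholders.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_proxy(
    DBProxyName="my-aurora-proxy",
    EngineFamily="MYSQL",
    Auth=[{
        "AuthScheme": "SECRETS",
        "SecretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:aurora-app-user",
        "IAMAuth": "DISABLED",
    }],
    RoleArn="arn:aws:iam::123456789012:role/rds-proxy-secrets-role",
    VpcSubnetIds=["subnet-0aaa1111", "subnet-0bbb2222"],
    RequireTLS=True,
)

# Register the Aurora cluster as the proxy's target; applications then connect
# to the proxy endpoint instead of the cluster endpoint.
rds.register_db_proxy_targets(
    DBProxyName="my-aurora-proxy",
    DBClusterIdentifiers=["my-aurora-cluster"],
)
```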

Aurora also offers a number of stability tests that you can run to simulate failure events such as a disk failure or the crash of a reader or writer DB instance. These fault injection queries help you simulate a variety of failure events for a period of time that you specify, all without the event actually occurring in your Aurora instance. 
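For instance, a fault injection test is run as an ordinary SQL statement against a test cluster. The sketch below (via PyMySQL, against a non-production endpoint) shows the general shape of these statements; verify the exact clauses supported by your engine version in the Aurora MySQL fault injection documentation before relying on them.

```python
# Sketch: run Aurora MySQL fault injection queries against a TEST cluster to rehearse failures.
# Statement clauses vary by engine version; verify against the Aurora documentation first.
import pymysql

conn = pymysql.connect(host="my-test-aurora.cluster-abc123.us-east-1.rds.amazonaws.com",
                       user="admin", password="...")

with conn.cursor() as cur:
    # Simulate a read replica failure for 60 seconds without actually breaking anything.
    cur.execute("ALTER SYSTEM SIMULATE 100 PERCENT READ REPLICA FAILURE FOR INTERVAL 60 SECOND")

    # Simulate a crash of the current DB instance (connections are dropped, no data is lost).
    # cur.execute("ALTER SYSTEM CRASH INSTANCE")
```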

As far as troubleshooting your migration goes, AWS provides a great list of things to watch out for here.

Best practices for monitoring results

When you are creating a cutover plan, you want to include details that will ensure your new environment matches the desired state prior to starting the cutover window. Inspecting object metadata is key here; it ensures your source and target databases align on tables, views, and other objects like triggers and stored procedures. Don’t ignore the finer details such as primary and foreign keys, table constraints, attributes, and indexes. This will help you validate that your environment is ready for the most important phase: migrating the data.

After the data migration has occurred, you need to perform appropriate validation at the data level to confirm you have a successful cutover. Simple record counts are a great place to start, but validating data integrity across key fields or data points, while often overlooked, is crucial for your plan.
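A simple validation pass can be scripted by comparing row counts, and for smaller tables checksums, between the source and the new Aurora cluster. The sketch below assumes PyMySQL connections to both endpoints and a placeholder table list; note that CHECKSUM TABLE scans the whole table, and checksum values can differ across versions if the row format differs, so treat a mismatch as a prompt to investigate rather than proof of data loss.

```python
# Sketch: compare row counts and checksums for a set of tables between source and Aurora.
# Endpoints, credentials, and the table list are placeholders.
import pymysql

def connect(host):
    return pymysql.connect(host=host, user="admin", password="...", database="appdb")

source = connect("source-db.example.com")
target = connect("my-aurora-cluster.cluster-abc123.us-east-1.rds.amazonaws.com")

TABLES = ["customers", "orders", "order_items"]   # placeholder scope

def count_and_checksum(conn, table):
    with conn.cursor() as cur:
        cur.execute(f"SELECT COUNT(*) FROM `{table}`")
        count = cur.fetchone()[0]
        cur.execute(f"CHECKSUM TABLE `{table}`")   # full table scan; use selectively
        checksum = cur.fetchone()[1]
    return count, checksum

for table in TABLES:
    src, tgt = count_and_checksum(source, table), count_and_checksum(target, table)
    status = "OK" if src == tgt else "MISMATCH"
    print(f"{table}: source={src} target={tgt} -> {status}")
```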

When validating the cutover post-migration, monitoring metrics and benchmarks should be used to ensure everything is operating as expected. Database and application baselines are useful here: metrics such as application and query response times, connection counts, and operational metrics around CPU, memory, and disk I/O, together with log monitoring, provide a robust toolkit that can be utilized after the migration has been completed. Don’t forget to set up cost monitoring using AWS CloudWatch and Billing metrics to ensure you don’t get any unexpected surprises over time.

Enable backup and recovery

No solution is complete without a backup and recovery plan, and there are multiple options available for how to achieve this. A basic approach is to utilize the automated backups offered by Aurora; however, many customers will want a more customizable backup strategy and are drawn to solutions such as a manual snapshot process or other services such as AWS Backup.
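For example, a manual cluster snapshot can be taken on demand, or on a schedule you control, with a single API call. The identifiers in the sketch below are placeholders.

```python
# Sketch: take a manual cluster snapshot as part of a custom backup strategy.
# Identifiers are placeholders; manual snapshots are retained until you delete them.
import boto3
from datetime import datetime, timezone

rds = boto3.client("rds", region_name="us-east-1")

snapshot_id = "my-aurora-manual-" + datetime.now(timezone.utc).strftime("%Y%m%d-%H%M")
rds.create_db_cluster_snapshot(
    DBClusterSnapshotIdentifier=snapshot_id,
    DBClusterIdentifier="my-aurora-cluster",
)
print("Started snapshot:", snapshot_id)
```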

On the recovery side, creating a runbook to guide you through the recovery process is highly recommended; it helps ensure your backups are actually achieving their purpose and can be restored successfully. In addition, simulating game days where you execute the recovery strategy is equally important, as it helps ensure that you can comfortably meet the RTO and RPO objectives defined for your database and applications.

How Caylent can help you migrate to Aurora

Now that you have an understanding of what a migration to Aurora MySQL can look like, you can see it’s by no means a simple process, but it’s certainly one that can provide immense opportunities as you continue on your data modernization journey.

If you are looking to modernize your data infrastructure but are unsure where to start, get in touch with our data engineering experts at Caylent to see how we can help you to design and implement a scalable and performant Aurora solution. 

