By: Jamie Roberts
Date: 15 JUL 2022
Categories: Amazon RDS, RDS for MySQL, Technical How-to
When migrating or conducting load testing, bulk data loading is essential for transferring data between source and destination databases. However, importing large data sets into MySQL databases can take anywhere from several minutes to hours, and in some cases even days, depending on the volume of data involved. While creating a dump file is relatively quick, the subsequent loading process can be time-consuming, delaying any further work planned for that data or database.
Several strategies can be employed to import data into Amazon Relational Database Service (Amazon RDS) for MySQL. The optimal method will depend on factors such as the data source, data volume, whether the import is a one-time event or ongoing, and the acceptable downtime for the business. For a comprehensive list of available techniques for importing data into Amazon RDS for MySQL, refer to Importing data into a MySQL DB instance.
This article focuses on enhancing the speed of loading data into an RDS for MySQL instance. The recommendations are divided into three main sections:
- Key considerations for backing up the source database and importing data into an RDS for MySQL instance.
- Database parameter modifications to enhance load performance.
- Infrastructure-level adjustments you can make.
All suggestions in this article apply to Amazon RDS for MySQL versions 5.7 and 8.0. While the primary focus is on importing into a new database instance, many of these recommendations are applicable for batch jobs, bulk writes, or ETL processes. The source may be another database server, the same server, or a set of flat files.
Considerations for Backup and Importing Data
Here are some vital recommendations to consider while backing up data from the source database and loading it into the target database.
Physical Backup vs. Logical Backup
Depending on your use case, you may opt for either a physical or logical backup of your data. Physical backups consist of raw copies of the directories and files that hold database contents, making them suitable for larger datasets and allowing quicker, less intrusive backups. Percona XtraBackup can take physical backups of self-managed MySQL databases; it cannot back up an RDS for MySQL instance, because Amazon RDS does not provide access to the underlying file system.
In contrast, logical backups save information in a format that represents the logical database structure and content. This method is more appropriate for smaller datasets or cases where data values or table structures need to be edited. Tools like mysqldump or MyDumper can be leveraged for logical backups.
If you decide on a physical backup, you can take it with Percona XtraBackup and restore it to a new RDS for MySQL instance via the Restore from Amazon S3 option. However, note that this method cannot restore data into an existing RDS for MySQL instance, and the performance-enhancing parameter changes discussed in this article do not apply when restoring physical backups.
For detailed steps on performing a physical backup and restoring it to an RDS for MySQL instance, check out Restoring a backup into a MySQL DB instance.
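As a rough sketch, assuming the XtraBackup files have already been uploaded to an S3 bucket (the instance identifier, instance class, storage size, engine version, and IAM role ARN below are all placeholders), the restore can be started with the AWS CLI:
# Creates a new RDS for MySQL instance from a Percona XtraBackup stored in Amazon S3.
aws rds restore-db-instance-from-s3 \
    --db-instance-identifier <target_instance_id> \
    --db-instance-class db.r5.large \
    --engine mysql \
    --allocated-storage 500 \
    --master-username <username> \
    --master-user-password <password> \
    --source-engine mysql \
    --source-engine-version 8.0.28 \
    --s3-bucket-name <bucket_name> \
    --s3-ingestion-role-arn arn:aws:iam::<account_id>:role/<role_name>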
If you opt for logical backups, consider the following recommendations:
- Utilize Flat Files: Importing data from flat files can significantly outperform importing SQL dump files. Use the LOAD DATA LOCAL INFILE statement to read rows from a text file into a table at high speed. Flat files can be generated on a per-table basis, facilitating parallel data loading. However, flat files cannot be generated directly from RDS for MySQL databases, because Amazon RDS does not provide access to the underlying file system. Alternatives include:
- AWS Database Migration Service (AWS DMS): Migrate data to Amazon S3 using AWS DMS. When Amazon S3 is the target, both full load and change data capture (CDC) data are written in .csv format by default. For further details, refer to this blog post.
- MySQL Command Line Client: Create flat files using the command line. Note that with --batch, the output is tab-delimited rather than comma-separated, so match your LOAD DATA statement to that format. For example:
mysql -h <endpoint> -u <username> -p --silent --batch --quick --raw -e "select * from <database_name>.<table_name> limit 10" > <filename>.csv
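To produce one file per table, a minimal shell sketch (the endpoint, credentials, and schema name are placeholders; passing the password on the command line is for brevity only) might look like this:
# Export each base table in the schema to its own tab-delimited file.
for table in $(mysql -h <endpoint> -u <username> -p<password> --batch --skip-column-names \
    -e "SELECT table_name FROM information_schema.tables WHERE table_schema = '<database_name>' AND table_type = 'BASE TABLE'"); do
    mysql -h <endpoint> -u <username> -p<password> --silent --batch --quick --raw \
        -e "SELECT * FROM <database_name>.${table}" > "${table}.tsv"
done
The resulting files can then be loaded into the target in parallel, one LOAD DATA LOCAL INFILE session per table.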
- Dump Tables Individually: Taking logical backups on a per-table basis produces smaller files that can be imported independently, and in parallel. For example:
mysqldump -h <endpoint> -u <username> -p <database_name> <tablename> > <dumpfilename.sql>
Then import using:
mysql -h <endpoint> -u <username> -p <database_name> < <dumpfilename.sql>
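As an illustrative sketch (the file names are placeholders), several per-table dump files can be restored concurrently using shell background jobs:
# Start one import per dump file in the background, then wait for all to finish.
for f in <table1>.sql <table2>.sql <table3>.sql; do
    mysql -h <endpoint> -u <username> -p<password> <database_name> < "$f" &
done
wait
Keep the degree of parallelism in line with what the instance class can absorb; too many concurrent imports can saturate CPU or I/O.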
To create separate flat files for each table from self-managed (non-RDS) databases, you may use:
SELECT * INTO OUTFILE '/path/<tablename>.csv' FROM <databasename>.<tablename>;
The import command is then:
LOAD DATA LOCAL INFILE '/path/<tablename>.csv' INTO TABLE <databasename>.<tablename>;
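Note that local data loading must be enabled on both ends: start the client with --local-infile=1 and set the local_infile parameter to 1 in the instance's DB parameter group. A fuller form of the statement (the field and line terminators shown are assumptions; match them to how your files were generated) looks like:
LOAD DATA LOCAL INFILE '/path/<tablename>.csv'
    INTO TABLE <databasename>.<tablename>
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n';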
You may also leverage third-party tools like MyDumper, which offers options such as --rows or --chunk-filesize to split dump files into smaller pieces. For further information, check out the MyDumper usage guide.
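For instance, a hedged sketch of a parallel dump and restore with these tools (the thread count and row chunk size are illustrative, not tuned recommendations):
# Dump with 4 parallel threads, splitting each table into chunks of ~500,000 rows.
mydumper --host <source_endpoint> --user <username> --password <password> \
    --database <database_name> --rows 500000 --threads 4 --outputdir /backup/dump
# Restore into the target instance, also with 4 parallel threads.
myloader --host <target_endpoint> --user <username> --password <password> \
    --directory /backup/dump --threads 4 --overwrite-tables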
- Sort Rows by Primary Key: Use the --order-by-primary option with mysqldump so that each table is dumped in primary key order, which makes the subsequent inserts into InnoDB tables faster:
mysqldump -h <endpoint> -u <username> -p --order-by-primary <database_name> <tablename> > <dumpfilename.sql>
These strategies can substantially improve your bulk data import performance into Amazon RDS for MySQL.