Migrating from MongoDB to Amazon DocumentDB via the Offline Method

Amazon DocumentDB (with MongoDB compatibility) is a robust, scalable, fully managed document database service that caters to MongoDB workloads. The Amazon DocumentDB Migration Guide details three main strategies for transitioning from MongoDB to Amazon DocumentDB: offline, online, and hybrid. The offline migration method is the quickest and most straightforward of these options, although it results in the longest downtime. This method is ideal for proofs of concept, development and testing environments, and production workloads where downtime is not a significant concern. In this initial article of a three-part series on migration, I will demonstrate how to use the offline method to transfer data from a MongoDB replica set hosted on Amazon EC2 to an Amazon DocumentDB cluster.

Overview of Offline Migration

The following diagram outlines the steps involved in migrating from MongoDB to Amazon DocumentDB offline.

The migration process consists of five key steps:

  1. Stop write operations to the source MongoDB deployment.
  2. Use the mongodump tool to export the data and index definitions to an EC2 instance.
  3. (Optional) Use the Amazon DocumentDB Index Tool to restore indexes to the Amazon DocumentDB cluster.
  4. Restore the data to the Amazon DocumentDB cluster with the mongorestore tool.
  5. Update the application’s connection string to point to the Amazon DocumentDB cluster.
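The five steps run strictly in order, and a failure in any one of them should stop the migration before the application is repointed. The Python sketch below models that sequencing with a dry-run mode. It is a hypothetical helper, not part of any AWS tooling: steps 1 and 5 are manual and are only logged, the command lines reuse the placeholder hosts and credentials from this article, the "dump" directory is assumed to be mongodump's default output location, and the mongorestore options are an assumption you should verify against your tools version.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Optional

DUMP_DIR = "dump"  # Assumed: mongodump's default output directory.


@dataclass
class Step:
    """One migration step; argv is None for a manual step (hypothetical model)."""
    name: str
    argv: Optional[List[str]] = None
    optional: bool = False


def plan(include_index_tool: bool = True) -> List[Step]:
    """Return the five steps in order, optionally dropping the index tool step."""
    steps = [
        Step("Stop writes to the source MongoDB deployment"),
        Step("Export data with mongodump", [
            "mongodump", "--host", "rs0/myhost", "--username", "user",
            "--password", "password", "--db", "books",
            "--authenticationDatabase", "admin", "--readPreference", "secondary",
        ]),
        Step("Create indexes with the Amazon DocumentDB Index Tool", [
            "python", "migrationtools/documentdb_index_tool.py",
            "--restore-indexes", "--dir", DUMP_DIR, "--host", "docdb-cluster",
        ], optional=True),
        Step("Restore data with mongorestore", [
            # Assumed invocation: verify these options for your mongorestore version.
            "mongorestore", "--ssl", "--host", "docdb-cluster-endpoint",
            "--sslCAFile", "rds-combined-ca-bundle.pem",
            "--username", "myuser", "--password", "mypassword", DUMP_DIR,
        ]),
        Step("Update the application connection string"),
    ]
    return [s for s in steps if include_index_tool or not s.optional]


def run(steps: List[Step], dry_run: bool = True) -> List[str]:
    """Execute steps in order; check=True stops at the first non-zero exit."""
    log = []
    for step in steps:
        if step.argv is None:
            # Manual steps are only logged; the operator must complete them.
            log.append(f"MANUAL: {step.name}")
        elif dry_run:
            log.append("DRY RUN: " + " ".join(step.argv))
        else:
            subprocess.run(step.argv, check=True)  # raises CalledProcessError
            log.append(f"DONE: {step.name}")
    return log


if __name__ == "__main__":
    for line in run(plan()):
        print(line)
```

Running it without arguments only prints the planned commands; passing dry_run=False executes them in order and stops at the first command that fails.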

Preparing for Migration

To successfully conduct this offline migration, I need the following components:

  • A source MongoDB deployment.
  • An EC2 instance designated for data export and import.
  • A target Amazon DocumentDB cluster.

Prior to commencing the migration to the Amazon DocumentDB cluster, I must halt write operations to the source MongoDB deployment. This precaution ensures that no data is altered during the migration process. The source MongoDB deployment consists of a replica set hosted on Amazon EC2. To minimize any disruption to workloads on this replica set, I will export data from a secondary instance.

Note: If your source deployment runs a MongoDB version earlier than 3.6, upgrade both the deployment and your application drivers to be compatible with MongoDB 3.6 or later before migrating to Amazon DocumentDB.

You can check your source deployment’s version by executing the following command in the MongoDB shell:

rs0:PRIMARY> db.version()
3.6.9
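If you check versions across several deployments, the comparison is easy to script. The following is a minimal Python sketch, assuming you already have the string that db.version() returns; the helper names are illustrative, the 3.6 threshold comes from the note above, and pre-release suffixes such as -rc1 are simply ignored.

```python
import re
from typing import Tuple

MIN_VERSION = (3, 6)  # Minimum source version discussed in this article.


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '3.6.9' into (3, 6, 9); a suffix such as '-rc1' is dropped."""
    match = re.match(r"^(\d+(?:\.\d+)*)", version.strip())
    if not match:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def needs_upgrade(version: str) -> bool:
    # Tuple comparison is lexicographic, so (3, 4) < (3, 6) is True.
    return parse_version(version)[:2] < MIN_VERSION


print(needs_upgrade("3.6.9"))   # False
print(needs_upgrade("3.4.20"))  # True
```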

In the Amazon DocumentDB console, I create a cluster that will serve as the migration target. The duration of the data restoration process is partly contingent on the size of the primary instance in the target cluster. To maximize import throughput, I opt for a single r5.24xlarge instance, the largest supported size in this AWS Region. While smaller instance sizes are viable, they may require additional time for data import. Once the migration is complete, I can adjust the primary instance size as needed and add read replicas for enhanced read scaling and high availability.

The final component is the EC2 instance that will handle the export and import processes. It is crucial to ensure that the Amazon EBS volume of the migration instance is sufficiently large to accommodate the exported data. You can estimate the size of your database in bytes by running the command db.stats() in the mongo shell and reviewing the storageSize value.
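As a rough sizing aid, the sketch below takes the fields returned by db.stats() (for example, copied out as JSON) and suggests a minimum volume size. Two assumptions are built in. First, because mongodump writes uncompressed BSON by default while storageSize reflects the possibly compressed on-disk size, the sketch uses the larger of dataSize and storageSize. Second, it adds a 20% headroom margin, which is an arbitrary illustrative value rather than an AWS recommendation.

```python
def suggested_volume_gib(stats: dict, headroom_pct: int = 20) -> int:
    """Suggest a minimum EBS volume size in GiB from db.stats() output.

    Assumes the stats are in bytes (the default scale). mongodump writes
    uncompressed BSON by default, so dataSize can exceed storageSize on a
    compressed storage engine; the larger of the two is used.
    """
    dump_bytes = max(stats.get("dataSize", 0), stats.get("storageSize", 0))
    padded = int(dump_bytes) * (100 + headroom_pct)
    gib = -(-padded // (100 * 2**30))  # integer ceiling division
    return max(1, gib)


# Made-up example: 50 GiB of data that occupies 20 GiB on disk.
example = {"dataSize": 50 * 2**30, "storageSize": 20 * 2**30}
print(suggested_volume_gib(example))  # 60
```

Integer arithmetic is used for the ceiling so that the result does not depend on floating-point rounding of the headroom factor.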

The migration instance must have the mongo shell, as well as the mongodump and mongorestore tools. To meet the minimum requirements, I need to install the mongodb-org-shell and mongodb-org-tools packages. (Refer to the MongoDB documentation for installation instructions.)

Since Amazon DocumentDB employs Transport Layer Security (TLS) encryption by default, I also need to download the Amazon RDS certificate authority (CA) file to establish a connection using the mongo shell:

[ec2]$ curl -O https://s3.amazonaws.com/rds-downloads/rds-combined-ca-bundle.pem

(Disabling TLS is also an option; for further details, see the section on Encrypting Connections Using TLS in the Amazon DocumentDB Developer Guide.)
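Before connecting, it can be worth confirming that the downloaded bundle is intact, since a failed download can leave an HTML error page in its place. This Python sketch counts the PEM certificate blocks in the file and builds an ssl.SSLContext that trusts only that bundle, which is analogous to what --sslCAFile does for the mongo shell. The file name matches the curl command above; the helper names are hypothetical.

```python
import ssl
from pathlib import Path

PEM_BEGIN = "-----BEGIN CERTIFICATE-----"


def count_certificates(path: str) -> int:
    """Count PEM certificate blocks; 0 usually means a failed download."""
    text = Path(path).read_text(encoding="ascii", errors="replace")
    return text.count(PEM_BEGIN)


def tls_context(ca_file: str) -> ssl.SSLContext:
    """Client context that verifies servers against ca_file only.

    Passing cafile makes create_default_context load that bundle instead of
    the system's default CA certificates.
    """
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.check_hostname = True  # already the default; stated for clarity
    return ctx
```

For example, count_certificates("rds-combined-ca-bundle.pem") should return a non-zero count after the download succeeds, and tls_context() raises ssl.SSLError if the file contains no usable certificates.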

After installing the necessary tools, I will verify connectivity between the migration instance and both the source instance and the target Amazon DocumentDB cluster. This is accomplished by connecting to each and executing a ping command:

To connect to the source replica set instance:

[ec2]$ mongo --host my-secondary-hostname \
--username myuser --password mypassword
…
rs0:SECONDARY> db.runCommand('ping')
{ "ok" : 1 }

To connect to the Amazon DocumentDB cluster:

[ec2]$ mongo --ssl --host docdb-cluster-endpoint \
--sslCAFile rds-combined-ca-bundle.pem --username myuser \
--password mypassword
…
rs0:PRIMARY> db.runCommand('ping')
{ "ok" : 1 }
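Applications and scripts usually connect with a connection string rather than shell flags. The Python sketch below builds URIs roughly equivalent to the two shell commands above, percent-encoding the credentials with urllib.parse.quote_plus as MongoDB URIs require for reserved characters. The helper name is hypothetical. directConnection=true (supported by recent driver versions) mirrors the shell by talking only to the named secondary; retryWrites=false follows the connection strings the Amazon DocumentDB console generates, so treat it as an assumption and confirm it for your engine version. The ping helper needs the third-party pymongo driver, which is imported only when ping is called.

```python
from urllib.parse import quote_plus, urlencode


def mongo_uri(host: str, user: str, password: str, port: int = 27017,
              **options: str) -> str:
    """Build a mongodb:// URI with percent-encoded credentials (hypothetical helper)."""
    creds = f"{quote_plus(user)}:{quote_plus(password)}"
    query = f"/?{urlencode(options)}" if options else "/"
    return f"mongodb://{creds}@{host}:{port}{query}"


# Placeholder hosts and credentials from the shell examples in this article.
source_uri = mongo_uri("my-secondary-hostname", "myuser", "mypassword",
                       directConnection="true")
docdb_uri = mongo_uri("docdb-cluster-endpoint", "myuser", "mypassword",
                      tls="true", tlsCAFile="rds-combined-ca-bundle.pem",
                      retryWrites="false")


def ping(uri: str) -> bool:
    """Send ping; pymongo raises an error if no server is reachable."""
    from pymongo import MongoClient  # lazy import: the URI helpers need no driver

    with MongoClient(uri, serverSelectionTimeoutMS=5000) as client:
        return client.admin.command("ping").get("ok") == 1
```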

If I encounter issues connecting to either the source instance or the Amazon DocumentDB cluster, I will review the security group settings to ensure the EC2 instance has permission to connect to both on the default MongoDB port (27017). For additional troubleshooting guidance, see the Amazon DocumentDB documentation.
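A blocked security group usually shows up as a connection timeout rather than an authentication error, so a plain TCP check from the migration instance can separate network problems from credential problems. The following is a minimal sketch using only the Python standard library; the function name is hypothetical, and 27017 is the default port mentioned above.

```python
import socket


def port_open(host: str, port: int = 27017, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refused connections, and DNS failures
        return False
```

For example, if port_open("docdb-cluster-endpoint") returns False on the migration instance, the problem lies in security groups or routing rather than in the credentials passed to the mongo shell.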

Exporting Data with mongodump

Having established connectivity, I can now export the data and index definitions to the EC2 migration instance with the mongodump tool. Setting the --readPreference option to secondary makes the dump read from a secondary replica set member, which minimizes the impact on the source deployment. To use the --readPreference option, I must specify the host in the form replicaSetName/replicasetMember:

[ec2]$ mongodump --host rs0/myhost --username user \
--password password --db books --authenticationDatabase admin \
--readPreference secondary
2019-03-19T00:16:57.095+0000   writing books.j to
2019-03-19T00:16:57.095+0000   writing books.a to
2019-03-19T00:16:57.424+0000   done dumping books.j (100000 documents)
2019-03-19T00:16:57.445+0000   done dumping books.a (100000 documents)
…
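When the export is scripted, building the argument list explicitly avoids shell-quoting and line-continuation mistakes and makes the replicaSetName/replicasetMember host format hard to get wrong. The following is a minimal Python sketch; the function name is hypothetical, and every option it emits appears in the command above except --out, which sets the output directory (mongodump writes to ./dump when it is omitted).

```python
from typing import List, Optional


def mongodump_argv(replica_set: str, member: str, user: str, password: str,
                   db: str, auth_db: str = "admin",
                   out_dir: Optional[str] = None) -> List[str]:
    """Build a mongodump command that reads from a secondary member."""
    argv = [
        "mongodump",
        # --readPreference requires the replicaSetName/replicasetMember host form.
        "--host", f"{replica_set}/{member}",
        "--username", user, "--password", password,
        "--db", db, "--authenticationDatabase", auth_db,
        "--readPreference", "secondary",
    ]
    if out_dir is not None:
        argv += ["--out", out_dir]
    return argv
```

To run it, pass the list to subprocess.run(mongodump_argv("rs0", "myhost", "user", "password", "books"), check=True), which raises CalledProcessError if mongodump exits with a non-zero status.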

The time required for data export is influenced by factors such as the size of the source dataset, the speed of the network between the migration instance and the source, and the resources available on the migration instance.
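Once the export finishes, a quick inventory of the dump directory helps confirm that every collection was written and shows how much of the migration instance's volume the export consumed. This Python sketch assumes mongodump's default layout of one db/collection.bson file per collection (alongside a .metadata.json file) and uncompressed output, that is, no --gzip; the function name is hypothetical.

```python
from pathlib import Path
from typing import Dict


def dump_sizes(dump_dir: str = "dump") -> Dict[str, int]:
    """Map 'db.collection' to the size in bytes of its .bson file."""
    sizes = {}
    for bson in sorted(Path(dump_dir).glob("*/*.bson")):
        sizes[f"{bson.parent.name}.{bson.stem}"] = bson.stat().st_size
    return sizes


if __name__ == "__main__":
    sizes = dump_sizes()
    for name, size in sizes.items():
        print(f"{name:40} {size / 2**20:10.1f} MiB")
    print(f"total: {sum(sizes.values()) / 2**30:.2f} GiB in {len(sizes)} collections")
```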

Restoring Indexes with the Amazon DocumentDB Index Tool

While not mandatory for offline migration, the Amazon DocumentDB Index Tool enables me to assess the dumped indexes for compatibility and create the indexes on the target Amazon DocumentDB cluster beforehand. This pre-creation of indexes can significantly expedite the overall restoration process because indexes can be populated concurrently with data restoration, rather than sequentially afterward.
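mongodump stores each collection's index definitions in a collection.metadata.json file next to its .bson data, and the index tool is pointed at that dump directory with --dir. The sketch below lists the index names and keys per collection so you can see what will be created on the target before running the tool. It assumes each metadata file contains an "indexes" array whose entries have "name" and "key" fields; depending on the tools version, key values may appear in extended JSON form, so they are printed as-is. It is a hypothetical helper, not part of the Amazon DocumentDB Index Tool.

```python
import json
from pathlib import Path
from typing import Dict, List


def dumped_indexes(dump_dir: str = "dump") -> Dict[str, List[dict]]:
    """Map 'db.collection' to its index definitions from *.metadata.json files."""
    result = {}
    suffix = ".metadata.json"
    for meta in sorted(Path(dump_dir).glob(f"*/*{suffix}")):
        collection = meta.name[: -len(suffix)]
        with meta.open(encoding="utf-8") as fh:
            metadata = json.load(fh)
        result[f"{meta.parent.name}.{collection}"] = metadata.get("indexes", [])
    return result


if __name__ == "__main__":
    for ns, indexes in dumped_indexes().items():
        for index in indexes:
            print(ns, index.get("name"), index.get("key"))
```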

To obtain this tool, clone the Amazon DocumentDB Tools GitHub repository and follow the instructions provided in the README.md file.

After installing the Amazon DocumentDB Index Tool, I can use it to check for any index definition incompatibilities:

[ec2]$ python migrationtools/documentdb_index_tool.py --show-issues --dir <dump_dir>

Now I can proceed to create the indexes in the target Amazon DocumentDB cluster using the index tool:

[ec2]$ python migrationtools/documentdb_index_tool.py --restore-indexes --dir <dump_dir> --host docdb-cluster
