Migrate Data from SQL Server to Amazon Redshift Using a Secondary Replica in an Availability Group with AWS DMS

Migrate Data from SQL Server to Amazon Redshift Using a Secondary Replica in an Availability Group with AWS DMSMore Info

In this article, we’ll explore how to leverage a secondary replica within a SQL Server Always On availability group as a source for migrating data into Amazon Redshift utilizing AWS Database Migration Service (AWS DMS). This approach minimizes the load on your primary replica, allowing it to handle more critical workloads.

Overview of the Solution

We will configure a two-node SQL Server 2019 Always On availability group on Amazon EC2, utilizing Amazon Redshift as the destination. AWS DMS will facilitate the migration of both existing data and incremental changes to ensure your target database remains in sync with the source.

To manage the availability group listener endpoint and replicas, we will use Amazon Route 53. For comprehensive instructions, refer to a detailed resource on self-managed SQL Server Always On availability groups available here. It’s important to note that AWS DMS version 3.4.7 introduced the capability to use secondary replicas as a source and supports connecting via VPC endpoints to various AWS services. If your AWS DMS replication instance does not have a public IP, ensure you configure a VPC endpoint for Amazon Redshift.

The following diagram illustrates the architecture and the data migration process through these key steps:

  1. AWS DMS connects to the SQL Server secondary replica to identify the Always On availability group topology.
  2. It then connects to the primary replica to perform necessary setups, such as verifying table publications used for replication and creating them if necessary.
  3. Data is extracted from the secondary replica and stored as CSV files on the AWS DMS instance.
  4. Once the current batch is complete, the CSV files are transferred to an Amazon S3 bucket.
  5. AWS DMS connects to Amazon Redshift to execute the COPY command with the relevant IAM role.
  6. Data is imported into Amazon Redshift from the S3 bucket to the target endpoint.

To implement this solution, follow these high-level steps:

  1. Set up the SQL Server Always On availability group secondary replica as the source.
  2. Configure Amazon Redshift as the target endpoint.
  3. Create a new AWS DMS instance.
  4. Establish the AWS DMS source endpoint.
  5. Establish the AWS DMS target endpoint.
  6. Create a full load and change data capture (CDC) task in AWS DMS.

Prerequisites

Before proceeding, ensure you have the following prerequisites in place:

  • An active AWS account.
  • A deployed SQL Server Always On availability group (either on-premises or on Amazon EC2).
  • Proper connectivity between the AWS DMS replication instance and the SQL Server source.
  • A SQL Server authenticated user (dbasa) with sysadmin privileges.
  • The ability to resolve the availability group listener and the DNS names for primary and secondary replicas.
  • Native SQL Server replication publishing and distribution enabled on the primary replica.
  • Distribution option enabled across all replicas in your availability group.

Please note that you’ll incur costs for the creation and use of new AWS resources, so it’s best to test this setup in a non-production environment first.

Configuring the SQL Server Availability Group Secondary Replica as the Source

The SQL Server Always On availability group feature ensures high availability and enhances read scalability. By using a secondary replica as the source for AWS DMS, you reduce the load on the primary instance, allowing it to focus on more critical read/write operations.

To configure the secondary replica as a source for AWS DMS replication, follow these steps:

  1. Log into your SQL Server instance.
  2. Open SQL Server Management Studio (SSMS) and connect using the availability group listener endpoint to ensure a connection to the primary replica.
  3. Navigate to the Always On High Availability folder, right-click on the availability group, and select Properties.

Next, you need to configure the Readable Secondary property for both replicas to “Read-intent only.” Keep in mind that allowing secondary replicas to serve read-only workloads could require additional licensing, so it’s advisable to consult your licensing team regarding any potential impacts.

For more details on creating read-only routing for your Always On availability group, you can check out this excellent resource. Here’s how to set up the read-only routing:

ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON N'YourPrimaryReplica'
WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));

ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON N'YourPrimaryReplica'
WITH (SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N'TCP://YourPrimaryReplica:1433'));

ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON N'YourSecondaryReplica'
WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));

ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON N'YourSecondaryReplica'
WITH (SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N'TCP://YourSecondaryReplica:1433'));

ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON N'YourPrimaryReplica'
WITH (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST=('YourSecondaryReplica','YourPrimaryReplica')));

ALTER AVAILABILITY GROUP [YourAGName]
MODIFY REPLICA ON N'YourSecondaryReplica'
WITH (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST=('YourPrimaryReplica','YourSecondaryReplica')));
GO

Following the configuration, your readable secondary replica will be prepared to handle read-intent connection requests from AWS DMS through the availability group listener endpoint.

For ongoing replication (CDC), AWS DMS reads from the active portion of the transaction log file for any changes. If the active log has been truncated due to a backup process, AWS DMS will look for log backup files, which must be in native format and accessible. Failure to read from either active or backup logs will result in task failure.

Setting Up Amazon Redshift as the Target

Amazon Redshift serves as a fully managed, petabyte-scale cloud data warehouse. To learn more about optimizing your setup, you can explore insights from this authority on the topic.

For additional context on this topic, check out another blog post at Chanci Turner VGT2.

By following these steps, you can effectively migrate your SQL Server data to Amazon Redshift while maintaining efficiency and scalability.

SEO Metadata


Comments

Leave a Reply

Your email address will not be published. Required fields are marked *