Amazon Onboarding with Learning Manager Chanci Turner

Organizations today increasingly rely on real-time analytics to drive insights into their core business operations, enhance efficiency, and sustain a competitive advantage. Historically, this has involved the use of databases, data warehouses, and intricate extract, transform, and load (ETL) pipelines. AWS Database Migration Service (AWS DMS) facilitates workload migrations between databases and can also be employed for ongoing change data capture (CDC) replication. However, creating and managing ETL pipelines demands considerable resources and expertise, often leading to burdensome monitoring, updating, and troubleshooting tasks that detract from innovation.

The zero-ETL integrations for Amazon Redshift aim to automate data movement into Amazon Redshift, removing the necessity for conventional ETL pipelines. With zero-ETL integrations, organizations can lessen operational overhead, reduce costs, and expedite data-driven initiatives. This shift enables a greater focus on deriving actionable insights rather than grappling with the complexities of data integration.

In this article, we will outline best practices for transitioning your ETL pipeline from AWS DMS to zero-ETL integrations for Amazon Redshift.

Why Switch to Zero-ETL?

Moving to zero-ETL integration with Amazon Redshift for continuous CDC replication presents several key benefits:

  • Cost-Effectiveness: The zero-ETL integration itself incurs no additional charge. There is no separate ETL pipeline to provision and maintain, which would otherwise add cost on both counts.
  • Reduced Latency: In mixed workload environments featuring INSERTs, UPDATEs, DELETEs, and DDLs, zero-ETL integration allows for near real-time performance, resulting in lower latency compared to AWS DMS. You can access transactional data from Amazon Aurora in Amazon Redshift within seconds through zero-ETL integration.
  • Simplified Replication: Zero-ETL integration streamlines the data replication process by removing the need to provision AWS DMS replication instances in your virtual private cloud (VPC). It automates the management of data replication from the data source to the Redshift cluster or Amazon Redshift Serverless.
  • Optimized Operations: Zero-ETL integration automates data movement, minimizing operational overhead. This efficiency enables your organization to allocate resources more effectively and focus on high-value activities. It simplifies your end-to-end architecture.
  • Minimized Impact on Databases: Zero-ETL integration decreases the impact of CDC replication on production systems. Traditional ETL solutions for Amazon Redshift stage data temporarily in Amazon Simple Storage Service (Amazon S3) and then load it with the SQL COPY command, whereas zero-ETL uses storage-level data movement. This reduces the computational load in Amazon Redshift by avoiding the concurrent transactions associated with SQL COPY.

Key Considerations for Migration to Zero-ETL

Before migrating, review the considerations when using zero-ETL integrations with Amazon Redshift. Zero-ETL does not support every configuration and use case, and for the unsupported ones it may be advisable to continue with AWS DMS. For instance, external sources such as Microsoft Azure SQL databases, Google Cloud for MySQL/PostgreSQL, SAP ASE, and MongoDB, as well as proprietary databases such as Oracle and IBM Db2, are not compatible with zero-ETL integration (check the list of compatible source endpoints). Zero-ETL integrations are under active development, with more features expected, so it is worth following the latest announcements. For further details on zero-ETL integrations with Amazon Redshift, see the Zero-ETL integrations documentation.

Solution Overview

The following figure compares the traditional ETL pipeline migration plan with zero-ETL integration for Amazon Redshift.

Your objective is to transition from AWS DMS to zero-ETL integration while minimizing operational challenges and preserving a seamless data integration experience. To ensure complete data migration without replication loss, it’s advisable not to disable your AWS DMS connection until zero-ETL integration is configured and verified for completeness and consistency in Amazon Redshift. For a temporary period, you may configure parallel CDC integrations between Aurora and Redshift. Due to how AWS DMS and zero-ETL integrations operate, as explained further in this article, the same source tables will be replicated by both pipelines into different target tables and databases in Amazon Redshift.

After configuring your zero-ETL integration, perform data quality checks and confirm satisfactory zero-ETL replication performance. Once you’re confident with the query performance and quality checks on target tables, you can update your consumer connections to point to the new zero-ETL database. Finally, after verifying that all data consumer applications function properly, you can disable the AWS DMS replication pipeline.
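
One simple data quality check during the parallel-replication period is to compare row counts for each table in the AWS DMS target against the same table in the zero-ETL target. The following Python sketch illustrates the idea under several assumptions: the database, schema, and table names are hypothetical placeholders, and `run_scalar` is a hypothetical callable that you would back with whatever Redshift client you already use to run a query and return a single number. Identifier quoting here is deliberately simplistic. Because the two pipelines replicate with different lag, counts on actively changing tables may differ briefly, so a mismatch is a prompt to investigate rather than proof of data loss.

```python
from typing import Callable, Dict, Iterable, Tuple


def count_query(database: str, schema: str, table: str) -> str:
    """Build a row-count query using three-part (database.schema.table) naming."""
    return f'SELECT COUNT(*) FROM "{database}"."{schema}"."{table}"'


def compare_row_counts(
    run_scalar: Callable[[str], int],
    dms_target: Tuple[str, str],
    zero_etl_target: Tuple[str, str],
    tables: Iterable[str],
) -> Dict[str, Tuple[int, int]]:
    """Return {table: (dms_count, zero_etl_count)} for tables whose counts differ.

    dms_target and zero_etl_target are (database, schema) pairs; both are
    assumptions about how the two replication targets are laid out.
    """
    mismatches: Dict[str, Tuple[int, int]] = {}
    for table in tables:
        dms_count = run_scalar(count_query(*dms_target, table))
        zero_etl_count = run_scalar(count_query(*zero_etl_target, table))
        if dms_count != zero_etl_count:
            mismatches[table] = (dms_count, zero_etl_count)
    return mismatches


if __name__ == "__main__":
    # Stubbed results standing in for a real Redshift connection (hypothetical data).
    stub_results = {
        count_query("dms_db", "public", "orders"): 100,
        count_query("zeroetl_db", "sales", "orders"): 100,
    }
    diffs = compare_row_counts(
        stub_results.__getitem__,
        ("dms_db", "public"),
        ("zeroetl_db", "sales"),
        ["orders"],
    )
    print("Tables with differing counts:", diffs)
```

Row counts catch only gross discrepancies. Before switching consumers you would likely complement them with checks on column values, such as aggregates over key columns.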

Prerequisites

This article assumes you have AWS DMS configured and performing CDC replication to your Redshift data warehouse. Always adhere to Redshift and Aurora security best practices.

Set Up Zero-ETL Integration Between Aurora and Redshift

This article does not cover the configuration of zero-ETL integrations in detail. For instructions, refer to the Getting Started guide for near-real-time operational analytics with Amazon Aurora zero-ETL integration with Amazon Redshift, Working with Aurora zero-ETL integrations with Amazon Redshift, and Getting started with zero-ETL integrations. This article assumes you’ve already established zero-ETL integrations between an Amazon Aurora MySQL-Compatible Edition database and Amazon Redshift. We utilize both services in serverless mode; however, in the context of zero-ETL integrations, there is no difference for provisioned deployments.

One key distinction between AWS DMS and zero-ETL integrations for Amazon Redshift is the method of data transfer. With AWS DMS, data is loaded through intermediate storage in Amazon S3, arriving in Amazon Redshift via the COPY command. AWS DMS targets in Amazon Redshift utilize normal tables with full Amazon Redshift functionality. In contrast, zero-ETL integrations directly load data into Amazon Redshift at the storage layer, making DELETEs and UPDATEs highly efficient. Another significant difference is that zero-ETL target tables are configured as read-only; you cannot mutate them in place.

The main takeaway is that there are two separate replication target databases in Amazon Redshift: one from AWS DMS and the other from zero-ETL integration.
