Amazon Onboarding with Learning Manager Chanci Turner

Chanci Turner and her team provide a robust Amazon Redshift framework designed to ensure automatic recovery in the event of a failure. This solution is particularly beneficial for maintaining system integrity during outages, as the architecture is built to withstand single points of failure within a cluster. This post will delve into the automatic recovery capabilities of Amazon Redshift.

When cluster relocation is enabled on an RA3 cluster, Amazon Redshift can automatically relocate the cluster to another Availability Zone when a disruption prevents it from operating optimally. This is especially useful during significant failures affecting cluster resources in a data center. Relocation preserves your data, and applications need no changes because the cluster endpoint stays the same, so they remain operational and available. On average, recovery times are under 15 minutes, although this can vary based on cluster size. Relocation comes at no extra cost, but it is contingent on capacity availability.

Additionally, the cluster relocation feature serves as a valuable asset for developing a demonstrable recovery plan for Availability Zones. It allows users to test their disaster recovery strategies by manually moving the cluster, and in instances of capacity shortages, enables relocation to a zone with more resources. If the relocation fails, the existing cluster remains untouched until the new cluster is successfully established.

Solution Overview

For Amazon Redshift users with critical applications, the relocation feature offers a straightforward architecture to ensure resilience during outages, with no data loss or application modifications required.

[Diagram: architecture before a failover]

[Diagram: architecture after a failover]

In this post, we will guide you through the process of enabling cluster relocation via the AWS Management Console or the AWS Command Line Interface (CLI). We will explore both automatic and manual relocation methods, and demonstrate how to craft a customized relocation solution utilizing supplementary AWS services.

Prerequisites

Before proceeding, ensure you meet the following prerequisites:

An AWS account.
An Amazon Redshift cluster set up within a VPC that has at least two subnets in different Availability Zones.
A cluster subnet group that spans multiple Availability Zones; you can establish this using the provided AWS CloudFormation template.
An RA3 node type; the relocation feature is only available on RA3 clusters.
The Publicly accessible option set to Disabled within the Network and security settings.
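
The prerequisites above can also be checked from the command line. The following is a minimal sketch, not an official procedure: the function names are hypothetical helpers, and mycluster and my-subnet-group in the usage note are placeholder names. It only reads cluster metadata through describe-clusters and describe-cluster-subnet-groups.

```shell
# Hypothetical helper: print the attributes relevant to relocation
# (node type, public accessibility, port, subnet group, current zone).
# $1 = cluster identifier (placeholder, e.g. mycluster)
show_relocation_prereqs() {
  aws redshift describe-clusters \
    --cluster-identifier "$1" \
    --query 'Clusters[0].{NodeType:NodeType,Public:PubliclyAccessible,Port:Endpoint.Port,SubnetGroup:ClusterSubnetGroupName,AZ:AvailabilityZone}' \
    --output table
}

# Hypothetical helper: list the Availability Zones covered by a
# cluster subnet group; relocation needs more than one.
# $1 = cluster subnet group name (placeholder)
list_subnet_group_zones() {
  aws redshift describe-cluster-subnet-groups \
    --cluster-subnet-group-name "$1" \
    --query 'ClusterSubnetGroups[0].Subnets[].SubnetAvailabilityZone.Name' \
    --output text
}
```

After sourcing these definitions, show_relocation_prereqs mycluster followed by list_subnet_group_zones my-subnet-group should show whether the node type is RA3, whether the cluster is publicly accessible, and which zones the subnet group covers.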

Enable Cluster Relocation

To begin, enable cluster relocation through the console or AWS CLI. For more details, refer to the section on Managing cluster relocation in Amazon Redshift. Be mindful of the limitations that apply with this feature.

Enabling Cluster Relocation via the Console

To set up cluster relocation in the console, follow these steps:

Navigate to the Amazon Redshift console and select Clusters.
Edit your chosen cluster.
Under Backup, select Enable for Cluster relocation.

Enabling Cluster Relocation via AWS CLI

The relocation feature requires the cluster to use port 5439. If your cluster uses a different port, change it to 5439 before enabling relocation, using the following command:

aws redshift modify-cluster --cluster-identifier mycluster --port 5439

Next, enable the availability-zone-relocation parameter with this command:

aws redshift modify-cluster --cluster-identifier mycluster --availability-zone-relocation

To disable it, use:

aws redshift modify-cluster --cluster-identifier mycluster --no-availability-zone-relocation
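
To confirm that a change took effect, you can read the setting back. This is a small sketch under the assumption that describe-clusters reports the port under Endpoint.Port and the relocation setting under AvailabilityZoneRelocationStatus; relocation_status is a hypothetical helper name.

```shell
# Hypothetical helper: print the cluster port and the relocation
# status on one line (tab-separated with --output text).
# $1 = cluster identifier (placeholder, e.g. mycluster)
relocation_status() {
  aws redshift describe-clusters \
    --cluster-identifier "$1" \
    --query 'Clusters[0].[Endpoint.Port,AvailabilityZoneRelocationStatus]' \
    --output text
}
```

Running relocation_status mycluster after the modify-cluster call should print the port followed by the current relocation status; the status may show a pending value while the modification is still being applied.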

Automatic Availability Zone Relocation

With the relocation feature enabled, Amazon Redshift can move a cluster to a different Availability Zone without data loss or changes to your applications, which keeps operations running through service interruptions with minimal disruption. The relocated cluster retains the same endpoint, so applications continue to connect without modification. Beyond the initial configuration that enables relocation, no user intervention is required; Amazon Redshift chooses the destination Availability Zone from those in the cluster subnet group.

Manual Availability Zone Relocation

You can also initiate a manual relocation of a cluster to another Availability Zone by following these steps:

In the Amazon Redshift console, navigate to Clusters.
Select the cluster you wish to relocate.
From the Actions menu, choose Relocate. If this option is greyed out, it indicates that the cluster is either not configured for the relocation feature or does not meet the necessary requirements.
In the Relocate cluster section, choose an Availability Zone from the subnet group. If no selection is made, Amazon Redshift will choose one automatically.
Click Relocate.

Once the relocation process begins, Amazon Redshift will display the cluster status as Relocating. Upon completion, the status will change to Available.
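
The same manual relocation can be started from the AWS CLI by passing a target zone to modify-cluster. The sketch below is an illustration rather than a definitive script: relocate_cluster is a hypothetical helper, and us-east-1b in the usage note is only an example zone that must exist in your cluster subnet group. The cluster-available waiter polls until the cluster status is available; if it runs before the status has changed to relocating, it may return early, so the final call prints the zone the cluster actually reports.

```shell
# Hypothetical helper: relocate a cluster and wait for it to come back.
# $1 = cluster identifier, $2 = target Availability Zone
relocate_cluster() {
  # Start the relocation to the requested zone.
  aws redshift modify-cluster \
    --cluster-identifier "$1" \
    --availability-zone "$2" || return 1

  # Block until the cluster reports the available status again.
  aws redshift wait cluster-available \
    --cluster-identifier "$1" || return 1

  # Print the zone the cluster now reports.
  aws redshift describe-clusters \
    --cluster-identifier "$1" \
    --query 'Clusters[0].AvailabilityZone' \
    --output text
}
```

For example, relocate_cluster mycluster us-east-1b requests the move and, once the cluster is available, prints its current Availability Zone.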

Custom Availability Zone Relocation Solution

In this section, we outline how to simulate an automatic cluster failover to another Availability Zone through a reboot. This solution involves setting up an alarm with an Amazon Simple Notification Service (SNS) topic and creating an AWS Lambda function to trigger the relocation.

Creating an Alarm

To establish the alarm, follow these steps:

In the Amazon Redshift console, select Clusters.
Choose your cluster.
On the Cluster performance tab, expand the Alarms section and click Create alarm.
Configure the alarm for the HealthStatus metric, providing a name and description.
In the Alarm actions section, enable Notifications.
For Notify SNS topic, select an existing topic or create a new one to receive notifications if the leader node becomes unhealthy or unavailable.
Click Create alarm.
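
A comparable alarm can be sketched with the AWS CLI. Everything specific here is an assumption for illustration: the create_health_alarm helper, the alarm name suffix, and the choice of three one-minute evaluation periods are not taken from the console workflow above. The sketch relies on the HealthStatus metric in the AWS/Redshift namespace reporting 1 for healthy and 0 for unhealthy, so the alarm fires when the minimum drops below 1.

```shell
# Hypothetical helper: create (or reuse) an SNS topic and attach it to
# a CloudWatch alarm on the cluster's HealthStatus metric.
# $1 = cluster identifier, $2 = SNS topic name (both placeholders)
create_health_alarm() {
  # create-topic returns the existing topic's ARN if the name is already in use.
  topic_arn=$(aws sns create-topic \
    --name "$2" \
    --query 'TopicArn' \
    --output text) || return 1

  # Alarm when HealthStatus stays below 1 for three consecutive minutes
  # (the period and evaluation count are illustrative assumptions).
  aws cloudwatch put-metric-alarm \
    --alarm-name "$1-health-status" \
    --alarm-description "Redshift cluster $1 is unhealthy" \
    --namespace AWS/Redshift \
    --metric-name HealthStatus \
    --dimensions "Name=ClusterIdentifier,Value=$1" \
    --statistic Minimum \
    --period 60 \
    --evaluation-periods 3 \
    --threshold 1 \
    --comparison-operator LessThanThreshold \
    --alarm-actions "$topic_arn"
}
```

Calling create_health_alarm mycluster my-redshift-alerts would create the topic if needed and register the alarm against it; subscribers to the topic still have to be added separately.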
