Ensure Your Data’s Availability with Cross-Cluster Replication in Amazon OpenSearch Service

Amazon OpenSearch Service is a fully managed platform that allows you to deploy and manage OpenSearch and legacy Elasticsearch clusters efficiently in the AWS Cloud. This service simplifies interactive log analysis, real-time application monitoring, website search, and more by providing the latest OpenSearch versions along with support for 19 Elasticsearch versions (from 1.5 to 7.10) and visualization features via OpenSearch Dashboards and Kibana (1.5 to 7.10).

On October 5, 2021, OpenSearch Service introduced cross-cluster replication, enabling you to replicate indices with minimal latency from one domain to another across various AWS Regions without requiring additional technologies. This feature ensures sequential consistency, continuously transferring data from the leader index to the follower index. With sequential consistency, both the leader and follower return identical result sets after operations are executed in the same order. Cross-cluster replication aims to minimize delivery delays between indices, with typical delivery times under one minute. You can monitor replication status continuously using APIs. If you have indices that conform to a specific pattern, you can set up automatic follow rules for seamless replication.
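To make the idea of sequential consistency concrete, here is a minimal Python sketch of a follower that pulls operations from a leader and applies them in the leader's order. The `Index` class, the `pull_changes` helper, and the batch size are illustrative assumptions, not the service's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Index:
    """Toy index: a dict of documents plus an ordered operation log."""
    docs: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # list position acts as the sequence number

    def apply(self, op):
        kind, doc_id, body = op
        if kind == "index":
            self.docs[doc_id] = body
        elif kind == "delete":
            self.docs.pop(doc_id, None)
        self.log.append(op)


def pull_changes(leader: Index, follower: Index, batch_size: int = 2) -> int:
    """Copy the next batch of leader operations to the follower, in log order."""
    start = len(follower.log)  # the follower's checkpoint
    batch = leader.log[start:start + batch_size]
    for op in batch:
        follower.apply(op)
    return len(batch)


leader, follower = Index(), Index()
leader.apply(("index", "1", {"title": "first"}))
leader.apply(("index", "1", {"title": "second"}))
leader.apply(("delete", "1", None))
leader.apply(("index", "2", {"title": "third"}))

# Keep pulling until the follower has caught up with the leader.
while pull_changes(leader, follower):
    pass

print(follower.docs == leader.docs)
```

Because the follower replays the same operations in the same order, it ends up with the same documents as the leader, even though it lags behind while batches are still in flight.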

In this article, we will guide you through how to leverage these features to maintain the availability of your data using cross-cluster replication in OpenSearch Service.

Benefits of Cross-Cluster Replication

Cross-cluster replication is advantageous for various use cases, including data proximity, disaster recovery, and multi-cluster configurations.

Data proximity reduces latency and response times by bringing data closer to your users or application servers. For instance, you can replicate data from one Region, such as us-west-2 (leader), to multiple global Regions acting as followers, including eu-west-1, ap-south-1, and ca-central-1. In this setup, the follower can query the leader to synchronize new or updated data. The diagram below illustrates data replication from a production cluster in us-west-2 to several nearby clusters.

In disaster recovery scenarios, you can maintain one or more follower clusters either within the same Region or in different Regions. As long as you have at least one active cluster, you can serve read requests to users. The following diagram demonstrates data replication from a production cluster to two distinct disaster recovery clusters.

Currently, cross-cluster replication supports active/active read and active/passive write configurations, as illustrated in the diagram.

This implementation mitigates read issues if your leader fails, but what about write capabilities? At this time, cross-cluster replication lacks a failover mechanism to promote your follower to leader status. You may need to perform additional steps to promote your follower domain to leader and enable it to accept write requests. This article outlines the process for setting up cross-cluster replication while minimizing downtime by promoting your follower to leader status.

Setting Up Cross-Cluster Replication

To establish cross-cluster replication, follow these steps:

  1. Create two clusters in different Regions, such as leader-east (leader) and follower-west (follower). Cross-cluster replication operates on a pull model, where the follower domain establishes an outbound connection and polls the leader for new or updated documents.
  2. Navigate to the follower domain (follower-west) and initiate a request for an outbound connection, designating the alias as follower-west.
  3. Go to the leader domain and approve the incoming connection from follower-west.
  4. Modify the security settings to include the following access policy, permitting ESCrossClusterGet in the leader domain (leader-east):
{
  "Effect": "Allow",
  "Principal": {
    "AWS": "*"
  },
  "Action": "es:ESCrossClusterGet",
  "Resource": "arn:aws:es:us-east-2:xxx-accountidxx:domain/leader-east"
}
  5. Create a leader index on the leader domain, or skip this step if the index you want to replicate already exists:
PUT catalog
  6. Access OpenSearch Dashboards for the follower-west domain. On the Dev Tools tab, execute the following command (or send it with curl) to start replication. The leader_alias value must match the connection alias you specified when requesting the outbound connection:
PUT _plugins/_replication/catalog-rep/_start
{
  "leader_alias": "follower-west",
  "leader_index": "catalog",
  "use_roles": {
    "leader_cluster_role": "cross_cluster_replication_leader_full_access",
    "follower_cluster_role": "cross_cluster_replication_follower_full_access"
  }
}
  7. Verify the replication status:
GET _plugins/_replication/catalog-rep/_status
  8. Index some documents in the leader index. For example, the following command adds a document whose id field is 1 to the catalog index:
POST catalog/_doc
{
  "id": "1"
}
  9. Move to the follower domain and confirm that the documents are replicated by executing the following search query:
GET catalog-rep/_search

Response:

{
  ...
  "hits": [
    {
      "_index": "catalog-rep",
      "_type": "_doc",
      "_id": "hg3YsYIBcxKtCcyhNyp4",
      "_score": 1.0,
      "_source": {
        "id": "1"
      }
    }
  ]
}
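
If you prefer to script the setup rather than paste requests into Dev Tools, the following Python sketch builds the access policy statement and the start-replication request shown above as plain data. The function names are hypothetical helpers introduced for illustration, and the sketch deliberately stops short of sending anything: how you sign and send the request depends on your domain's authentication setup. The alias follower-west is the one chosen when requesting the outbound connection.

```python
import json


def cross_cluster_get_statement(leader_domain_arn: str) -> dict:
    """Policy statement that allows es:ESCrossClusterGet on the leader domain."""
    return {
        "Effect": "Allow",
        "Principal": {"AWS": "*"},
        "Action": "es:ESCrossClusterGet",
        "Resource": leader_domain_arn,
    }


def start_replication_request(follower_index: str, leader_alias: str, leader_index: str):
    """Return (method, path, body) for the call that starts replication."""
    body = {
        "leader_alias": leader_alias,  # the cross-cluster connection alias
        "leader_index": leader_index,
        "use_roles": {
            "leader_cluster_role": "cross_cluster_replication_leader_full_access",
            "follower_cluster_role": "cross_cluster_replication_follower_full_access",
        },
    }
    return "PUT", f"_plugins/_replication/{follower_index}/_start", body


# Placeholder ARN copied from the policy above; substitute your own account ID.
statement = cross_cluster_get_statement(
    "arn:aws:es:us-east-2:xxx-accountidxx:domain/leader-east"
)
method, path, body = start_replication_request("catalog-rep", "follower-west", "catalog")

print(json.dumps(statement, indent=2))
print(method, path)
print(json.dumps(body, indent=2))
```

Keeping the request construction separate from the transport makes it easy to review the exact payload before it reaches either domain.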

Pausing and Stopping Replication

When replication is active, you can pause or stop it using the following steps:

To pause replication, for instance, during troubleshooting or if the leader is under heavy load, use this API with an empty body:

POST _plugins/_replication/catalog-rep/_pause
{}

If you pause the replication, it must be resumed within 12 hours. If you don’t resume it in time, you will need to stop replication, delete the follower index, and restart the replication process from the leader.

Stopping replication causes the follower index to disconnect from the leader and become a standard index. Use the following code to stop replication:

POST _plugins/_replication/catalog-rep/_stop
{}

Be aware that once you stop replication, you cannot restart it for that index.
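
The pause, resume, and stop rules can be summarized as a small state machine. The Python sketch below is a toy model of a single follower index under the rules described in this section; the class name and the status labels are assumptions chosen for illustration and do not mirror the service's internals.

```python
from datetime import datetime, timedelta

MAX_PAUSE = timedelta(hours=12)  # resume window described above


class ReplicationState:
    """Toy model of one follower index: SYNCING <-> PAUSED, or STOPPED (terminal)."""

    def __init__(self):
        self.status = "SYNCING"
        self.paused_at = None

    def pause(self, now: datetime):
        if self.status != "SYNCING":
            raise RuntimeError(f"cannot pause from {self.status}")
        self.status, self.paused_at = "PAUSED", now

    def resume(self, now: datetime):
        if self.status != "PAUSED":
            raise RuntimeError(f"cannot resume from {self.status}")
        if now - self.paused_at > MAX_PAUSE:
            raise RuntimeError(
                "pause window exceeded: stop replication, delete the "
                "follower index, and start again from the leader"
            )
        self.status, self.paused_at = "SYNCING", None

    def stop(self):
        # After a stop, the follower is a standard index and
        # replication cannot be restarted for it.
        self.status, self.paused_at = "STOPPED", None


t0 = datetime(2024, 1, 1, 0, 0)
state = ReplicationState()
state.pause(t0)
state.resume(t0 + timedelta(hours=11))  # inside the 12-hour window
print(state.status)
```

In this model, a resume attempted 13 hours after the pause raises an error, and any resume after `stop()` is rejected, which matches the two constraints called out above.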

Auto-Follow

You can establish a set of replication rules for a single leader domain, which will automatically replicate indices that match a defined pattern. When an index on the leader domain aligns with one of the patterns (e.g., logstash-*), a corresponding follower index is created on the follower domain. Here is an example of a replication rule for auto-follow:

POST _plugins/_replication/_autofollow
{
  "leader_alias": "follower-west",
  "name": "rule-name",
  "pattern": "logstash-*",
  "use_roles": {
    "leader_cluster_role": "cross_cluster_replication_leader_full_access",
    "follower_cluster_role": "cross_cluster_replication_follower_full_access"
  }
}

To cease the replication of new indices that match the pattern, delete the replication rule:

DELETE _plugins/_replication/_autofollow
{
  "leader_alias": "follower-west",
  "name": "rule-name"
}
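
To preview which leader indices a rule would pick up, you can approximate the pattern match locally. The Python sketch below uses the standard library's `fnmatchcase` as a stand-in for the service's wildcard matching, which is an assumption: it agrees for simple `*` patterns such as logstash-*, but may differ for other special characters. The index names are made up for the example.

```python
from fnmatch import fnmatchcase


def indices_to_follow(leader_indices, pattern):
    """Return the leader index names that a simple wildcard pattern would match."""
    return [name for name in leader_indices if fnmatchcase(name, pattern)]


# Hypothetical index names on the leader domain.
leader_indices = [
    "logstash-2021.10.05",
    "logstash-2021.10.06",
    "catalog",
    "metrics-logstash",
]

print(indices_to_follow(leader_indices, "logstash-*"))
```

Only the two names that begin with logstash- are selected; catalog and metrics-logstash would need their own rule or a manual start request.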

Monitoring Cross-Cluster Replication Metrics

OpenSearch Service provides metrics for monitoring cross-cluster replication, which can be invaluable for maintaining optimal performance and ensuring data availability.
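
One simple signal you can derive yourself is how far the follower is behind the leader. The Python sketch below assumes the replication status response includes a `syncing_details` object with `leader_checkpoint` and `follower_checkpoint` fields, as the open source replication plugin documents; treat the field names as an assumption to verify against your version. The sample values are invented for illustration.

```python
def replication_lag(status: dict):
    """Return how many operations the follower is behind, or None if not syncing."""
    if status.get("status") != "SYNCING":
        return None
    details = status["syncing_details"]
    return details["leader_checkpoint"] - details["follower_checkpoint"]


# Illustrative response body; the checkpoint values are made up.
sample_status = {
    "status": "SYNCING",
    "leader_alias": "follower-west",
    "leader_index": "catalog",
    "follower_index": "catalog-rep",
    "syncing_details": {"leader_checkpoint": 42, "follower_checkpoint": 39},
}

print(replication_lag(sample_status))
```

A lag that keeps growing across successive status checks suggests the follower is not keeping up and is worth investigating before you rely on it for reads.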


