Amazon OpenSearch Service is a fully managed solution that enables you to securely deploy and manage OpenSearch clusters at scale within the AWS Cloud. This service allows you to configure your clusters with various node types, including data nodes, dedicated cluster manager nodes, dedicated coordinator nodes, and UltraWarm nodes. By selecting different node configurations for your OpenSearch Service domain, you can effectively manage the overall stability, performance, and resilience of your cluster.
In this blog post, we will explore how to improve the stability of your OpenSearch Service domain by utilizing dedicated cluster manager nodes and the positive impact they have on your cluster’s reliability.
The Advantages of Dedicated Cluster Manager Nodes
Dedicated cluster manager nodes are responsible for the crucial behind-the-scenes operations of running an OpenSearch Service cluster, without actually storing data or processing search requests. Without these dedicated nodes, data nodes would be tasked with both data management and cluster operations, potentially leading to performance and stability issues as data operations (such as indexing and searching) compete for resources with essential cluster management tasks.
The dedicated cluster manager node oversees several vital responsibilities: it monitors the health of all data nodes in the cluster, tracks the number of indexes and shards, and maintains the routing information that maps shards to nodes. It also updates the cluster state whenever changes occur, such as creating an index or adding or removing nodes. Heavy traffic can overload a cluster manager that shares hardware with data workloads, causing it to become unresponsive; the cluster may then fail to process write requests until a new cluster manager is elected, which can itself become overloaded and repeat the cycle. Deploying dedicated cluster manager instances separates these duties from the data nodes, resulting in a significantly more stable cluster.
Determining the Number of Dedicated Cluster Manager Nodes
In OpenSearch Service, a single node is selected as the cluster manager from all eligible nodes through a quorum-based voting system, which ensures consensus before taking on the responsibility of coordinating operations and maintaining the cluster state. Quorum refers to the minimum number of nodes that must agree before the cluster can make key decisions, helping to keep your data consistent and your cluster functioning smoothly.
When dedicated cluster manager nodes are utilized, only those nodes are eligible for election, with the quorum set to half of the nodes (rounded down) plus one. OpenSearch Service prohibits having only one dedicated cluster manager node due to the lack of a backup in case of failure. Using three dedicated cluster manager nodes ensures that even if one fails, the remaining two can still achieve a quorum and keep the cluster operational. We recommend deploying three dedicated cluster manager nodes for production scenarios. The Multi-AZ with standby feature of OpenSearch Service aims to provide four 9s of availability by employing a third AWS Availability Zone as a standby. This setup also requires three dedicated cluster manager nodes. If you opt for Multi-AZ without standby or Single-AZ, we still suggest utilizing three dedicated cluster manager nodes for backup and quorum purposes.
You can choose either three or five dedicated cluster manager nodes. Five nodes allow the cluster to tolerate the loss of two while still maintaining a quorum, but because only one cluster manager is active at any given time, you'll be paying for four idle nodes instead of two.
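The quorum arithmetic above can be sketched in a few lines. This is an illustration of the general rule (half the eligible nodes, rounded down, plus one), not code from OpenSearch itself:

```python
def quorum(manager_nodes: int) -> int:
    """Minimum number of cluster manager-eligible nodes that must agree
    before the cluster can elect a manager or commit a cluster state
    change: half the nodes (rounded down) plus one."""
    return manager_nodes // 2 + 1

def fault_tolerance(manager_nodes: int) -> int:
    """How many dedicated cluster manager nodes can fail while the
    remaining nodes can still reach quorum."""
    return manager_nodes - quorum(manager_nodes)

for n in (1, 3, 5):
    print(f"{n} manager(s): quorum={quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
# 1 manager(s): quorum=1, tolerates 0 failure(s)
# 3 manager(s): quorum=2, tolerates 1 failure(s)
# 5 manager(s): quorum=3, tolerates 2 failure(s)
```

This is also why a single dedicated cluster manager node is prohibited: with one node, quorum is one and the cluster cannot survive any failure.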
Cluster Manager Node Configurations Based on Domain Creation Methods
This section outlines the resources deployed by each domain creation method when setting up an OpenSearch Service domain.
Using the Easy Create option, you can quickly establish a domain with ‘multi-AZ with standby’ for enhanced availability. This method includes three cluster manager nodes distributed across three Availability Zones, as summarized in the following table:
| Domain Creation Method | Output |
|---|---|
| Easy Create | Dedicated cluster manager node: Yes<br>Number of cluster manager nodes: 3<br>Availability Zones: 3<br>Standby: Yes |
The Standard Create option offers templates for both 'Production' and 'Dev/test' workloads. Each template provides a 'Domain with standby' and a 'Domain without standby' deployment choice, summarized in the following table:
| Domain Creation Method | Template | Deployment Option | Output |
|---|---|---|---|
| Standard Create | Production | Domain with standby | Dedicated cluster manager node: Required<br>Number of cluster manager nodes: 3<br>Availability Zones: 3<br>Standby: Yes<br>Instance type choice: Yes |
| Standard Create | Production | Domain without standby | Dedicated cluster manager node: Required<br>Number of cluster manager nodes: 3 or 5<br>Availability Zones: 3<br>Standby: No<br>Instance type choice: Yes |
| Standard Create | Dev/test | Domain with standby | Dedicated cluster manager node: Required<br>Number of cluster manager nodes: 3<br>Availability Zones: 3<br>Standby: Yes<br>Instance type choice: Yes |
| Standard Create | Dev/test | Domain without standby | Dedicated cluster manager node: Not required |
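As an illustration, the 'Production / Domain with standby' row above corresponds roughly to the `ClusterConfig` you would pass to the OpenSearch Service `CreateDomain` API. This is a minimal sketch: the instance types, counts, and domain name are placeholders, not recommendations.

```python
# Sketch of a ClusterConfig for a Standard Create "Production /
# Domain with standby" domain. Instance types, counts, and the
# domain name below are illustrative placeholders.
cluster_config = {
    "InstanceType": "r6g.large.search",         # data nodes (placeholder)
    "InstanceCount": 6,                         # e.g. 2 per Availability Zone
    "DedicatedMasterEnabled": True,             # dedicated cluster manager nodes
    "DedicatedMasterType": "m6g.large.search",  # typically smaller than data nodes
    "DedicatedMasterCount": 3,                  # quorum of 2 survives one failure
    "ZoneAwarenessEnabled": True,
    "ZoneAwarenessConfig": {"AvailabilityZoneCount": 3},
    "MultiAZWithStandbyEnabled": True,          # third AZ acts as standby
}

# With boto3, this configuration would be passed along the lines of:
# import boto3
# client = boto3.client("opensearch")
# client.create_domain(DomainName="my-domain",          # placeholder
#                      EngineVersion="OpenSearch_2.11", # placeholder
#                      ClusterConfig=cluster_config)
```

Note that the `CreateDomain` API still uses the legacy "master" terminology (`DedicatedMasterEnabled`, `DedicatedMasterCount`) for what the console now calls cluster manager nodes.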
Selecting a Dedicated Cluster Manager Instance Type
Dedicated cluster manager instances handle cluster-coordination work such as shard distribution, index management, and tracking cluster state changes; they do not store data or serve search traffic. As a result, a relatively smaller instance type than your data nodes is usually sufficient. For guidance, refer to the Amazon OpenSearch Service documentation on supported instance types for dedicated cluster manager nodes.
As your workload evolves, you may need to adjust the size and type of your cluster manager instances. Regular performance monitoring is essential to ensure sufficient CPU resources and Java virtual machine (JVM) heap for your dedicated cluster managers. We recommend utilizing Amazon CloudWatch alarms to monitor the following metrics and adjust as necessary:
- ManagerCPUUtilization – Maximum is greater than or equal to 50% for 15 minutes, three consecutive times
- ManagerJVMMemoryPressure – Maximum is greater than or equal to 95% for 1 minute, three consecutive times
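The two alarms above can be expressed as parameter sets for the CloudWatch `PutMetricAlarm` API. This is a hedged sketch: the metric names follow this post (verify the exact names your domain publishes in the `AWS/ES` namespace), and the domain name and client ID dimensions are placeholders.

```python
# Sketch of CloudWatch alarm parameters matching the recommended
# thresholds above. Metric names are as listed in this post; the
# DomainName and ClientId dimension values are placeholders.
def manager_alarm(metric_name: str, threshold: float, period_seconds: int) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": f"opensearch-{metric_name}",
        "Namespace": "AWS/ES",               # OpenSearch Service metric namespace
        "MetricName": metric_name,
        "Statistic": "Maximum",
        "Period": period_seconds,
        "EvaluationPeriods": 3,              # three consecutive periods
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "Dimensions": [
            {"Name": "DomainName", "Value": "my-domain"},   # placeholder
            {"Name": "ClientId", "Value": "123456789012"},  # placeholder
        ],
    }

alarms = [
    manager_alarm("ManagerCPUUtilization", 50, 15 * 60),  # >= 50% for 15 min, x3
    manager_alarm("ManagerJVMMemoryPressure", 95, 60),    # >= 95% for 1 min, x3
]

# Each dict could then be passed to boto3's CloudWatch client, e.g.:
# import boto3
# cloudwatch = boto3.client("cloudwatch")
# for alarm in alarms:
#     cloudwatch.put_metric_alarm(**alarm)
```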
Conclusion
Dedicated cluster manager nodes enhance stability and safeguard against split-brain scenarios, in which two partitions of a cluster each elect their own cluster manager and their states diverge. They can use different instance types than data nodes and provide significant benefits when OpenSearch Service supports mission-critical applications in production. However, they are generally unnecessary for development workloads such as proofs of concept, where the cost of maintaining dedicated cluster manager nodes often outweighs the uptime benefit. For additional guidance, see the operational best practices documentation for Amazon OpenSearch Service.