Amazon IXD – VGT2 Las Vegas: Optimizing Costs and Enhancing Resilience for EKS Using Spot Instances

This article is contributed by Jamie Lee, Sr. EC2 Spot Specialist Solutions Architect.

Running your Kubernetes and containerized workloads on Amazon EC2 Spot Instances can significantly reduce costs. Kubernetes is a widely used open-source container management platform for deploying and managing containerized applications at scale. AWS simplifies Kubernetes operations with Amazon Elastic Kubernetes Service (EKS), a managed service designed for running production-grade workloads. Spot Instances offer discounts of up to 90% compared to On-Demand pricing and are best suited to fault-tolerant applications that can run on a range of instance types. Containers are a natural fit, because containerized applications are often stateless and flexible about the instances they run on.

In this article, I outline best practices for using Spot Instances effectively: diversification, automated interruption handling, and Auto Scaling groups to acquire and maintain capacity. I then adapt these best practices to EKS to improve resilience and reduce costs for containerized workloads.

Overview of Spot Instances

Spot Instances are spare Amazon EC2 capacity, offered at discounts of up to 90% compared to On-Demand rates. Spot capacity is divided into pools, each defined by an instance type in a specific Availability Zone (AZ) of an AWS Region. Spot prices change slowly, following long-term supply and demand trends in each Spot capacity pool.

When AWS requires the capacity back, Spot Instances receive interruption notifications, which can be found in both EC2 instance metadata and EventBridge. After a two-minute warning, the instance is reclaimed. You can set up your infrastructure to automate responses during this interval, such as draining containers, managing ELB connections, or performing post-processing tasks.
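As a minimal sketch of handling that two-minute window, the snippet below parses an interruption notice and computes how much time is left. It assumes the notice has the JSON shape that the instance metadata service returns at `http://169.254.169.254/latest/meta-data/spot/instance-action` (the path returns 404 until an interruption is scheduled). The function name and the sample payload are illustrative, and the metadata fetch itself is left out so the example runs anywhere.

```python
import json
from datetime import datetime, timezone


def parse_interruption_notice(instance_action_json, now):
    """Return (action, seconds_remaining) for a Spot interruption notice.

    `instance_action_json` is the body served at the metadata path
    spot/instance-action once an interruption has been scheduled.
    """
    notice = json.loads(instance_action_json)
    # "time" is a UTC timestamp such as 2017-09-18T08:22:00Z.
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return notice["action"], (deadline - now).total_seconds()


# Illustrative payload and clock; a real handler would poll the metadata
# endpoint (or subscribe to the EventBridge event) instead.
sample = '{"action": "terminate", "time": "2017-09-18T08:22:00Z"}'
now = datetime(2017, 9, 18, 8, 20, 30, tzinfo=timezone.utc)
action, remaining = parse_interruption_notice(sample, now)
print(action, remaining)  # terminate 90.0
```

A handler would use the remaining seconds as its budget for draining containers and deregistering from load balancers.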

Flexibility in instance types is central to Spot Instance best practices, because it lets you provision from many Spot capacity pools. Depending on the Spot allocation strategy you choose, drawing on more pools can reduce interruptions and speed up capacity provisioning. For instance, if your application is deployed across two AZs using only one instance type, you are limited to two Spot capacity pools (2 × 1). Diversifying across six AZs and four instance types gives you 24 Spot capacity pools (6 × 4), which improves your application's stability and resilience.
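The pool arithmetic can be sketched in a few lines: every pairing of an AZ with an instance type is one Spot capacity pool. The AZ labels and instance types below are placeholders, and the broad case assumes four instance types, which is what the 24-pool figure implies for six AZs.

```python
from itertools import product


def spot_capacity_pools(availability_zones, instance_types):
    """Enumerate every (AZ, instance type) pair; each pair is one pool."""
    return list(product(availability_zones, instance_types))


# Two AZs, one instance type: 2 x 1 pools.
narrow = spot_capacity_pools(["az-1", "az-2"], ["c5.large"])

# Six AZs, four instance types: 6 x 4 pools.
broad = spot_capacity_pools(
    [f"az-{i}" for i in range(1, 7)],
    ["m5.large", "m5a.large", "m4.large", "c5.large"],
)

print(len(narrow), len(broad))  # 2 24
```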

Auto Scaling groups facilitate application deployment across various instance types, automatically replacing instances that become unhealthy or are terminated due to Spot interruptions. To minimize interruption risks, employ the capacity-optimized Spot allocation strategy, which automatically launches Spot Instances into the most available pools based on real-time capacity data.
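To make this concrete, here is a hedged sketch of the request parameters for an Auto Scaling group that mixes instance types and uses the capacity-optimized strategy. The dictionary keys follow the EC2 Auto Scaling `CreateAutoScalingGroup` API; the group name, launch template name, subnet IDs, and instance types are hypothetical placeholders. The boto3 call is shown only as a comment so the example runs without AWS credentials.

```python
def mixed_instances_request(name, launch_template, instance_types, subnet_ids, max_size):
    """Build CreateAutoScalingGroup parameters for an all-Spot, diversified group."""
    return {
        "AutoScalingGroupName": name,
        "MinSize": 0,
        "MaxSize": max_size,
        # Subnets in different AZs, passed as one comma-separated string.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        "MixedInstancesPolicy": {
            "LaunchTemplate": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateName": launch_template,
                    "Version": "$Latest",
                },
                # One override per instance type the group may launch.
                "Overrides": [{"InstanceType": t} for t in instance_types],
            },
            "InstancesDistribution": {
                # 0 base and 0% above base means every instance is Spot.
                "OnDemandBaseCapacity": 0,
                "OnDemandPercentageAboveBaseCapacity": 0,
                "SpotAllocationStrategy": "capacity-optimized",
            },
        },
    }


request = mixed_instances_request(
    name="eks-spot-4vcpu-16gb",            # hypothetical name
    launch_template="eks-worker-template",  # hypothetical launch template
    instance_types=["m5.xlarge", "m5a.xlarge", "m4.xlarge"],
    subnet_ids=["subnet-aaa", "subnet-bbb", "subnet-ccc"],  # placeholders
    max_size=10,
)

# With credentials configured, the request could be submitted like this:
#   import boto3
#   boto3.client("autoscaling").create_auto_scaling_group(**request)
print(request["MixedInstancesPolicy"]["InstancesDistribution"]["SpotAllocationStrategy"])
```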

Now that we’ve discussed the best practices for Spot Instances, let’s explore how to adapt them for EKS.

Proposed Solution Architecture

The goals of this architecture include:

  • Automatically scaling the Kubernetes worker nodes to align with application requirements
  • Utilizing Spot Instances for cost-effective workload management on Kubernetes
  • Adapting Spot Instance best practices like diversification for EKS and Cluster Autoscaler

These objectives are achieved through the following components:

Component | Role | Deployment Method
Cluster Autoscaler | Automatically scales EC2 instances based on active pods | Open Source
EC2 Auto Scaling group | Provisions and sustains EC2 instance capacity | AWS
AWS Node Termination Handler | Automatically drains nodes upon EC2 Spot interruptions | Open Source

The architecture deploys EKS worker nodes across three AZs using three Auto Scaling groups: two for Spot Instances and one for On-Demand Instances. The Kubernetes Cluster Autoscaler runs on the On-Demand worker nodes, and the AWS Node Termination Handler runs on all worker nodes.

Kubernetes interacts with the Auto Scaling groups through the Cluster Autoscaler, which can change an Auto Scaling group's Desired Capacity and terminate instances directly. The Auto Scaling groups, in turn, find the capacity to launch and automatically replace instances that are unhealthy or terminated due to Spot interruptions.

The Cluster Autoscaler can be deployed as a single-pod Deployment that runs in the On-Demand Auto Scaling group. Each node group maps to one Auto Scaling group, and all instances within a node group must have the same vCPU count and RAM. To maximize diversification while respecting that constraint, use multiple node groups, each configured as a mixed-instance Auto Scaling group with the capacity-optimized Spot allocation strategy.
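One way to plan node groups under the same-vCPU-and-RAM constraint is to bucket candidate instance types by their shape, so that each bucket becomes one mixed-instance Auto Scaling group. The sketch below assumes a small hand-written catalog of general-purpose instance types; the catalog and function name are illustrative and not part of any AWS tool.

```python
from collections import defaultdict

# Illustrative catalog: instance type -> (vCPUs, memory in GiB).
CATALOG = {
    "m5.xlarge": (4, 16),
    "m5a.xlarge": (4, 16),
    "m4.xlarge": (4, 16),
    "m5.2xlarge": (8, 32),
    "m5a.2xlarge": (8, 32),
    "m4.2xlarge": (8, 32),
}


def node_groups_by_shape(catalog):
    """Group instance types so each group shares one (vCPU, memory) shape."""
    groups = defaultdict(list)
    for instance_type, shape in sorted(catalog.items()):
        groups[shape].append(instance_type)
    return dict(groups)


node_groups = node_groups_by_shape(CATALOG)
for (vcpus, mem_gib), types in sorted(node_groups.items()):
    print(f"{vcpus} vCPU / {mem_gib} GiB: {types}")
```

Here the six instance types yield two node groups of three types each, so the Cluster Autoscaler sees a uniform node size per group while each group still draws on several Spot capacity pools per AZ.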

Autoscaling in Kubernetes Clusters

Kubernetes clusters can be scaled in two predominant ways:

  1. Horizontal Pod Autoscaler (HPA): This scales the pods within a deployment or replica set according to the application’s demands, relying on observed CPU utilization or custom metrics.
  2. Cluster Autoscaler (CA): This standalone tool adjusts the size of a Kubernetes cluster based on current needs. It increases cluster size when pods cannot be scheduled due to insufficient resources and attempts to remove underutilized nodes when possible.
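For intuition on how the HPA arrives at a replica count, the Kubernetes documentation describes the core calculation as the current replica count scaled by the ratio of the observed metric to its target, rounded up. The sketch below implements only that ratio; it deliberately ignores the tolerance band, min/max replica bounds, and stabilization behavior that the real controller applies.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric):
    """Core HPA ratio: ceil(current_replicas * current_metric / target_metric)."""
    return math.ceil(current_replicas * current_metric / target_metric)


# 4 pods averaging 90% CPU against a 60% target scale out to 6 pods.
print(desired_replicas(4, 90, 60))  # 6

# 4 pods averaging 30% CPU against the same target scale in to 2 pods.
print(desired_replicas(4, 30, 60))  # 2
```

If the extra pods do not fit on the existing nodes they stay pending, which is the signal the Cluster Autoscaler acts on.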

When a pod cannot be scheduled, the Cluster Autoscaler determines the need for scaling out the cluster. When multiple node groups are in play, it selects one based on the Expander configuration. Various strategies are supported, including random, most-pods, least-waste, and priority.

This architecture uses the random strategy, which is the Cluster Autoscaler's default Expander. Choosing a node group at random on each scale-out spreads new capacity across the node groups, which maximizes your ability to draw on multiple Spot capacity pools.
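The effect of a random Expander can be illustrated with a small simulation. This is a simplified model, not the Cluster Autoscaler's actual code: it assumes every node group is able to fit the pending pod and picks one uniformly at random. The node group names are hypothetical.

```python
import random
from collections import Counter


def random_expander(node_groups, rng):
    """Pick one eligible node group uniformly at random."""
    return rng.choice(node_groups)


# Two hypothetical Spot node groups, one per instance shape.
groups = ["spot-4vcpu-16gb", "spot-8vcpu-32gb"]

# Seeded generator so repeated runs give the same sequence of picks.
rng = random.Random(0)
picks = Counter(random_expander(groups, rng) for _ in range(1000))

for group in groups:
    print(group, picks[group])
```

Over many scale-out events the picks land on both groups in roughly equal shares, so no single Auto Scaling group (and no single set of Spot capacity pools) carries all of the cluster's capacity.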


Located at: Amazon IXD – VGT2, 6401 E Howdy Wells Ave, Las Vegas, NV 89115.

