New Feature for AWS CloudFormation – Effortlessly Retry Stack Operations from the Point of Failure

One of the primary benefits of cloud computing is the availability of programmable infrastructure. This capability allows you to manage your infrastructure as code, applying the same development practices used for application code to the provisioning of resources.

AWS CloudFormation provides a seamless method to model a collection of related AWS and third-party resources, enabling quick and consistent provisioning and management throughout their lifecycles. A CloudFormation template outlines your desired resources and their dependencies, allowing you to launch and configure them as a unified stack. This approach lets you create, update, and delete an entire stack as a single entity rather than managing resources separately.

However, when creating or updating a stack, failures can occur due to a variety of reasons, including errors in the template or parameter issues, as well as external factors like AWS Identity and Access Management (IAM) permission errors. In such cases, CloudFormation automatically rolls back the stack to its last stable state. For stack creation, this means removing all resources created up to the error point; for updates, it involves reverting to the previous configuration.

While this rollback feature is beneficial for production environments, it can complicate error analysis. The complexity of your template and the number of resources can lead to significant wait times during rollbacks, delaying your ability to modify the template and retry the operation.

Today, I am excited to announce that CloudFormation now allows you to disable automatic rollback, retaining the resources that were successfully created or updated prior to the failure. This enhancement enables you to quickly iterate to address errors, significantly reducing the time needed to test a CloudFormation template in a development environment. You can leverage this new functionality when creating a stack, updating a stack, or executing a change set. Let’s explore how this works in practice.

Rapidly Address Issues with a CloudFormation Stack

For one of my applications, I need to establish an Amazon Simple Storage Service (Amazon S3) bucket, an Amazon Simple Queue Service (Amazon SQS) queue, and an Amazon DynamoDB table that streams item-level changes to an Amazon Kinesis data stream. I’ve drafted the initial version of the CloudFormation template.

AWSTemplateFormatVersion: "2010-09-09"
Description: A sample template to fix & remediate
Parameters:
  ShardCountParameter:
    Type: Number
    Description: The number of shards for the Kinesis stream
Resources:
  MyBucket:
    Type: AWS::S3::Bucket
  MyQueue:
    Type: AWS::SQS::Queue
  MyStream:
    Type: AWS::Kinesis::Stream
    Properties:
      ShardCount: !Ref ShardCountParameter
  MyTable:
    Type: AWS::DynamoDB::Table
    Properties:
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: "ArtistId"
          AttributeType: "S"
        - AttributeName: "Concert"
          AttributeType: "S"
        - AttributeName: "TicketSales"
          AttributeType: "S"
      KeySchema:
        - AttributeName: "ArtistId"
          KeyType: "HASH"
        - AttributeName: "Concert"
          KeyType: "RANGE"
      KinesisStreamSpecification:
        StreamArn: !GetAtt MyStream.Arn
Outputs:
  BucketName:
    Value: !Ref MyBucket
    Description: The name of my S3 bucket
  QueueName:
    Value: !GetAtt MyQueue.QueueName
    Description: The name of my SQS queue
  StreamName:
    Value: !Ref MyStream
    Description: The name of my Kinesis stream
  TableName:
    Value: !Ref MyTable
    Description: The name of my DynamoDB table

I am ready to create a stack from this template. On the CloudFormation console, I select Create stack, upload the template file, and click Next.

Next, I provide a name for the stack and fill in the parameters. My template has one parameter, ShardCountParameter, which configures the number of shards for the Kinesis stream. Although I know the number of shards should be at least one, I mistakenly enter zero and proceed by clicking Next.

To create, modify, or delete resources in the stack, I use an IAM role, establishing a clear boundary for the permissions that CloudFormation can utilize for stack operations. I can also leverage the same role to automate stack deployment in a standardized manner.

In the Permissions section, I select the IAM role designated for stack operations.

Now, it’s time to use the new feature! In the Stack failure options, I choose Preserve successfully provisioned resources so that the resources created before an error remain intact. Failed resources are still rolled back to their last known stable state.
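
The same choice can also be made outside the console. As a minimal sketch, assuming the boto3 SDK for Python (whose CloudFormation create_stack call accepts a DisableRollback flag), the request could be assembled as below. The stack name, role ARN, and template placeholder are hypothetical values, not ones used in this walkthrough.

```python
def build_create_stack_request(stack_name, template_body, shard_count, role_arn):
    """Build keyword arguments for boto3's CloudFormation create_stack call."""
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "Parameters": [
            {
                "ParameterKey": "ShardCountParameter",
                # Parameter values are always passed as strings.
                "ParameterValue": str(shard_count),
            },
        ],
        # IAM role that CloudFormation assumes for stack operations.
        "RoleARN": role_arn,
        # Keep successfully provisioned resources if the operation fails.
        "DisableRollback": True,
    }


# Hypothetical values for illustration only.
request = build_create_stack_request(
    stack_name="my-sample-stack",
    template_body="<contents of the template file>",
    shard_count=0,  # the mistaken value entered in the walkthrough
    role_arn="arn:aws:iam::123456789012:role/my-cloudformation-role",
)

# To submit the request (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudformation").create_stack(**request)
```

Building the request as a plain dictionary keeps the sketch runnable without credentials; the commented lines show where the actual call would go.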

I leave all remaining options at their defaults and click Next. After reviewing my configurations, I select Create stack.

The stack creation process begins but quickly fails due to an error. In the Events tab, I check the timeline of events, which is listed in reverse chronological order: the start of the stack creation is at the bottom, and the most recent event is at the top. The failure occurred because properties validation failed for the stream resource: the number of shards was below the minimum required. Consequently, the stack status is now CREATE_FAILED.

Since I opted to preserve the provisioned resources, all successfully created resources remain intact. In the Resources tab, the S3 bucket and SQS queue show a CREATE_COMPLETE status, while the Kinesis stream’s status is CREATE_FAILED. The creation of the DynamoDB table has not started yet because it depends on the Kinesis data stream for its configuration.
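
As a rough illustration of what the Resources tab shows, the sketch below groups resources by status and lists the ones that have not started yet. It assumes input shaped like the StackResources list from boto3's describe_stack_resources, and it assumes that a resource whose creation has not started is simply absent from that list; the sample data is hypothetical and mirrors the state described above.

```python
def summarize_resources(template_ids, stack_resources):
    """Group logical IDs by status and list template resources not yet started."""
    by_status = {}
    for resource in stack_resources:
        by_status.setdefault(resource["ResourceStatus"], []).append(
            resource["LogicalResourceId"]
        )
    seen = {resource["LogicalResourceId"] for resource in stack_resources}
    not_started = sorted(set(template_ids) - seen)
    return by_status, not_started


# Logical IDs declared in the template.
template_ids = ["MyBucket", "MyQueue", "MyStream", "MyTable"]

# Hypothetical sample mirroring the stack state after the failed creation.
sample_resources = [
    {"LogicalResourceId": "MyBucket", "ResourceType": "AWS::S3::Bucket",
     "ResourceStatus": "CREATE_COMPLETE"},
    {"LogicalResourceId": "MyQueue", "ResourceType": "AWS::SQS::Queue",
     "ResourceStatus": "CREATE_COMPLETE"},
    {"LogicalResourceId": "MyStream", "ResourceType": "AWS::Kinesis::Stream",
     "ResourceStatus": "CREATE_FAILED"},
]

by_status, not_started = summarize_resources(template_ids, sample_resources)
print(by_status)
print(not_started)
```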

The rollback is paused, granting me several options to proceed:

Retry – This allows me to retry the stack operation without modifications. It’s useful if a resource failed due to an issue external to the template.
Update – This option enables me to modify the template or parameters before retrying the stack creation. The stack update resumes from where the last operation was interrupted.
Rollback – This reverts to the last known stable state, similar to the default CloudFormation behavior.
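
The three options can be sketched as a choice between two API operations. This is an illustration under stated assumptions, not a description of what the console does internally: it assumes that Retry and Update both correspond to boto3's update_stack with rollback disabled (Retry reusing the previous template and parameter values) and that Rollback corresponds to rollback_stack. The function name, stack name, and role ARN are hypothetical.

```python
def plan_next_step(option, stack_name, role_arn, parameter_keys, overrides=None):
    """Map a console option to a (boto3 operation name, kwargs) pair."""
    if option == "rollback":
        return "rollback_stack", {"StackName": stack_name, "RoleARN": role_arn}
    if option not in ("retry", "update"):
        raise ValueError(f"unknown option: {option}")

    # Only an update changes parameter values; a retry keeps them all.
    overrides = dict(overrides or {}) if option == "update" else {}
    parameters = []
    for key in parameter_keys:
        if key in overrides:
            parameters.append(
                {"ParameterKey": key, "ParameterValue": str(overrides[key])}
            )
        else:
            parameters.append({"ParameterKey": key, "UsePreviousValue": True})

    return "update_stack", {
        "StackName": stack_name,
        "UsePreviousTemplate": True,  # this sketch does not change the template
        "Parameters": parameters,
        "RoleARN": role_arn,
        "DisableRollback": True,  # keep preserving resources on failure
    }


# Hypothetical values for illustration only.
ROLE = "arn:aws:iam::123456789012:role/my-cloudformation-role"
operation, kwargs = plan_next_step(
    "retry", "my-sample-stack", ROLE, ["ShardCountParameter"]
)
print(operation, kwargs["Parameters"])

# To run the chosen operation (requires boto3 and AWS credentials):
#   import boto3
#   getattr(boto3.client("cloudformation"), operation)(**kwargs)
```

Passing an overrides dictionary with the "update" option replaces individual parameter values while leaving the others at their previous values.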

Recognizing my mistake with the shard count, I select Update. I do not need to modify the template; I simply correct the parameter for the number of shards to one.

I maintain all other settings and select Next. In the Change set preview, I see that the update will attempt to modify the Kinesis stream (currently in CREATE_FAILED status) and add the DynamoDB table. After reviewing the configurations, I select Update stack.
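
A change set preview like this one can also be read programmatically. The sketch below assumes data shaped like the Changes list returned by boto3's describe_change_set, and it builds the arguments for execute_change_set with rollback disabled. The sample changes mirror the preview described above; the stack and change set names are hypothetical.

```python
def summarize_changes(changes):
    """Return (action, logical ID) pairs from a describe_change_set Changes list."""
    return [
        (change["ResourceChange"]["Action"],
         change["ResourceChange"]["LogicalResourceId"])
        for change in changes
        if change.get("Type") == "Resource"
    ]


def build_execute_request(stack_name, change_set_name):
    """Build keyword arguments for boto3's execute_change_set call."""
    return {
        "StackName": stack_name,
        "ChangeSetName": change_set_name,
        # Preserve successfully provisioned resources if execution fails.
        "DisableRollback": True,
    }


# Hypothetical sample mirroring the preview in the walkthrough.
sample_changes = [
    {"Type": "Resource",
     "ResourceChange": {"Action": "Modify", "LogicalResourceId": "MyStream",
                        "ResourceType": "AWS::Kinesis::Stream"}},
    {"Type": "Resource",
     "ResourceChange": {"Action": "Add", "LogicalResourceId": "MyTable",
                        "ResourceType": "AWS::DynamoDB::Table"}},
]

summary = summarize_changes(sample_changes)
execute_request = build_execute_request("my-sample-stack", "fix-shard-count")
print(summary)

# To execute the change set (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudformation").execute_change_set(**execute_request)
```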

Once again, the update is in progress. Did I resolve all issues? Not quite. After a while, the update fails again, this time because the IAM role that CloudFormation is using lacks the permissions it needs. Missing permissions are a common cause of stack failures.

In conclusion, the ability to retry stack operations from the point of failure streamlines troubleshooting in AWS CloudFormation: I can fix an error and continue from where the operation stopped, without waiting for a rollback and a full redeployment.
