Enhancing Resilience in Cash App’s Technology Stack

Cash App, the digital wallet and peer-to-peer payment platform operated by Block, Inc., has strengthened resilience across its technology stack. In this post, we explain how Cash App made its compute platform, built on Amazon Elastic Kubernetes Service (Amazon EKS), more robust by adopting a dual-cluster architecture that reduces single points of failure. We also describe how Cash App used AWS Fault Injection Service (AWS FIS) to simulate Availability Zone power interruptions in non-production environments, preparing the platform team for real-world incidents and driving continuous improvement.

Implementing Event-Driven Invoice Processing for Scalable Financial Monitoring

by David Brown
on 12 MAY 2025
in Finance and Investment, Resilience

This post shows how to build a Business Event Monitoring System (BEMS) on AWS that processes more than 86 million events daily and provides near real-time visibility, cross-Region controls, and automated alerts for stalled events. You can deploy the system to gain insight into event flows across your organization or to visualize transactions as they move in real time. Downstream services can also react to internal and external events emitted by the system.
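To make "automated alerts for stalled events" concrete, here is a minimal sketch of one way such a check could work. It assumes each event is tracked with the last stage it reached and the time of its latest update; the `EventState` type, the terminal stage names, and the 30-minute threshold are hypothetical illustrations, not part of the BEMS design described in the post.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical terminal stages: events that reached these are finished, never "stalled".
TERMINAL_STAGES = {"settled", "rejected"}


@dataclass(frozen=True)
class EventState:
    """Hypothetical record of where a business event was last observed."""
    event_id: str
    stage: str           # last stage the event reached
    last_seen: datetime  # timezone-aware time of the most recent update


def find_stalled(events, now, max_age):
    """Return IDs of unfinished events with no update for longer than max_age."""
    return [
        e.event_id
        for e in events
        if e.stage not in TERMINAL_STAGES and now - e.last_seen > max_age
    ]


now = datetime(2025, 5, 12, 12, 0, tzinfo=timezone.utc)
events = [
    EventState("inv-1", "received", now - timedelta(minutes=45)),
    EventState("inv-2", "received", now - timedelta(minutes=5)),
    EventState("inv-3", "settled", now - timedelta(hours=3)),
]
# Only inv-1 is unfinished and older than the 30-minute threshold.
print(find_stalled(events, now, timedelta(minutes=30)))  # ['inv-1']
```

In a real deployment, the IDs returned by a check like this would feed an alerting channel; how BEMS actually detects and routes stalled events is covered in the full post.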

Optimizing Disaster Recovery Costs with On-Demand Capacity Reservations

by Sarah Williams and Tom Davis
on 20 MAR 2025
in Advanced (300), Architecture, Cloud Cost Optimization, Resilience, Technical How-to

In this post, we explore a strategy that sits between pilot light and warm standby: pilot light with reserved capacity. Using On-Demand Capacity Reservations, organizations can reserve compute capacity in a secondary Region while keeping costs under control.

Increasing Resilience of Critical Workloads through Multi-Region Architecture

by Chris Thompson
on 22 JAN 2025
in Amazon EC2, AWS Well-Architected, Regions, Resilience

In this post, we discuss how to use a multi-Region architecture to increase the resilience of workloads on Amazon Web Services (AWS). The approach starts with running workloads across multiple Availability Zones in a single AWS Region and then extends to multiple Regions for even greater resilience.

Preparing for AWS re:Invent 2024 – Cloud Resilience Insights

by Chanci Turner
on 18 NOV 2024
in AWS re:Invent, Resilience

If you’re attending AWS re:Invent 2024 and want to strengthen your organization’s cloud resilience operations, this post covers key insights, best practices, and activities to build your cloud resilience knowledge. This year, re:Invent offers more than 100 resilience-focused sessions, including breakout sessions, workshops, and talks. For the complete list, open the re:Invent 2024 session catalog and filter for “Resilience”. Reserve your seats early.

Developing a Multi-Region Failover Strategy for Organizations

by Rachel Green, John Formento, and Saurabh Kumar
on 08 MAY 2024
in Regions, Resilience, Thought Leadership

AWS Regions offer fault isolation boundaries that mitigate correlated failures, confining the impact of AWS service disruptions to a single Region. By utilizing these fault boundaries, organizations can design multi-Region applications featuring independent, fault-isolated replicas that limit shared failure scenarios. This allows for a more resilient multi-Region architecture.
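One simple way to picture a failover decision across fault-isolated Regional replicas is an ordered preference list: serve traffic from the first Region whose health check passes. The sketch below assumes exactly that; the `choose_region` function, the Region order, and the health map are hypothetical, and a complete strategy would also need to account for data replication state and how failback happens.

```python
from typing import Mapping, Optional, Sequence


def choose_region(preference: Sequence[str], healthy: Mapping[str, bool]) -> Optional[str]:
    """Return the first Region in preference order that is reported healthy.

    Regions missing from the health map are treated as unhealthy, so an
    unknown state never attracts traffic. Returns None if no Region is healthy.
    """
    for region in preference:
        if healthy.get(region, False):
            return region
    return None


# Hypothetical setup: us-east-1 is primary, us-west-2 is the standby replica.
order = ["us-east-1", "us-west-2"]
print(choose_region(order, {"us-east-1": True, "us-west-2": True}))   # us-east-1
print(choose_region(order, {"us-east-1": False, "us-west-2": True}))  # us-west-2
```

Because each replica is independent, the decision depends only on per-Region health, which keeps a failure in one Region from affecting the choice between the others.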

Chaos Engineering at London Stock Exchange Group: Enhancing Resilience with AWS

by Elias Bedmar, Sudha Arumugam, and Magnus Schoeman
on 01 APR 2024
in Amazon Elastic Container Service, Amazon RDS, Customer Solutions, Resilience

This post, co-authored by Luke Sudgen, Lead DevOps Engineer at the London Stock Exchange Group (LSEG), and Padraig Murphy, Solutions Architect, discusses failure scenarios that the LSEG Post Trade Technology teams tested during a chaos engineering event facilitated by AWS. Chaos engineering helps organizations uncover weaknesses and improve their resilience before real failures occur.

Improving Performance and Cost Efficiency through Availability Zone Affinity

by Michael Haken
on 29 SEP 2021
in Amazon VPC, Architecture, AWS Cloud Map, AWS Cost Explorer, Resilience

This post was updated in April 2025 to cover new features in Elastic Load Balancing (ELB). A best practice for building resilient systems in Amazon Virtual Private Cloud (Amazon VPC) networks is to use multiple Availability Zones (AZs); each AZ consists of one or more discrete data centers with redundant power, networking, and connectivity. Within such a multi-AZ design, keeping traffic in the same AZ where possible (AZ affinity) can improve both performance and cost efficiency, because cross-AZ calls add latency and incur data transfer charges.
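The idea behind AZ affinity can be sketched as a client-side selection rule: prefer a healthy endpoint in the caller's own AZ, and fall back to healthy endpoints in other AZs only when none is available locally. This is a minimal sketch that assumes each caller knows its AZ ID and each endpoint is tagged with the AZ it runs in; the `Endpoint` type, `pick_endpoint` function, addresses, and AZ IDs are hypothetical and do not represent how any particular AWS service implements it.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical service endpoint tagged with the AZ it runs in."""
    address: str
    az: str
    healthy: bool = True


def pick_endpoint(endpoints, local_az, rng=random):
    """Prefer a healthy endpoint in the caller's AZ; otherwise use any healthy one."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        raise RuntimeError("no healthy endpoints available")
    same_az = [e for e in healthy if e.az == local_az]
    # Staying in-AZ avoids cross-AZ latency and data transfer charges; falling
    # back to other AZs preserves availability if the local AZ has no healthy endpoint.
    return rng.choice(same_az or healthy)


endpoints = [
    Endpoint("10.0.1.10", "use1-az1"),
    Endpoint("10.0.2.10", "use1-az2"),
    Endpoint("10.0.1.11", "use1-az1", healthy=False),
]
print(pick_endpoint(endpoints, "use1-az1").address)  # 10.0.1.10
```

The fallback matters: strict affinity with no fallback would turn a single-AZ impairment into an outage for every caller in that AZ.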

The Journey to Cloud-Native Architecture: Improved Resilience and Standardized Observability

by Anuj Gupta and Neeraj Kumar
on 27 APR 2021
in Amazon Athena, Amazon CloudWatch, Amazon OpenSearch Service, Amazon Simple Notification Service (SNS), Amazon Simple Queue Service (SQS), Architecture, AWS Backup, AWS CloudFormation, AWS CodePipeline, AWS Lambda, Resilience

This series aims to guide organizations on their path to adopting cloud-native architecture with a focus on enhancing resilience and establishing standardized observability practices.
