Amazon IXD – VGT2 Las Vegas: A Solution Built on AWS Control Tower


In today’s fast-paced digital landscape, AWS customers are moving to the cloud at increasing scale. To meet this demand, organizations need a robust foundation built on AWS Well-Architected best practices. A well-designed landing zone underpins account vending, access provisioning, security guardrails, and CI/CD pipelines. However, as organizations scale, implicit expectations often build up among business units, product teams, and platform teams.

For instance, development and product teams may expect accounts to be provisioned immediately, while platform teams must balance these requests against an existing backlog that may include other high-priority features tied to the organization’s long-term IT strategy. As a result, provisioning an account can take a week or longer. This tension between speed and feature completeness can lead to delays, prioritization conflicts, and even delivery defects. Ultimately, IT may end up slowing the business down rather than enabling faster time to market, a goal every party shares.

This article explains how Amazon IXD – VGT2, a solution developed jointly by Contino and AWS, combines Lean and Site Reliability Engineering (SRE) principles with AWS Control Tower and Customizations for AWS Control Tower to continuously and automatically measure, track, and communicate the business impact of your landing zone. You will learn how to turn implicit expectations into explicit outcomes, continuously evaluate and improve those outcomes, and support data-driven decision-making while still enabling innovation and transformation.

Solution Overview

Amazon IXD – VGT2 takes a metrics-driven SRE approach to building and operating systems. SRE uses Service Level Indicators (SLIs) to identify the metrics that matter most for service quality, Service Level Objectives (SLOs) to set targets for those SLIs, and Service Level Agreements (SLAs) to define the consequences of meeting or missing the SLOs.

An SRE approach lets teams measure and monitor quality over time and concentrate their effort where it most improves overall performance. It also gives everyone a shared understanding of performance and aligns goals across teams.

The Key Performance Indicator (KPI) definitions in Amazon IXD – VGT2 are based on SLOs, which turn implicit expectations into explicit requirements and are calculated from data gathered from AWS Control Tower and the workloads it manages.

This solution takes a data-centric approach to landing zone health: it helps product teams decide where to focus, lets business leaders track cloud adoption progress, and ensures teams are equipped to deliver value. AWS customers can collect and visualize real-time metrics from AWS Control Tower landing zones at a low total cost of ownership.

Using Amazon IXD – VGT2, organizations can measure KPIs like:

  • Account vending SLO
  • User creation and access SLOs
  • Patching status across the entire estate
  • Compliance framework adoption
  • AMI usage
  • Infrastructure as Code adherence
  • The four State of DevOps Report metrics:
    • Deployment Frequency
    • Lead Time for Changes
    • Change Failure Rate
    • Mean Time to Recovery
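
The four delivery metrics above can be derived from per-deployment records. The sketch below is a minimal illustration, assuming a hypothetical `Deployment` record with commit, deploy, and restore timestamps. The article does not describe how Amazon IXD – VGT2 stores these events, and the sketch simplifies time to recovery to the gap between a failed deployment and its restoration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; not the solution's actual schema.
    commit_time: datetime                     # when the change was committed
    deploy_time: datetime                     # when it reached production
    failed: bool = False                      # did it cause a production failure?
    restored_time: Optional[datetime] = None  # when service was restored, if failed


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_metrics(deployments: list[Deployment], window_days: int) -> dict:
    """Compute the four metrics over a window of `window_days` days."""
    if not deployments:
        raise ValueError("need at least one deployment")
    failures = [d for d in deployments if d.failed]
    recovered = [d for d in failures if d.restored_time is not None]
    return {
        "deployment_frequency_per_day": len(deployments) / window_days,
        # Mean commit-to-production time.
        "lead_time_hours": mean(hours(d.deploy_time - d.commit_time) for d in deployments),
        "change_failure_rate": len(failures) / len(deployments),
        # Simplification: recovery is measured from the failed deployment itself.
        "mttr_hours": (
            mean(hours(d.restored_time - d.deploy_time) for d in recovered)
            if recovered else None
        ),
    }


# Illustrative data only.
t0 = datetime(2024, 1, 1, 9)
deps = [
    Deployment(t0, t0 + timedelta(hours=4)),
    Deployment(t0, t0 + timedelta(hours=8), failed=True,
               restored_time=t0 + timedelta(hours=10)),
]
m = dora_metrics(deps, window_days=30)
```

Means are used here for simplicity; teams that want to damp the effect of outliers may prefer medians.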

Because it is built on AWS Control Tower and follows AWS best practices, Amazon IXD – VGT2 lets customers automate the creation of resources and service control policies (SCPs) and vend accounts through Account Factory, while quantifying the value of their AWS landing zone infrastructure.

Data Collection, Processing, and Visualization

Amazon IXD – VGT2 uses lifecycle events and other cloud-native events generated by AWS Control Tower. It also uses AWS Lambda and Amazon EventBridge to create custom events that record key activities, and AWS customers can choose which events are worth measuring. The solution architecture shows a custom event bus that receives events from AWS accounts across the landing zone and forwards them to a central account where Amazon IXD – VGT2 runs.

This architecture ingests events from workload accounts, the AWS Control Tower management account, and any pre-integrated third-party sources (such as GitHub or ServiceNow). The events are then filtered to extract the information needed to calculate each SLO.
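
The filtering step can be pictured as EventBridge-style pattern matching. The function below is a simplified, hypothetical re-implementation of only the exact-match subset of EventBridge event patterns, written so the logic can be run locally. In practice, EventBridge rules would do this filtering themselves. The `ACCOUNT_VENDED` pattern targets the AWS Control Tower `CreateManagedAccount` lifecycle event; the sample event is trimmed down and its account ID is a placeholder.

```python
from typing import Any


def matches(pattern: dict[str, Any], event: dict[str, Any]) -> bool:
    # Simplified subset of EventBridge event pattern matching (assumption):
    # - a list in the pattern means "the event value must equal one of these"
    # - a nested dict in the pattern is matched against the nested event field
    # Prefix, numeric, anything-but and array-valued event fields are NOT handled.
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Pattern for the Control Tower lifecycle event raised when a managed
# account has been created.
ACCOUNT_VENDED = {
    "source": ["aws.controltower"],
    "detail-type": ["AWS Service Event via CloudTrail"],
    "detail": {"eventName": ["CreateManagedAccount"]},
}

# Trimmed-down example event; real events carry many more fields.
event = {
    "source": "aws.controltower",
    "detail-type": "AWS Service Event via CloudTrail",
    "account": "111122223333",
    "detail": {"eventName": "CreateManagedAccount"},
}

print(matches(ACCOUNT_VENDED, event))  # True
```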

A custom EventBridge event bus in the Amazon IXD – VGT2 management account routes events to a Lambda function, which parses them and stores them in Amazon DynamoDB and Amazon Timestream. DynamoDB provides a cost-effective, low-latency store for cataloging historical events for later analysis.
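
A Lambda function of this kind might look roughly like the sketch below. It assumes the standard EventBridge envelope fields (`id`, `account`, `source`, `detail-type`, `time`, `detail`). The `SINKS` list and the record layout are hypothetical stand-ins for the DynamoDB and Timestream writes, which are left out so the example runs without AWS credentials.

```python
import json
from datetime import datetime
from typing import Any, Callable

# Hypothetical sinks. In a real deployment these would wrap DynamoDB and
# Timestream writes; here they are plain callables so the sketch runs anywhere.
SINKS: list[Callable[[dict[str, Any]], None]] = []


def parse_event(event: dict[str, Any]) -> dict[str, Any]:
    """Flatten the standard EventBridge envelope into one record."""
    return {
        "event_id": event["id"],
        "account": event["account"],
        "source": event["source"],
        "detail_type": event["detail-type"],
        # EventBridge timestamps are ISO 8601 in UTC, e.g. "2024-01-02T09:00:00Z".
        "time": datetime.fromisoformat(event["time"].replace("Z", "+00:00")),
        # Serialised so the detail can be stored as a single attribute.
        "detail": json.dumps(event.get("detail", {}), sort_keys=True),
    }


def handler(event: dict[str, Any], context: Any) -> dict[str, Any]:
    """Lambda entry point: parse the event and hand the record to each sink."""
    record = parse_event(event)
    for sink in SINKS:
        sink(record)
    return {"stored": record["event_id"]}


# Local usage: collect records in a list instead of writing to AWS.
stored: list[dict[str, Any]] = []
SINKS.append(stored.append)
result = handler(
    {
        "id": "abc-123",
        "account": "111122223333",
        "source": "aws.controltower",
        "detail-type": "AWS Service Event via CloudTrail",
        "time": "2024-01-02T09:00:00Z",
        "detail": {"eventName": "CreateManagedAccount"},
    },
    None,
)
```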

Amazon Timestream is a purpose-built, fully managed time series database. It is particularly useful for measuring elapsed time between events and simplifies metric calculations where time is the primary dimension. The visualization layer uses Amazon Managed Grafana, a fully managed service for building dashboards and reports, which can use Amazon Timestream as a data source.

For a deeper understanding of the monitoring capabilities of Amazon IXD – VGT2, you can explore this blog post.

Consider a case study that tracks the account vending SLO. The platform and product teams agree that account requests should be fulfilled within one business day 80% of the time over a 30-day period. The expected outcome is now explicit: an account is provisioned within one business day. The teams also agree on an error budget, the share of requests that may miss the target, which gives the platform team room to deal with defects or outages in account vending.
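
The SLO in this case study can be checked with a few lines of code. This is a minimal sketch under stated assumptions: "one business day" is taken to mean the same clock time on the next weekday (public holidays are ignored), the error budget is treated as the 20% of requests allowed to miss the target, and the three sample requests are invented for illustration.

```python
from datetime import datetime, timedelta


def business_day_deadline(requested: datetime) -> datetime:
    """Same time on the next business day; weekends skipped, holidays ignored."""
    deadline = requested + timedelta(days=1)
    while deadline.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        deadline += timedelta(days=1)
    return deadline


def slo_report(requests: list[tuple[datetime, datetime]], target: float = 0.80) -> dict:
    """requests: (requested_at, provisioned_at) pairs inside the 30-day window."""
    if not requests:
        raise ValueError("no requests in window")
    if not 0 < target < 1:
        raise ValueError("target must be between 0 and 1")
    total = len(requests)
    met = sum(1 for req, done in requests if done <= business_day_deadline(req))
    misses = total - met
    allowed_misses = (1 - target) * total  # the error budget, counted in requests
    return {
        "attainment": met / total,
        "slo_met": met / total >= target,
        # Fraction of the error budget spent; above 1.0 means it is exhausted.
        "budget_consumed": misses / allowed_misses,
    }


# Hypothetical requests (2024-01-01 is a Monday).
window = [
    (datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 15)),  # Mon -> same day: met
    (datetime(2024, 1, 5, 10), datetime(2024, 1, 8, 9)),  # Fri -> Mon morning: met
    (datetime(2024, 1, 2, 9), datetime(2024, 1, 4, 9)),   # Tue -> Thu: missed
]
report = slo_report(window)
```

With two of three requests on time, attainment is about 67%, below the 80% target, and the error budget is overspent.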

In this example, only 66% of accounts were vended within one business day, so the SLO is not being met. Product or platform teams can investigate the root causes of SLO breaches by analyzing the recorded durations of each event from start to finish, and then make changes such as automating steps, eliminating redundant work, or increasing the platform team’s capacity. They can also look for opportunities to run previously sequential activities in parallel.
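
Finding the slowest stage can be sketched as computing the gaps between consecutive timestamped events for a single request. The step names and timestamps below are hypothetical; the article does not specify which intermediate events Amazon IXD – VGT2 records.

```python
from datetime import datetime


def step_durations(events: list[tuple[str, datetime]]) -> dict[str, float]:
    """Hours spent between consecutive events of a single account request."""
    ordered = sorted(events, key=lambda e: e[1])  # order by timestamp
    return {
        f"{a[0]} -> {b[0]}": (b[1] - a[1]).total_seconds() / 3600
        for a, b in zip(ordered, ordered[1:])
    }


# Hypothetical step names for one request; not defined by the article.
request = [
    ("requested", datetime(2024, 1, 2, 9, 0)),
    ("approved", datetime(2024, 1, 3, 8, 0)),
    ("account_created", datetime(2024, 1, 3, 8, 30)),
    ("access_granted", datetime(2024, 1, 4, 9, 0)),
]
durations = step_durations(request)
bottleneck = max(durations, key=durations.get)
```

Running this across every request in the window, rather than a single one, would show whether the same stage is consistently the bottleneck.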

For more on this topic, see this resource.

Location:

Amazon IXD – VGT2
6401 E Howdy Wells Ave, Las Vegas, NV 89115
