Learn About Amazon VGT2 Learning Manager Chanci Turner
As organizations increasingly transition to the cloud, they must establish a robust foundation based on AWS best practices. A well-architected landing zone is essential for efficiently managing account provisioning, access control, security measures, and CI/CD pipelines. However, as companies scale, misaligned expectations often arise between business units seeking rapid deployment, product teams generating feature requests, and platform teams responsible for fulfilling these requests.
For instance, development teams often expect immediate account provisioning, while platform teams must balance these requests against other critical work tied to the organization’s long-term IT vision. Account vending can take more than a week, creating friction between the need for speed and the desire for comprehensive, feature-rich solutions. This tension can result in misprioritization and delivery problems, ultimately hindering IT’s ability to accelerate business operations and time to market.
This post introduces “Flight Controller for Landing Zones,” a collaborative solution crafted by Amazon IXD – VGT2 and AWS, which integrates Lean principles and Site Reliability Engineering (SRE) methodologies with AWS Control Tower and its Customizations. This solution continuously measures, tracks, and communicates the impact of your landing zone, transforming implicit expectations into tangible outcomes. It empowers organizations to make data-driven decisions, fostering innovation and safe transformations.
Solution Overview
Flight Controller applies SRE principles, using metrics to improve system quality. Service Level Indicators (SLIs) identify the metrics that matter, Service Level Objectives (SLOs) set target values for those SLIs, and Service Level Agreements (SLAs) spell out the consequences of meeting or missing the SLOs.
Adopting an SRE framework allows teams to systematically measure quality and track improvements over time, fostering a shared understanding of performance and aligning collective goals.
The Key Performance Indicators (KPIs) defined in Flight Controller are grounded in SLOs, helping translate abstract expectations into explicit requirements. Data from AWS Control Tower and managed workloads fuel this process.
Flight Controller’s data-driven approach to landing zone health enables product teams to make informed decisions about their focus areas, while business leaders can monitor the progress of their cloud adoption efforts, ensuring that teams are positioned to deliver value effectively.
AWS customers can now access real-time metrics for AWS Control Tower landing zones at near-zero total cost of ownership (TCO).
With Flight Controller, AWS customers can evaluate KPIs including:
- Account vending SLO
- User creation and access SLOs
- Patching status across the infrastructure
- Compliance framework adoption
- AMI usage
- Infrastructure as Code adherence
- The four key metrics from the State of DevOps Report:
  - Deployment Frequency
  - Lead Time for Changes
  - Change Failure Rate
  - Mean Time to Recovery
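As a rough illustration of how the four key metrics in the list above could be derived from deployment records, here is a minimal Python sketch. The `Deployment` record, its field names, and the `four_key_metrics` function are hypothetical and not part of Flight Controller. The sketch also assumes that recovery time is measured from the failed deployment to the moment service is restored, which is one of several possible definitions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did the change cause a failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def four_key_metrics(deployments, window_days):
    """Compute the four key metrics over a reporting window of `window_days`."""
    if not deployments or window_days <= 0:
        return None
    total = len(deployments)
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Assumption: recovery is measured from the failed deployment to restoration.
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployment_frequency_per_day": total / window_days,
        "lead_time_for_changes": sum(lead_times, timedelta()) / total,
        "change_failure_rate": len(failures) / total,
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }
```

Returning `None` for an empty window, rather than zeros, keeps "no data" distinguishable from "no failures" on a dashboard.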
Flight Controller operates within the framework of AWS Control Tower, leveraging AWS best practices and Customizations. This integration allows customers to automate resource creation and Service Control Policies (SCPs) while managing account provisioning effectively.
Data Collection, Processing, and Visualization
Flight Controller for Landing Zones harnesses lifecycle and various cloud-native events generated by AWS Control Tower. By combining AWS Lambda and Amazon EventBridge, custom events can be created to log the initiation or completion of significant activities. Customers can choose which events are vital to measure.
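A custom event marking the start or end of an activity might be built and published along the following lines. This is a sketch under assumptions: the bus name, event source, and detail fields are hypothetical, and the only AWS call relied on is the EventBridge `put_events` operation, invoked on a client the caller supplies (for example `boto3.client("events")` inside a Lambda function).

```python
import json
from datetime import datetime, timezone

# Hypothetical names, chosen for illustration only.
EVENT_BUS_NAME = "flight-controller-bus"
EVENT_SOURCE = "custom.flightcontroller"


def build_entry(activity, phase, account_id, when=None):
    """Build one entry in the shape expected by the EventBridge PutEvents API."""
    when = when or datetime.now(timezone.utc)
    return {
        "EventBusName": EVENT_BUS_NAME,
        "Source": EVENT_SOURCE,
        "DetailType": f"{activity}.{phase}",  # e.g. "AccountVending.Requested"
        "Time": when,
        "Detail": json.dumps(
            {"activity": activity, "phase": phase, "accountId": account_id}
        ),
    }


def publish(events_client, entries):
    """Send entries through an EventBridge client and return the failure count."""
    response = events_client.put_events(Entries=entries)
    return response.get("FailedEntryCount", 0)
```

Passing the client in as a parameter keeps the event-building logic testable without AWS credentials.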
The architecture includes a custom event bus that collects events from multiple AWS accounts in the landing zone, reporting them to a central account where the Flight Controller solution resides. This solution ingests events from workload accounts, the AWS Control Tower management account, and third-party sources like GitHub and Zendesk.
Events are filtered to extract relevant data for calculating SLOs. A custom EventBridge event bus in the Flight Controller management account sends events to a Lambda function, which processes and stores them in Amazon DynamoDB and Amazon Timestream.
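The filtering step could look roughly like the sketch below, which keeps only account-creation lifecycle events and reduces them to the fields an account vending SLO needs. The field names are assumed to follow the shape of the AWS Control Tower `CreateManagedAccount` lifecycle event and should be checked against a real event before use. The function names are hypothetical, and the writes to DynamoDB and Timestream are deliberately left out.

```python
def is_relevant(event):
    """Keep only Control Tower account-creation lifecycle events (assumed shape)."""
    return (
        event.get("source") == "aws.controltower"
        and event.get("detail", {}).get("eventName") == "CreateManagedAccount"
    )


def to_record(event):
    """Extract the fields needed to measure account vending time."""
    status = event["detail"]["serviceEventDetails"]["createManagedAccountStatus"]
    return {
        "account_id": status["account"]["accountId"],
        "state": status["state"],
        "requested": status["requestedTimestamp"],
        "completed": status["completedTimestamp"],
    }


def handler(event, context=None):
    """Lambda-style entry point: returns a record, or None for ignored events."""
    if not is_relevant(event):
        return None
    # A real handler would persist the record here; storage is omitted in this sketch.
    return to_record(event)
```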
DynamoDB serves as a low-cost, low-latency database for cataloging historical events, while Amazon Timestream is designed specifically for time series data analysis, enabling the measurement of elapsed times between events. The visualization layer uses Amazon Managed Grafana, providing a secure and user-friendly framework for developing dashboards and reports.
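Measuring the elapsed time between events comes down to pairing each start event with its completion. The sketch below does this in plain Python rather than in a Timestream query. The event fields (`request_id`, `phase`, `time`) are hypothetical, and timestamps are assumed to be ISO 8601 strings with an explicit UTC offset.

```python
from datetime import datetime


def elapsed_by_request(events):
    """Return {request_id: timedelta} for every request with a start and a completion."""
    parsed = sorted(
        ((datetime.fromisoformat(e["time"]), e) for e in events),
        key=lambda pair: pair[0],
    )
    started, durations = {}, {}
    for when, event in parsed:
        request_id = event["request_id"]
        if event["phase"] == "started":
            # Keep the earliest start if duplicates arrive.
            started.setdefault(request_id, when)
        elif event["phase"] == "completed" and request_id in started:
            durations[request_id] = when - started[request_id]
    return durations
```

Requests that have started but not yet completed are simply absent from the result, so in-flight work does not distort the measured durations.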
The Flight Controller dashboard, powered by Amazon Managed Grafana, allows users to monitor SLOs and view metrics related to account vending, deployment cycle times, and overall success percentages within the reporting period.
To illustrate, consider a scenario where the platform and product teams agree that account vending requests will be fulfilled within one business day 80% of the time over a 30-day window. If only 66% of requests in that window are fulfilled within one business day, the SLO is unmet. Teams can then investigate the root causes by analyzing the recorded event timelines, adjust the process, and automate steps where that improves efficiency.
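The account vending scenario can be expressed as a small check. This is a sketch, not Flight Controller's implementation: the `Slo` class and function names are hypothetical, and "one business day" is approximated as 24 elapsed hours, ignoring weekends and working hours.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Slo:
    """Hypothetical SLO definition."""
    name: str
    threshold: timedelta  # a request counts as "good" if fulfilled within this time
    target: float         # required fraction of good requests in the window


def attainment(durations, slo):
    """Fraction of requests fulfilled within the threshold, or None without data."""
    durations = list(durations)
    if not durations:
        return None
    good = sum(1 for d in durations if d <= slo.threshold)
    return good / len(durations)


def is_met(durations, slo):
    """True when the measured attainment reaches the SLO target."""
    ratio = attainment(durations, slo)
    return ratio is not None and ratio >= slo.target


# Assumption: one business day is simplified to 24 elapsed hours.
vending_slo = Slo("account-vending", threshold=timedelta(hours=24), target=0.80)
window = [timedelta(hours=4), timedelta(hours=20), timedelta(hours=30)]
print(is_met(window, vending_slo))  # two of three requests are in time, below the 80% target
```

Here two of the three requests finish within the threshold, roughly the 66% case described above, so the check reports the SLO as unmet.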