This blog post is co-authored by Sarah Lee from Comcast Corporation. It discusses how Comcast accelerated its product launch timelines, enhanced system resilience, and minimized operational expenses by implementing Amazon Web Services (AWS) Transit Gateway and AWS Direct Connect.
Comcast is a global leader in media and technology, significantly impacting hundreds of millions of customers worldwide through its various brands, including Xfinity, Comcast Business, and Sky. The company provides exceptional broadband, mobile, and entertainment services that exceed customer expectations, while also creating and distributing top-tier entertainment, news, and sports content. Comcast’s Universal Destinations & Experiences bring incredible theme parks and attractions to life.
The teams within Comcast manage a vast cloud environment comprising numerous AWS accounts. These accounts support thousands of developers who run diverse workloads across multiple business lines such as Xfinity X1, Xfinity xFi, Xfinity Home, and Comcast Business. To explore how Comcast teams are utilizing AWS, check out previous AWS posts and videos focused on building scalable home security solutions, conducting large-scale telemetry data analytics, and monitoring home security devices via Amazon CloudWatch.
DX Model 1.0
In the early phases of Comcast’s AWS adoption, a network connectivity model called DX Model 1.0 was established. This utilized Direct Connect to provide connectivity between Amazon Virtual Private Clouds (VPCs) and Comcast’s corporate datacenters. Each Amazon VPC was linked to Comcast datacenters over a private virtual interface terminating directly on an AWS Virtual Private Gateway (and later, Direct Connect Gateway). Unfortunately, this necessitated that traffic between VPCs route through on-premises routers, leading to increased latency (see Figure 1). The same network pathway was utilized for traffic between Amazon VPCs and on-premises Comcast resources.
The connectivity model was subsequently expanded to incorporate multiple AWS Regions and on-premises datacenters to accommodate evolving workload requirements. For simplicity, Figure 2 illustrates an example involving just two AWS Regions and two on-premises datacenters. Although some VPCs had direct VPC peering connections, the majority of traffic between VPCs was routed through on-premises routers, which resulted in unnecessary hairpinning. As a result, VPC-to-VPC latency was comparable to the latency of reaching on-premises dependencies.
This design approach was effective during the early stages of adoption, leveraging the Comcast team’s ability to swiftly deploy new virtual interfaces (VLANs that transport Direct Connect traffic) and establish BGP connections for route exchange. However, as the number of VPCs and accounts expanded, managing numerous Direct Connect connections and virtual interfaces, along with the associated AWS service limits on private virtual interfaces (VIFs) and routes, became increasingly complex. The hairpinning of cross-VPC traffic became more pronounced as new workloads migrated to AWS, adding load to the Direct Connect connections and complicating long-term capacity planning.
DX Model 2.0
In 2021, Comcast initiated a redesign of the DX connectivity model with the aim of enhancing scalability, decreasing latency, and improving time to market. One potential solution was to increase VPC peering usage. However, the complexity of managing a full mesh of VPC peering connections, together with limits on VPC route tables, made this impractical. Consequently, the team opted to implement Transit Gateway and Direct Connect Gateway.
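To illustrate why a full mesh of peering connections becomes hard to manage, the sketch below compares the number of links a full mesh needs with the number of attachments a hub-and-spoke design needs. The function names and VPC counts are illustrative assumptions, not figures from Comcast's environment.

```python
def full_mesh_peerings(vpc_count: int) -> int:
    """Peering connections needed so every VPC reaches every other VPC directly."""
    # Each unordered pair of VPCs needs one connection: n choose 2.
    return vpc_count * (vpc_count - 1) // 2


def hub_attachments(vpc_count: int) -> int:
    """Attachments needed when every VPC connects once to a central hub."""
    return vpc_count


# Hypothetical VPC counts, chosen only to show how the two designs scale.
for n in (10, 100, 500):
    print(f"{n} VPCs: {full_mesh_peerings(n)} peerings vs {hub_attachments(n)} attachments")
```

The mesh grows quadratically while the hub grows linearly, and every peering connection in the mesh also needs its own entries in the VPC route tables on both sides.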
Transit Gateway is an AWS service that simplifies network architectures by connecting Amazon VPCs and on-premises networks through a central hub. It functions as a highly available and scalable router, facilitating Regional and cross-Region connectivity. This service supports network segmentation through multiple route tables and integrates with AWS services like Direct Connect and VPN for secure on-premises connectivity. Transit Gateway enables centralized monitoring and logging for network traffic, streamlining the management of complex network topologies, and provides efficient and secure connectivity between VPCs and on-premises resources in a hub-and-spoke architecture.
In the new DX Model 2.0, a Transit Gateway was established in each AWS Region, and these Transit Gateways were interconnected, forming a complete mesh. This design allowed Comcast to maintain VPC-to-VPC traffic within the AWS network, irrespective of whether the flow was intra- or inter-Region, effectively offloading traffic from Direct Connect connections and reducing latency.
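The difference between the two models can be sketched as a simple hop list. This is a simplified model rather than Comcast's actual routing: the Region names, VPC names, and hop labels are hypothetical, and the Model 1.0 path ignores the minority of VPCs that had direct peering connections.

```python
from itertools import combinations


def tgw_peerings(regions):
    """One Transit Gateway per Region, peered pairwise to form a full mesh."""
    return list(combinations(sorted(regions), 2))


def path_model_1(src_vpc, dst_vpc):
    """Model 1.0: VPC-to-VPC traffic hairpins through an on-premises router."""
    return [src_vpc, "direct-connect", "on-prem-router", "direct-connect", dst_vpc]


def path_model_2(src, dst):
    """Model 2.0: traffic stays on Transit Gateways inside the AWS network."""
    (src_vpc, src_region), (dst_vpc, dst_region) = src, dst
    hops = [src_vpc, f"tgw-{src_region}"]
    if dst_region != src_region:
        # Inter-Region flows cross the peering between the two Transit Gateways.
        hops.append(f"tgw-{dst_region}")
    hops.append(dst_vpc)
    return hops


def leaves_aws(path):
    """True if the path touches the on-premises router."""
    return "on-prem-router" in path


print(tgw_peerings(["region-a", "region-b", "region-c"]))
print(path_model_1("vpc-1", "vpc-2"))
print(path_model_2(("vpc-1", "region-a"), ("vpc-2", "region-b")))
```

In this model, no Model 2.0 path includes the on-premises router, which is the property that offloads VPC-to-VPC traffic from the Direct Connect connections.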
Transit Gateways connect to on-premises systems via Direct Connect Gateway, utilizing a few transit virtual interfaces to establish BGP sessions and route exchanges. While Direct Connect Gateway is a global construct, it was utilized regionally in this design. This choice granted Comcast more control over traffic routing from specific AWS Regions to on-premises locations, resulting in streamlined routing. For instance, attaching an AWS BGP community tag to influence a route advertised from on-premises would impact only one AWS Region at a time.
Comcast achieved a single-Region SLA of 99.99% by provisioning multiple Direct Connect connections across various locations, each providing connectivity to multiple AWS Regions. Figure 3 presents a high-level architecture of DX Model 2.0 across two Regions.
Migration Approach
Comcast organically expanded to hundreds of VPCs across numerous AWS accounts as part of a multi-account strategy. Initially, onboarding new VPCs was infrequent and highly manual. As AWS usage grew, automation was developed to create new VPCs with Direct Connect connectivity; however, VPCs created before the automation existed were retained to minimize disruption to existing applications. Ultimately, the goal was to make pre- and post-automation VPCs as similar as possible to simplify future feature deployments and allow all users to benefit from Comcast’s automation tools for VPC modifications and firewall rule management. The aim was to migrate teams with minimal disruption to their existing workloads and processes.
Upon finalizing the Transit Gateway design, a template VPC configuration was created for all new VPCs. Once this configuration was approved, tooling was developed to evaluate existing VPCs and identify configuration differences that needed normalization during migration. Additional automation was created to implement or roll back these changes on a per-VPC basis.
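The evaluate, apply, and roll back steps might look roughly like the sketch below. It does not reproduce Comcast's tooling: the template keys, the dictionary-based VPC representation, and all function names are hypothetical, chosen only to show the idea of recording each difference so that it can be reversed per VPC.

```python
# Hypothetical template settings; the real template is not described in the post.
TEMPLATE = {
    "enable_dns_hostnames": True,
    "flow_logs_enabled": True,
    "dedicated_tgw_subnets": True,
}

_MISSING = object()  # Marks settings that an existing VPC does not define at all.


def find_drift(template, actual):
    """Return each setting where the VPC differs from the template."""
    drift = {}
    for key, target in template.items():
        current = actual.get(key, _MISSING)
        if current != target:
            drift[key] = {"current": current, "target": target}
    return drift


def apply_changes(actual, drift):
    """Return a copy of the VPC configuration normalized to the template."""
    updated = dict(actual)
    for key, change in drift.items():
        updated[key] = change["target"]
    return updated


def roll_back(updated, drift):
    """Undo apply_changes using the values recorded in the drift report."""
    restored = dict(updated)
    for key, change in drift.items():
        if change["current"] is _MISSING:
            restored.pop(key, None)
        else:
            restored[key] = change["current"]
    return restored


legacy_vpc = {"enable_dns_hostnames": False, "flow_logs_enabled": True}
drift = find_drift(TEMPLATE, legacy_vpc)
print(sorted(drift))
```

Keeping the pre-change value next to the target value means a rollback needs only the drift report, not a separate snapshot of the VPC.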
In alignment with AWS best practices, Comcast decided to create dedicated subnets in each Availability Zone (AZ) for the new Transit Gateway attachments. These subnets were allocated from IPv4 CIDR blocks separate from those assigned to workloads, which simplified both the changes and any rollback.
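As a rough illustration of this allocation, the sketch below carves one small attachment subnet per AZ out of a dedicated block and checks that the block does not overlap the workload range. The CIDR blocks, AZ names, and the /28 subnet size are assumptions for the example, not values from Comcast's design.

```python
import ipaddress


def tgw_attachment_subnets(dedicated_block, azs, new_prefix=28):
    """Assign one attachment subnet per AZ from a dedicated CIDR block."""
    network = ipaddress.ip_network(dedicated_block)
    candidates = list(network.subnets(new_prefix=new_prefix))
    if len(candidates) < len(azs):
        raise ValueError("dedicated block is too small for the requested AZs")
    return dict(zip(azs, candidates))


def overlaps_workloads(dedicated_block, workload_block):
    """True if the attachment block collides with the workload address space."""
    return ipaddress.ip_network(dedicated_block).overlaps(
        ipaddress.ip_network(workload_block)
    )


# Hypothetical address plan: workloads in 10.0.0.0/16, attachments in 100.64.0.0/26.
subnets = tgw_attachment_subnets("100.64.0.0/26", ["az-1", "az-2", "az-3"])
for az, subnet in subnets.items():
    print(az, subnet)
print(overlaps_workloads("100.64.0.0/26", "10.0.0.0/16"))
```

Because the attachment subnets come from their own block, they can be added or removed without touching the workload subnets.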