Securing Internet Traffic for Amazon SageMaker Studio with AWS Network Firewall

Amazon SageMaker Studio is a web-based integrated development environment (IDE) for end-to-end machine learning (ML) workflows: preparing data and building, training, and deploying models. Like other AWS services, SageMaker Studio includes a range of security features that help you build secure and compliant environments.

A vital aspect of this security framework is the ability to deploy SageMaker Studio within your own Amazon Virtual Private Cloud (Amazon VPC). This setup allows for the management, monitoring, and inspection of network traffic both within and outside your VPC, leveraging standard AWS networking and security functionalities. For further details, see the blog post here.

In regulated sectors, like financial services, customers frequently restrict internet access in their ML environments. They typically rely solely on VPC endpoints for AWS services and connect exclusively to private source code repositories, ensuring that all libraries comply with security and licensing standards. While some customers may wish to provide limited internet access, they often require controls such as domain name filtering and access to only specific public repositories. In these instances, deploying AWS Network Firewall alongside a NAT gateway can be an effective solution.

In this article, we will demonstrate how to utilize Network Firewall to create a secure and compliant environment by controlling and monitoring internet access, inspecting traffic, and employing both stateless and stateful firewall rules to manage the flow of data between SageMaker Studio and the internet.

Depending on your security, compliance, and governance requirements, you may not need to block internet access from SageMaker Studio and your AI/ML workloads completely. You may, however, have requirements that go beyond what security groups and network access control lists (ACLs) offer, such as application protocol protection, deep packet inspection, domain name filtering, and intrusion prevention. In those cases, AWS Network Firewall, a managed network firewall and intrusion prevention system (IPS), may be the right choice.

Solution Overview

When setting up SageMaker Studio in your VPC, you determine how it accesses the internet via the AppNetworkAccessType parameter (using the Amazon SageMaker API) or through the console during domain creation.

If you opt for Public Internet Only (PublicInternetOnly), all incoming and outgoing internet traffic from SageMaker notebooks flows through an AWS-managed internet gateway attached to a VPC that SageMaker manages, not to your own VPC. The following diagram illustrates this network configuration.

SageMaker allows public internet egress through a platform-managed VPC, enabling data scientists to download notebooks, packages, and datasets. However, traffic to the associated Amazon Elastic File System (Amazon EFS) volume is always routed through the customer VPC, never through the public internet.

To manage your own internet traffic flow—such as through a NAT or internet gateway—you must set the AppNetworkAccessType parameter to VpcOnly (or choose VPC Only in the console). This configuration creates an elastic network interface in designated subnets within your VPC. You can implement various security controls—security groups, network ACLs, VPC endpoints, AWS PrivateLink, or Network Firewall endpoints—over both internal and internet traffic to finely control access in SageMaker Studio.

In this mode, direct internet access to/from notebooks is entirely disabled, with all traffic being routed through an elastic network interface in your private VPC. This encompasses traffic from SageMaker UI components, such as Experiments, Autopilot, and Model Monitor, to their respective backend APIs.
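As a rough illustration of the VpcOnly setting, the sketch below assembles the keyword arguments for the SageMaker CreateDomain API. The helper name build_studio_domain_request and every identifier value (domain name, VPC, subnet, security group, role ARN) are placeholders invented for this example, and IAM authentication is an assumption. The block only builds the request; sending it requires boto3 and AWS credentials.

```python
from typing import Any


def build_studio_domain_request(
    domain_name: str,
    vpc_id: str,
    subnet_ids: list[str],
    security_group_ids: list[str],
    execution_role_arn: str,
) -> dict[str, Any]:
    """Return keyword arguments for CreateDomain with VpcOnly network access."""
    if not subnet_ids:
        raise ValueError("VpcOnly mode needs at least one subnet")
    return {
        "DomainName": domain_name,
        "AuthMode": "IAM",  # assumption: IAM auth rather than IAM Identity Center
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),
        # Studio traffic leaves through network interfaces in your own subnets.
        "AppNetworkAccessType": "VpcOnly",
        "DefaultUserSettings": {
            "ExecutionRole": execution_role_arn,
            "SecurityGroups": list(security_group_ids),
        },
    }


# All values below are hypothetical placeholders.
request = build_studio_domain_request(
    domain_name="studio-firewall-demo",
    vpc_id="vpc-0123456789abcdef0",
    subnet_ids=["subnet-0aaa1111bbb22222c"],
    security_group_ids=["sg-0123456789abcdef0"],
    execution_role_arn="arn:aws:iam::111122223333:role/StudioExecutionRole",
)

# With credentials configured, the request could be sent like this:
#   import boto3
#   boto3.client("sagemaker").create_domain(**request)
```

Keeping the request as plain data makes it easy to review the network settings before anything is created.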

Architecture Overview

The solution outlined here employs the VpcOnly option, deploying the SageMaker domain into a VPC with three subnets:

  1. SageMaker Subnet: Hosts all Studio workloads, with all network flow controlled by a security group.
  2. NAT Subnet: Contains a NAT gateway, allowing internet access while keeping private IP addresses shielded.
  3. Network Firewall Subnet: Houses a Network Firewall endpoint, with route tables configured to direct all external traffic through the Network Firewall. This setup allows for the configuration of stateful and stateless policies to inspect and monitor traffic.
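The three-subnet layout can be sketched with Python's standard ipaddress module. The 10.0.0.0/16 VPC range and the /24 subnet size are assumptions chosen for illustration; the source does not specify CIDR values.

```python
import ipaddress

# Assumed VPC range; substitute the CIDR block of your own VPC.
VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")

SUBNET_NAMES = ["sagemaker", "nat", "network-firewall"]

# Take the first three /24 blocks of the VPC range, one per subnet.
subnets = dict(zip(SUBNET_NAMES, VPC_CIDR.subnets(new_prefix=24)))

# Sanity checks: every subnet lies inside the VPC and none overlap.
for name, cidr in subnets.items():
    assert cidr.subnet_of(VPC_CIDR), f"{name} is outside the VPC range"
names = list(subnets)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        assert not subnets[first].overlaps(subnets[second])

for name, cidr in subnets.items():
    print(f"{name:<17} {cidr}")
```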

The architecture includes the following resources:

  • A VPC with a defined CIDR block
  • Three private subnets with specified CIDRs
  • Internet gateway, NAT gateway, Network Firewall, and a Firewall endpoint in the Network Firewall subnet
  • A Network Firewall policy and a stateful domain list rule group with an allow list
  • Elastic IP allocated to the NAT gateway
  • Two security groups for SageMaker workloads and VPC endpoints
  • Four route tables with configured routes
  • An Amazon S3 VPC endpoint (Gateway type)
  • AWS service access VPC endpoints (Interface type) for various AWS services accessed from Studio
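The two endpoint types in the list above could be requested along the following lines. This is a sketch of request parameters for the EC2 CreateVpcEndpoint API, not the solution's actual template: the Region, the resource IDs, and the subset of interface services shown are assumptions, and the helper function names are made up for this example.

```python
from typing import Any

REGION = "us-east-1"  # assumed Region

# Illustrative subset of services Studio might call; adjust to your workloads.
INTERFACE_SERVICES = ["sagemaker.api", "sagemaker.runtime", "sts", "logs"]


def gateway_endpoint_request(vpc_id: str, route_table_ids: list[str]) -> dict[str, Any]:
    """Gateway endpoint for Amazon S3; it is attached to route tables."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{REGION}.s3",
        "RouteTableIds": list(route_table_ids),
    }


def interface_endpoint_request(
    service: str, vpc_id: str, subnet_ids: list[str], security_group_ids: list[str]
) -> dict[str, Any]:
    """Interface endpoint; it places a network interface in the given subnets."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{REGION}.{service}",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        "PrivateDnsEnabled": True,
    }


# Hypothetical resource IDs.
s3_request = gateway_endpoint_request("vpc-0123456789abcdef0", ["rtb-0aaa1111bbb22222c"])
interface_requests = [
    interface_endpoint_request(
        service, "vpc-0123456789abcdef0", ["subnet-0aaa1111bbb22222c"], ["sg-0fedcba9876543210"]
    )
    for service in INTERFACE_SERVICES
]

# Each dict could be passed to boto3.client("ec2").create_vpc_endpoint(**request).
```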

Additionally, the solution establishes an AWS Identity and Access Management (IAM) execution role for SageMaker notebooks and Studio, complete with preconfigured policies.

Routes for external destinations are configured so that all internet traffic flows through the NAT gateway and the Network Firewall. For more information and reference architectures involving Network Firewall and a NAT gateway, see the resource here.
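One plausible way to lay out the four route tables so that egress traffic passes the NAT gateway and then the firewall endpoint is modeled below as plain data. The table names, the CIDR values, and the exact assignment of routes are assumptions made for illustration; the source states only that four route tables exist and that all internet traffic crosses both components. The small tracer applies longest-prefix matching, the same rule VPC route tables use.

```python
import ipaddress

# Assumed layout: (destination CIDR, target) pairs per route table.
ROUTE_TABLES = {
    "sagemaker-subnet": [("10.0.0.0/16", "local"), ("0.0.0.0/0", "nat-gateway")],
    "nat-subnet": [("10.0.0.0/16", "local"), ("0.0.0.0/0", "firewall-endpoint")],
    "firewall-subnet": [("10.0.0.0/16", "local"), ("0.0.0.0/0", "internet-gateway")],
    # Ingress table on the internet gateway: return traffic for the NAT
    # subnet (10.0.1.0/24 here) is sent back through the firewall endpoint.
    "internet-gateway-ingress": [("10.0.0.0/16", "local"), ("10.0.1.0/24", "firewall-endpoint")],
}

# Route table that applies to traffic leaving each intermediate hop.
HOP_ROUTE_TABLE = {"nat-gateway": "nat-subnet", "firewall-endpoint": "firewall-subnet"}


def next_hop(table: str, destination: str) -> str:
    """Pick the target of the most specific route that matches the destination."""
    address = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in ROUTE_TABLES[table]
        if address in ipaddress.ip_network(cidr)
    ]
    if not matches:
        raise LookupError(f"no route to {destination} in {table}")
    return max(matches, key=lambda match: match[0].prefixlen)[1]


def trace_egress(start_table: str, destination: str) -> list[str]:
    """Follow the hops from a subnet until traffic leaves the modeled tables."""
    path = []
    table = start_table
    while table is not None:
        hop = next_hop(table, destination)
        path.append(hop)
        table = HOP_ROUTE_TABLE.get(hop)
    return path


egress_path = trace_egress("sagemaker-subnet", "93.184.216.34")
return_hop = next_hop("internet-gateway-ingress", "10.0.1.25")
print(" -> ".join(egress_path))
```

In this model, outbound traffic from the SageMaker subnet reaches the NAT gateway first, then the firewall endpoint, then the internet gateway, and return traffic for the NAT subnet re-enters through the firewall endpoint.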

The solution creates a SageMaker domain and user profile. It currently uses a single Availability Zone, so it is not highly available. For production deployments, a best practice is a Multi-AZ configuration that replicates the setup across additional Availability Zones.

By using Network Firewall and its policies, you can control internet traffic in your VPC, for example by creating a domain allow list rule that permits access only to specified domains and blocks all other traffic.
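A domain allow list of this kind might be expressed as request parameters for the Network Firewall CreateRuleGroup and CreateFirewallPolicy APIs, as sketched below. The allowed domains, resource names, capacity, account ID, and Region are all illustrative assumptions, and the helper function names are invented for this example. The block builds the requests without sending them.

```python
from typing import Any


def domain_allow_list_rule_group(
    name: str, allowed_domains: list[str], capacity: int = 100
) -> dict[str, Any]:
    """Stateful rule group that allows HTTP/TLS traffic only to listed domains."""
    return {
        "RuleGroupName": name,
        "Type": "STATEFUL",
        "Capacity": capacity,  # assumed value; capacity cannot be changed later
        "RuleGroup": {
            "RulesSource": {
                "RulesSourceList": {
                    "Targets": sorted(set(allowed_domains)),
                    # Inspect the HTTP Host header and the TLS SNI field.
                    "TargetTypes": ["HTTP_HOST", "TLS_SNI"],
                    "GeneratedRulesType": "ALLOWLIST",
                }
            }
        },
    }


def firewall_policy(name: str, stateful_rule_group_arn: str) -> dict[str, Any]:
    """Policy that forwards all packets to the stateful engine for inspection."""
    return {
        "FirewallPolicyName": name,
        "FirewallPolicy": {
            "StatelessDefaultActions": ["aws:forward_to_sfe"],
            "StatelessFragmentDefaultActions": ["aws:forward_to_sfe"],
            "StatefulRuleGroupReferences": [{"ResourceArn": stateful_rule_group_arn}],
        },
    }


# Hypothetical allow list; a leading dot is the domain wildcard form.
rule_group = domain_allow_list_rule_group(
    "studio-domain-allow-list",
    [".pypi.org", ".pythonhosted.org", ".github.com"],
)
policy = firewall_policy(
    "studio-firewall-policy",
    "arn:aws:network-firewall:us-east-1:111122223333:stateful-rulegroup/studio-domain-allow-list",
)

# With credentials configured, these could be sent like this:
#   import boto3
#   client = boto3.client("network-firewall")
#   client.create_rule_group(**rule_group)
#   client.create_firewall_policy(**policy)
```

Forwarding stateless traffic to the stateful engine is one common choice; stateless rules could also be added in front of it if some traffic should be passed or dropped without deeper inspection.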
