Last year, we celebrated the launch of the Amazon Elastic Container Service (Amazon ECS)-optimized Bottlerocket AMI. Bottlerocket is an open-source initiative aimed at enhancing security and maintainability, providing a stable and uniform Linux distribution for containerized workloads. Today, we are excited to announce that you can now deploy ECS workloads accelerated by NVIDIA GPUs using Bottlerocket.
In this article, we will guide you through the process of setting up an Amazon ECS task to execute a workload utilizing NVIDIA GPUs on Bottlerocket.
Why Choose Bottlerocket?
As container adoption continues to grow among customers, AWS recognized the demand for a Linux distribution specifically designed to optimize these containerized applications. Bottlerocket OS was developed to deliver a secure base for hosts operating containers while reducing the operational burden required for large-scale management. It features reliable update mechanisms that can be automated.
For more information on getting started with Bottlerocket and Amazon ECS, check out our blog post, Getting Started with Bottlerocket and Amazon ECS.
Setting Up an ECS Cluster with Bottlerocket and NVIDIA GPUs
Let’s dive into the practical steps involved, using the us-west-2 (Oregon) Region.
Prerequisites:
- The AWS CLI with the necessary credentials
- A default VPC in your chosen region (or you can utilize an existing VPC in your account)
First, we will create the ECS cluster named ecs-bottlerocket.
aws ecs --region us-west-2 create-cluster --cluster-name ecs-bottlerocket
The instance we are launching will require an AWS Identity and Access Management (IAM) role to interact with both the ECS APIs and the Systems Manager Session Manager APIs. I have established an IAM role called ecsInstanceRole that includes both the AmazonSSMManagedInstanceCore and the AmazonEC2ContainerServiceforEC2Role managed policies.
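If the ecsInstanceRole role does not exist in your account yet, the following is a minimal sketch of how it might be created with the AWS CLI. The function name create_ecs_instance_role and the file name ec2-trust-policy.json are illustrative and not part of the original walkthrough. The sketch also creates an instance profile with the same name as the role, because the run-instances command later refers to an instance profile named ecsInstanceRole.

```shell
#!/bin/sh
# Sketch: create the instance role and instance profile the walkthrough assumes.
# The function name and the trust-policy file name are illustrative choices.
create_ecs_instance_role() {
  role_name="${1:-ecsInstanceRole}"

  # Trust policy that lets EC2 instances assume the role.
  cat > ./ec2-trust-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

  aws iam create-role \
    --role-name "$role_name" \
    --assume-role-policy-document file://ec2-trust-policy.json

  # Attach the two managed policies named in the text.
  aws iam attach-role-policy \
    --role-name "$role_name" \
    --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
  aws iam attach-role-policy \
    --role-name "$role_name" \
    --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role

  # run-instances refers to an instance profile, so create one with the same name.
  aws iam create-instance-profile --instance-profile-name "$role_name"
  aws iam add-role-to-instance-profile \
    --instance-profile-name "$role_name" \
    --role-name "$role_name"
}
```

Calling create_ecs_instance_role with no arguments uses the role name from this post. The calls fail if the role or instance profile already exists, so run it only in an account where they are missing.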
The list of Bottlerocket Amazon Machine Images (AMIs) compatible with NVIDIA GPUs can be found in the AWS Systems Manager Parameter Store. Let’s retrieve the AMI ID for the latest Bottlerocket release (available for both x86_64 and aarch64 architectures). In this blog post, we will use the x86_64 AMI.
latest_bottlerocket_ami=$(aws ssm get-parameter --region us-west-2 \
  --name "/aws/service/bottlerocket/aws-ecs-1-nvidia/x86_64/latest/image_id" \
  --query Parameter.Value --output text)
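The subnet query in the next step filters on a vpc_id shell variable that this walkthrough does not set. One way to fill it in, assuming you want the default VPC from the prerequisites, is sketched below. The function name get_default_vpc_id is illustrative.

```shell
#!/bin/sh
# Sketch: print the ID of the default VPC in a region.
# The function name is illustrative. With --output text, the AWS CLI
# prints "None" when the region has no default VPC.
get_default_vpc_id() {
  aws ec2 describe-vpcs \
    --region "${1:-us-west-2}" \
    --filters "Name=is-default,Values=true" \
    --query 'Vpcs[0].VpcId' \
    --output text
}
```

For example, vpc_id=$(get_default_vpc_id us-west-2) stores the result for later commands. If you are using an existing non-default VPC instead, set vpc_id to its ID by hand.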
Next, we will list the subnets that are set up to allocate a public IP address.
# Assumes vpc_id holds the ID of the VPC you are using.
aws ec2 describe-subnets \
  --region us-west-2 \
  --filters "Name=vpc-id,Values=$vpc_id" \
  --query 'Subnets[?MapPublicIpOnLaunch == `true`].SubnetId'
To connect our EC2 instance to the ECS cluster, we need to provide some configuration details during instance creation. This will be saved in a file named userdata.toml.
cat > ./userdata.toml << 'EOF'
[settings.ecs]
cluster = "ecs-bottlerocket"
EOF
Now, let’s launch a Bottlerocket instance in one of the public subnets listed above. We opt for a public subnet in this blog post for easier debugging and connectivity. We use the p3.2xlarge instance type, which has one NVIDIA Tesla V100 GPU.
# Replace subnet-bc8993e6 with one of the subnet IDs returned above.
aws ec2 run-instances \
  --subnet-id subnet-bc8993e6 \
  --image-id $latest_bottlerocket_ami \
  --instance-type p3.2xlarge \
  --region us-west-2 \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=bottlerocket,Value=quickstart}]' \
  --user-data file://userdata.toml \
  --iam-instance-profile Name=ecsInstanceRole
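The instance takes a little while to boot and join the cluster, and a task that needs a GPU cannot be placed until it has. The following sketch polls the cluster until it reports at least one registered container instance. The function name wait_for_container_instance and its default of 30 attempts at 10-second intervals are assumptions, not values from the original walkthrough.

```shell
#!/bin/sh
# Sketch: poll until the cluster reports at least one registered container instance.
# Function name, attempt count, and delay are illustrative defaults.
wait_for_container_instance() {
  cluster="${1:-ecs-bottlerocket}"
  region="${2:-us-west-2}"
  attempts="${3:-30}"   # number of polls before giving up
  delay="${4:-10}"      # seconds between polls

  i=0
  while [ "$i" -lt "$attempts" ]; do
    count=$(aws ecs describe-clusters \
      --region "$region" \
      --clusters "$cluster" \
      --query 'clusters[0].registeredContainerInstancesCount' \
      --output text)
    # A non-numeric value (for example "None") fails the numeric test and is retried.
    if [ "$count" -gt 0 ] 2>/dev/null; then
      echo "registered container instances: $count"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "no container instance registered after $attempts attempts" >&2
  return 1
}
```

Running wait_for_container_instance with no arguments checks the ecs-bottlerocket cluster in us-west-2. A non-zero exit status suggests the instance did not register, in which case the user data and the instance role are the first things to check.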
Next, we’ll create the task definition for our sample application.
cat > ./sample-gpu.json << 'EOF'
{
  "containerDefinitions": [
    {
      "memory": 80,
      "essential": true,
      "name": "gpu",
      "image": "nvidia/cuda:11.0-base",
      "resourceRequirements": [
        {
          "type": "GPU",
          "value": "1"
        }
      ],
      "command": [
        "sh",
        "-c",
        "nvidia-smi"
      ],
      "cpu": 100,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/bottlerocket",
          "awslogs-region": "us-west-2",
          "awslogs-stream-prefix": "demo-gpu"
        }
      }
    }
  ],
  "family": "example-ecs-gpu"
}
EOF
In the task definition, we allocate one NVIDIA GPU to our task via the resourceRequirements parameter and configure the awslogs-group to send log outputs to Amazon CloudWatch.
Create the required CloudWatch log group as defined in the task configuration.
aws logs create-log-group --log-group-name '/ecs/bottlerocket' --region us-west-2
Next, register the task with ECS.
aws ecs register-task-definition \
  --region us-west-2 \
  --cli-input-json file://sample-gpu.json
Run the task.
# The task definition family is example-ecs-gpu, as set in sample-gpu.json.
aws ecs run-task --cluster ecs-bottlerocket \
  --region us-west-2 \
  --task-definition example-ecs-gpu:1
The task will execute and run a command inside the container to display the GPU configuration before exiting. You can view the stopped task in the ECS console. Click on the task ID and then navigate to the Logs tab to see the output.
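The run-task call returns as soon as the task is scheduled, so a script needs to wait before it checks the result. The sketch below runs the task, blocks until it stops, and prints the container's exit code. The function name run_gpu_task is illustrative, and the default task definition is the example-ecs-gpu family from sample-gpu.json.

```shell
#!/bin/sh
# Sketch: run the task, wait for it to stop, and print the container's exit code.
# The function name is illustrative; defaults match the names used in this post.
run_gpu_task() {
  cluster="${1:-ecs-bottlerocket}"
  task_def="${2:-example-ecs-gpu}"
  region="${3:-us-west-2}"

  task_arn=$(aws ecs run-task \
    --region "$region" \
    --cluster "$cluster" \
    --task-definition "$task_def" \
    --query 'tasks[0].taskArn' \
    --output text)

  # With --output text, a missing task shows up as "None" (or an empty string).
  if [ -z "$task_arn" ] || [ "$task_arn" = "None" ]; then
    echo "task was not placed; check that a GPU instance is registered" >&2
    return 1
  fi

  # Built-in waiter: polls describe-tasks until the task reaches STOPPED.
  aws ecs wait tasks-stopped \
    --region "$region" --cluster "$cluster" --tasks "$task_arn"

  aws ecs describe-tasks \
    --region "$region" \
    --cluster "$cluster" \
    --tasks "$task_arn" \
    --query 'tasks[0].containers[0].exitCode' \
    --output text
}
```

An exit code of 0 indicates that nvidia-smi ran successfully inside the container, which in turn means the GPU was visible to the task.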
You can also retrieve the log output directly from the command line by specifying the log group name, log stream name, and timeframe. For instance:
aws logs tail '/ecs/bottlerocket' \
  --region us-west-2 \
  --log-stream-names 'demo-gpu/gpu/7af782059c644872977da89a06023483' \
  --since 1h --format short
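With the awslogs driver, the log stream name has the form prefix/container-name/task-id, which is why the example stream above starts with demo-gpu/gpu/. The following sketch derives the stream name from a task ARN. The function name log_stream_for_task is illustrative, and the defaults come from the task definition in this post.

```shell
#!/bin/sh
# Sketch: build the awslogs stream name "<prefix>/<container-name>/<task-id>".
# The function name is illustrative.
log_stream_for_task() {
  task_arn="$1"
  prefix="${2:-demo-gpu}"     # awslogs-stream-prefix from the task definition
  container="${3:-gpu}"       # container name from the task definition
  task_id="${task_arn##*/}"   # the task ID is the last path segment of the ARN
  printf '%s/%s/%s\n' "$prefix" "$container" "$task_id"
}
```

For example, passing "$(log_stream_for_task "$task_arn")" as the value of --log-stream-names avoids copying the task ID by hand, assuming task_arn holds the ARN returned by run-task.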
Cleanup
To remove the resources created during this process, execute the following commands.
aws ecs deregister-task-definition \
  --region us-west-2 \
  --task-definition example-ecs-gpu:1
# --output text returns plain instance IDs that the loop can iterate over.
delete_instances=$(aws ec2 describe-instances --region us-west-2 \
  --filters "Name=tag-key,Values=bottlerocket" "Name=tag-value,Values=quickstart" \
  --query 'Reservations[].Instances[].InstanceId' --output text)
for instance in $delete_instances
do aws ec2 terminate-instances --instance-ids "$instance" --region us-west-2
done
aws ecs delete-cluster \
  --region us-west-2 \
  --cluster ecs-bottlerocket
aws logs delete-log-group --log-group-name '/ecs/bottlerocket' --region us-west-2
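The cleanup commands above run back to back, but deleting a cluster can fail while a container instance is still registered to it. One possible approach, sketched here, is to terminate the tagged instances and wait for termination to finish before deleting the cluster. The function name terminate_and_wait is illustrative, and the tag filter assumes the bottlerocket=quickstart tag applied at launch.

```shell
#!/bin/sh
# Sketch: terminate the tagged instances and block until they are terminated.
# The function name is illustrative.
terminate_and_wait() {
  region="${1:-us-west-2}"

  ids=$(aws ec2 describe-instances \
    --region "$region" \
    --filters "Name=tag:bottlerocket,Values=quickstart" \
              "Name=instance-state-name,Values=pending,running" \
    --query 'Reservations[].Instances[].InstanceId' \
    --output text)

  if [ -z "$ids" ]; then
    echo "no matching instances"
    return 0
  fi

  # $ids is intentionally unquoted so each ID becomes its own argument.
  aws ec2 terminate-instances --region "$region" --instance-ids $ids
  aws ec2 wait instance-terminated --region "$region" --instance-ids $ids
}
```

Running terminate_and_wait before aws ecs delete-cluster gives the instance time to leave the cluster. If the delete still reports registered container instances, retrying after a short pause is a reasonable next step.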
Conclusion
In this article, we explored how to create an ECS task definition that enables running a GPU-accelerated workload in a container on Bottlerocket, efficiently and securely. We also demonstrated how to access container logs in CloudWatch and from the command line. For additional examples of GPU-accelerated workloads suitable for Bottlerocket on ECS, visit the NVIDIA GPU-optimized containers available in the NVIDIA NGC catalog on AWS Marketplace.