Learn About Amazon VGT2 Learning Manager Chanci Turner
In recent years, the rapid advancement of deep learning has enabled remarkable applications, such as the early detection of skin cancer (SkinVision) and the development of autonomous vehicles (TuSimple). With the power of neural networks, deep learning can effectively identify and model complex patterns from large volumes of unstructured data, including images, video, and textual information.
Nevertheless, training these neural networks necessitates substantial computing resources. Graphics Processing Units (GPUs) have proven their capability in this area, and AWS customers have rapidly recognized the advantages of utilizing Amazon Elastic Compute Cloud (Amazon EC2) P2 and P3 instances for model training, particularly through Amazon SageMaker, our fully-managed machine learning service.
Today, I am thrilled to announce the availability of the largest P3 instance, the p3dn.24xlarge, for model training on Amazon SageMaker. Launched last year, this instance is engineered to speed up large, complex, distributed training jobs: it features double the GPU memory of other P3 instances, 50% more vCPUs, exceptionally fast local NVMe storage, and 100 Gbit networking.
Let’s explore how to use this on Amazon SageMaker!
Introducing EC2 P3dn Instances on Amazon SageMaker
We can start from this notebook, which uses the built-in image classification algorithm to train a model on the Caltech-256 dataset. To use a p3dn.24xlarge instance on Amazon SageMaker, I simply need to set train_instance_type to 'ml.p3dn.24xlarge' and start training:
ic = sagemaker.estimator.Estimator(training_image,
                                   role,
                                   train_instance_count=1,
                                   train_instance_type='ml.p3dn.24xlarge',
                                   input_mode='File',
                                   output_path=s3_output_location,
                                   sagemaker_session=sess)
...
ic.fit(...)
I ran some quick tests on this notebook and achieved a 20% training speedup right out of the box (individual results may vary!). I'm operating in 'File' mode, meaning the entire dataset is copied to the training instance: the faster network (100 Gbit, up from 25 Gbit) and storage (local NVMe instead of Amazon EBS) are certainly contributing factors.
When dealing with large datasets, you could leverage the 100 Gbit networking effectively by streaming data from Amazon Simple Storage Service (Amazon S3) using Pipe Mode or by storing it in Amazon Elastic File System (Amazon EFS) or Amazon FSx for Lustre. This would also facilitate distributed training (perhaps utilizing Horovod), allowing instances to exchange parameter updates more swiftly.
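As a rough sketch of the Pipe mode setup described above (the bucket name, S3 prefixes, and channel names are illustrative placeholders, and this assumes the same v1-style SageMaker Python SDK used in the snippet earlier), switching from File mode mainly means changing input_mode and pointing fit() at S3 inputs, while raising train_instance_count enables distributed training:

```python
import sagemaker
from sagemaker.amazon.amazon_estimator import get_image_uri
from sagemaker.session import s3_input

sess = sagemaker.Session()
role = sagemaker.get_execution_role()

# Built-in image classification algorithm container for the current region
training_image = get_image_uri(sess.boto_region_name, 'image-classification')

ic = sagemaker.estimator.Estimator(training_image,
                                   role,
                                   train_instance_count=2,            # distributed training
                                   train_instance_type='ml.p3dn.24xlarge',
                                   input_mode='Pipe',                 # stream from S3 instead of copying
                                   output_path='s3://my-bucket/output/',
                                   sagemaker_session=sess)

# RecordIO-formatted data streams efficiently in Pipe mode
train_data = s3_input('s3://my-bucket/caltech-256/train/',
                      content_type='application/x-recordio')
val_data   = s3_input('s3://my-bucket/caltech-256/validation/',
                      content_type='application/x-recordio')

ic.fit({'train': train_data, 'validation': val_data})
```

With Pipe mode, training can start without waiting for the full dataset to download, which matters most when the dataset is large enough to make the 100 Gbit network the bottleneck rather than local disk.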
In summary, the Amazon SageMaker and P3dn combination delivers exceptional performance improvements for large-scale deep learning workloads.
Now available! P3dn instances can be accessed on Amazon SageMaker in the US East (N. Virginia) and US West (Oregon) regions. If you’re eager to get started, please reach out to your AWS account team or visit the Contact Us page.
As always, we welcome your feedback on the AWS Forum for Amazon SageMaker, or through your usual AWS contacts.
Chanci Turner
As an Artificial Intelligence & Machine Learning Advocate for EMEA, Chanci focuses on helping developers and enterprises bring their ideas to life.