In the fast-paced world of machine learning (ML), professionals are constantly facing the challenge of managing vast amounts of data efficiently. This dilemma resonates with ML experts, data scientists, engineers, and enthusiasts worldwide, whether they’re working on natural language processing, computer vision, or other data-intensive tasks. The quest for optimizing speed when utilizing multiple GPUs has led to a myriad of innovative solutions. Today, we are excited to introduce features tailored for PyTorch developers utilizing native open-source frameworks, such as PyTorch Lightning and PyTorch DDP, which will simplify their transition to the cloud.
Amazon SageMaker, a fully managed ML service, offers an optimized compute environment for high-performance training at scale. With SageMaker model training, users get a remote training experience with a seamless control plane to easily train and reproduce ML models at high performance and low cost. We are pleased to unveil new features in the SageMaker training portfolio that make scaling PyTorch even easier and more accessible:
- PyTorch Lightning can now be integrated with SageMaker’s distributed data parallel library with just a single line of code modification.
- SageMaker model training now supports native PyTorch Distributed Data Parallel (DDP) with the NCCL backend, simplifying the migration process for developers onto SageMaker; a minimal sketch of such a training script follows this list.
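To make the second item concrete, here is a minimal sketch of a native PyTorch DDP training script using the NCCL backend. It assumes the launcher (SageMaker or torchrun) populates the standard RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT, and LOCAL_RANK environment variables; the model, data, and hyperparameters are placeholders rather than a recommended configuration.

```python
# Minimal sketch: native PyTorch DDP with the NCCL backend.
# Assumes RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT, and LOCAL_RANK are set by the launcher.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")              # NCCL for GPU collectives
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    device = torch.device("cuda", local_rank)

    model = DDP(torch.nn.Linear(128, 10).to(device), device_ids=[local_rank])  # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for _ in range(10):                                   # placeholder training loop
        x = torch.randn(32, 128, device=device)
        y = torch.randint(0, 10, (32,), device=device)
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()                                   # DDP all-reduces gradients here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```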
In this article, we will explore these newly introduced features and delve into how Amazon Search has effectively utilized PyTorch Lightning with the optimized distributed training backend in SageMaker to enhance their model training times.
Before we explore the case study of Amazon Search, let’s provide some background on SageMaker’s distributed data parallel library. In 2020, we launched a custom cluster configuration for distributed gradient descent at scale that improves overall cluster efficiency, introduced on Amazon Science as Herring. Combining the best features of parameter servers and ring-based topologies, SageMaker Distributed Data Parallel (SMDDP) is optimized for the Amazon Elastic Compute Cloud (Amazon EC2) network topology, including Elastic Fabric Adapter (EFA). For larger clusters, SMDDP can deliver a throughput improvement of 20–40% compared to Horovod (TensorFlow) and PyTorch Distributed Data Parallel. For smaller clusters and supported models, we recommend the SageMaker Training Compiler, which can reduce overall job times by up to 50%.
Customer Spotlight: PyTorch Lightning on SageMaker’s Optimized Backend with Amazon Search
Amazon Search is responsible for the search and discovery experience on Amazon.com, guiding customers in finding products to purchase. At its core, Amazon Search constructs an index for all products sold on the platform. When a customer inputs a query, Amazon Search employs various ML techniques, including deep learning models, to match relevant products to the query, followed by ranking the results before displaying them to the customer.
Amazon Search scientists have adopted PyTorch Lightning as a primary framework for training the deep learning models that drive search ranking due to its enhanced usability features built on PyTorch. Prior to this new SageMaker launch, SMDDP was not compatible with deep learning models developed in PyTorch Lightning, which hindered Amazon Search scientists from scaling their model training using data parallel techniques. This limitation considerably extended their training times and inhibited the testing of new experiments that required more scalable training.
Initial benchmarking results indicate that a sample model trained across eight nodes achieved a training time 7.3 times faster than the single-node baseline. The baseline model used in these benchmarks is a multi-layer perceptron neural network with seven fully connected (dense) layers and over 200,000 parameters. The table below summarizes the benchmarking results on ml.p3.16xlarge SageMaker training instances.
| Number of Instances | Training Time (minutes) | Improvement |
|---|---|---|
| 1 | 99 | Baseline |
| 2 | 55 | 1.8x |
| 4 | 27 | 3.7x |
| 8 | 13.5 | 7.3x |
Next, let’s dive into the specifics of these new launches. For more hands-on experience, feel free to check out our corresponding example notebook.
Running PyTorch Lightning with the SageMaker Distributed Training Library
We are thrilled to announce that SageMaker Data Parallel now seamlessly integrates with PyTorch Lightning within SageMaker training.
PyTorch Lightning is an open-source framework that simplifies the creation of custom models in PyTorch. Similar to how Keras made TensorFlow more accessible, PyTorch Lightning offers a high-level API that abstracts much of the lower-level functionality of PyTorch itself. This includes defining models, profiling, evaluating, pruning, model parallelism, hyperparameter configurations, transfer learning, and more.
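To ground that description, here is a minimal, hypothetical LightningModule and Trainer. The small MLP and random dataset are placeholders for illustration only and are unrelated to the Amazon Search models.

```python
# Minimal, hypothetical PyTorch Lightning example: a small MLP on random data.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class SimpleClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

    def forward(self, x):
        return self.net(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.cross_entropy(self(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

if __name__ == "__main__":
    # Placeholder dataset; Lightning owns the training loop, logging, and device placement.
    dataset = TensorDataset(torch.randn(1024, 128), torch.randint(0, 10, (1024,)))
    trainer = pl.Trainer(max_epochs=1)
    trainer.fit(SimpleClassifier(), DataLoader(dataset, batch_size=32))
```

Because the training loop lives in the framework rather than in user code, switching distributed strategies later becomes a configuration change rather than a rewrite.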
Previously, developers using PyTorch Lightning faced uncertainty regarding the migration of their training code to high-performance SageMaker GPU clusters. Additionally, there was no way to leverage the efficiency gains offered by SageMaker Data Parallel.
For PyTorch Lightning, transitioning code to run on SageMaker Training typically involves only minimal changes. In the example notebooks, we use the DDPStrategy and DDPPlugin classes.
Here are three steps to employ PyTorch Lightning with SageMaker Data Parallel as an optimized backend:
- Start with a supported AWS Deep Learning Container (DLC) as your base image, or create your own container and install the SageMaker Data Parallel backend. Ensure that PyTorch Lightning is included in your required packages, like in a requirements.txt file.
- Make a few minor adjustments to your training script to enable the optimized backend, including:
```python
import os

# Registers the SMDDP backend with PyTorch distributed
import smdistributed.dataparallel.torch.torch_smddp
from pytorch_lightning.plugins.environments.lightning_environment import LightningEnvironment

# Describe the cluster topology to PyTorch Lightning using the environment
# variables that SageMaker sets for each training process
env = LightningEnvironment()
env.world_size = lambda: int(os.environ["WORLD_SIZE"])
env.global_rank = lambda: int(os.environ["RANK"])
```
If you’re using a version of PyTorch Lightning older than 1.5.10, you’ll need to add an additional step:
```python
# Route PyTorch Lightning's distributed communication through the SMDDP backend
os.environ["PL_TORCH_DISTRIBUTED_BACKEND"] = "smddp"
```
In that case, also make sure you use DDPPlugin rather than DDPStrategy. If you’re using a more recent version of PyTorch Lightning (which you can pin by including it in the requirements.txt placed in the source_dir for your job), this extra step isn’t required. Here’s how the plugin setup looks:
```python
# Configure the DDP plugin with one process per GPU and the SageMaker cluster environment
ddp = DDPPlugin(
    parallel_devices=[torch.device("cuda", d) for d in range(num_gpus)],
    cluster_environment=env,
)
```
- Optionally, define your process group backend as “smddp” in the DDPStrategy object, as sketched after this list. If you’re using PyTorch Lightning with the native PyTorch DDP backend instead, simply omit the process_group_backend parameter.
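For more recent PyTorch Lightning versions (1.6 and later), step 3 might look like the following sketch. It reuses the env object from step 2; num_gpus and num_nodes are placeholders you would set to match your cluster, and the Trainer arguments are illustrative.

```python
# Sketch for newer PyTorch Lightning versions (1.6+): DDPStrategy with the SMDDP backend.
# `env` is the LightningEnvironment from step 2; num_gpus and num_nodes are placeholders.
import pytorch_lightning as pl
from pytorch_lightning.strategies import DDPStrategy

ddp = DDPStrategy(
    cluster_environment=env,
    process_group_backend="smddp",    # omit this argument to fall back to native PyTorch DDP (NCCL)
)

trainer = pl.Trainer(
    max_epochs=10,                    # illustrative value
    accelerator="gpu",
    devices=num_gpus,                 # GPUs per instance
    num_nodes=num_nodes,              # number of training instances
    strategy=ddp,
)
trainer.fit(model, train_dataloader)  # model and dataloader defined elsewhere in your script
```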
With these changes, your PyTorch Lightning script is ready to run on SageMaker with the optimized distributed training backend. The last step is to launch the training job, sketched below.
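Here is a sketch of launching the job from a notebook with the SageMaker Python SDK. The entry point, source directory, framework versions, and instance count are placeholders; the distribution argument is what enables the SMDDP optimized backend.

```python
# Sketch: launching the training job with the SageMaker Python SDK.
# Entry point, source_dir, versions, and instance counts are placeholders.
import sagemaker
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",           # your PyTorch Lightning training script
    source_dir="src",                 # also holds requirements.txt with pytorch-lightning pinned
    role=sagemaker.get_execution_role(),
    framework_version="1.12",
    py_version="py38",
    instance_type="ml.p3.16xlarge",
    instance_count=8,
    # Enable the SageMaker distributed data parallel (SMDDP) backend:
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)

estimator.fit()                       # pass your training data channels here as needed
```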