Launch of EC2 P5 Instances with NVIDIA H100 GPUs for Enhanced Generative AI and HPC Capabilities

In March 2023, AWS and NVIDIA unveiled a multi-faceted collaboration aimed at creating scalable, on-demand artificial intelligence (AI) infrastructure, specifically designed for training progressively sophisticated large language models (LLMs) and developing generative AI applications. We are pleased to announce the availability of Amazon Elastic Compute Cloud (Amazon EC2) P5 instances, which leverage NVIDIA H100 Tensor Core GPUs and AWS's latest advancements in networking to deliver up to 20 exaflops of compute power for building and training the largest machine learning (ML) models. This release builds on more than a decade of partnership between AWS and NVIDIA that has delivered a series of visual computing, AI, and high-performance computing (HPC) instance families, including Cluster GPU (cg1) instances (2010), G2 (2013), P2 (2016), P3 (2017), G3 (2017), P3dn (2018), G4 (2019), P4 (2020), G5 (2021), and P4de instances (2022).

Notably, the size of ML models has surged to trillions of parameters. However, this increased complexity has extended the training duration for our customers, with the latest LLMs often requiring several months to train. Similarly, HPC customers are experiencing longer times to solution due to the growing fidelity of their data collection and the escalation of data sets to exabyte scales.

Introducing EC2 P5 Instances

Today, we are excited to announce the general availability of Amazon EC2 P5 instances, the next generation of GPU instances tailored to meet the high performance and scalability demands of AI/ML and HPC workloads. Powered by the cutting-edge NVIDIA H100 Tensor Core GPUs, P5 instances can reduce training times by up to 6 times (from days to hours) when compared to previous GPU-based instances. This improvement translates to training costs that are approximately 40 percent lower for customers.
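The cost claim follows from simple arithmetic: even if a newer instance carries a higher hourly price, a large enough speedup lowers the total training bill. A minimal sketch, using the up-to-6x speedup from the announcement and a hypothetical relative hourly price (the 3.6x ratio below is an illustrative assumption, not a published price):

```python
# Back-of-the-envelope training-cost comparison (illustrative numbers only).
# relative_price is a hypothetical ratio of the new instance's hourly price
# to the previous generation's; the 6x speedup comes from the text above.
def relative_training_cost(speedup: float, relative_price: float) -> float:
    """Cost of the same job on the new instance, as a fraction of the old cost."""
    return relative_price / speedup

# With a hypothetical 3.6x higher hourly price and a 6x speedup,
# the same job costs 60% as much, i.e. roughly 40 percent less.
savings = 1 - relative_training_cost(speedup=6.0, relative_price=3.6)
print(f"{savings:.0%}")
```

The point of the sketch is simply that savings depend on the ratio of price increase to speedup, not on either number alone.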

Equipped with 8 NVIDIA H100 Tensor Core GPUs, 640 GB of high-bandwidth GPU memory, 3rd Gen AMD EPYC processors, 2 TB of system memory, and 30 TB of local NVMe storage, P5 instances also deliver 3,200 Gbps of aggregate network bandwidth with support for GPUDirect RDMA, which lowers latency and improves performance by letting GPUs communicate directly without involving the CPU.
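The headline totals decompose per GPU; a quick sanity check of the figures quoted above:

```python
# Derive the aggregate P5 figures from the per-component numbers in the text.
num_gpus = 8
gpu_memory_gb = 80          # HBM per H100 GPU
nvme_tb_per_drive = 3.84    # 8 local NVMe drives (see the spec table below)
network_gbps = 3200         # aggregate instance bandwidth

print(num_gpus * gpu_memory_gb)      # 640 GB total GPU memory
print(num_gpus * nvme_tb_per_drive)  # 30.72 TB local NVMe (the "30 TB" above)
print(network_gbps / num_gpus)       # 400 Gbps of network bandwidth per GPU
```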

Specifications of the P5 Instance:

Instance Size   vCPUs   Memory (GiB)   GPUs (H100)   Network Bandwidth (Gbps)   EBS Bandwidth (Gbps)   Local Storage (TB)
p5.48xlarge     192     2,048          8             3,200                      80                     8 x 3.84

P5 instances are primed for training and running inference on increasingly intricate LLMs and computer vision models, supporting demanding generative AI applications such as question answering, code generation, video and image generation, speech recognition, and more. Across these applications, they promise up to 6 times shorter training times compared to previous GPU-based instances. Workloads that use the lower-precision FP8 data type gain an additional performance benefit from the NVIDIA Transformer Engine on the H100 GPUs.

HPC clients utilizing P5 instances can scale their applications more effectively in areas such as pharmaceutical discovery, seismic analysis, weather forecasting, and financial modeling. Those employing dynamic programming (DP) algorithms for genome sequencing or accelerated data analytics will also reap the rewards from P5, thanks to the new DPX instruction set. This facilitates exploration of previously unreachable problem domains, accelerates solution iteration, and promotes quicker market entry.
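The DPX instructions accelerate the min/max-plus inner loops at the heart of such dynamic-programming kernels. As a point of reference for what those kernels compute, here is a minimal Needleman-Wunsch global alignment score in plain Python, a sketch of the algorithm class used in genome sequencing, not of the GPU implementation:

```python
# Needleman-Wunsch global alignment score: a classic dynamic-programming
# recurrence from sequence alignment. Its max-plus cell updates are the kind
# of operation the H100 DPX instructions accelerate.
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    # prev holds the previous row of the DP matrix; row 0 is all gap penalties.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        curr = [i * gap]  # first column: i gaps
        for j, cb in enumerate(b, 1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

print(nw_score("AB", "AB"))  # 2 (two matches)
print(nw_score("GATTACA", "GCATGCU"))
```

On a GPU, many such cells along an anti-diagonal are independent and can be updated in parallel, which is where hardware max-plus support pays off.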

Detailed instance specifications and comparisons

Feature                          p4d.24xlarge         p5.48xlarge         Comparison
Number & Type of Accelerators    8 x NVIDIA A100      8 x NVIDIA H100     -
FP8 TFLOPS per Server            -                    16,000              6.4x vs. A100 FP16
FP16 TFLOPS per Server           2,496                8,000               3.2x
GPU Memory (per GPU)             40 GB                80 GB               2x
GPU Memory Bandwidth             12.8 TB/s            26.8 TB/s           ~2x
CPU Family                       Intel Cascade Lake   AMD Milan           -
vCPUs                            96                   192                 2x
Total System Memory              1152 GB              2048 GB             ~2x
Networking Throughput            400 Gbps             3200 Gbps           8x
EBS Throughput                   19 Gbps              80 Gbps             ~4x
Local Instance Storage           8 TB NVMe            30 TB NVMe          3.75x
GPU-to-GPU Interconnect          600 GB/s             900 GB/s            1.5x

P5 instances provide up to 3,200 Gbps of networking through the second-generation Elastic Fabric Adapter (EFA), 8 times more than P4d instances, delivering the scale-out capability needed for multi-node distributed training and tightly coupled HPC workloads. To fulfill customer demands for large scale at low latency, P5 instances are deployed in second-generation Amazon EC2 UltraClusters, enabling low-latency communication across more than 20,000 NVIDIA H100 Tensor Core GPUs. This establishes the largest ML infrastructure available in the cloud, offering up to 20 exaflops of aggregate compute power.
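The 20-exaflop figure is consistent with the per-server FP16 throughput quoted in the comparison table above (8,000 TFLOPS per 8-GPU server, i.e. roughly 1 petaflop per GPU). A quick check:

```python
# Aggregate UltraCluster compute derived from per-GPU FP16 throughput.
# 8,000 FP16 TFLOPS per 8-GPU P5 server => ~1,000 TFLOPS (1 petaflop) per H100.
tflops_per_gpu = 8000 / 8
gpus = 20_000
exaflops = gpus * tflops_per_gpu / 1_000_000  # 1 exaflop = 1,000,000 TFLOPS
print(exaflops)  # 20.0
```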

EC2 UltraClusters utilize Amazon FSx for Lustre, a fully managed shared storage solution built on a popular high-performance parallel file system. FSx for Lustre allows for rapid processing of massive datasets on demand, achieving sub-millisecond latencies. Its low-latency, high-throughput characteristics are optimized for deep learning, generative AI, and HPC workloads on EC2 UltraClusters. FSx for Lustre ensures that GPUs and ML accelerators within the clusters are efficiently supplied with data, accelerating the most demanding workloads, including LLM training, generative AI inference, and HPC tasks like genomics and financial risk modeling.

Getting Started with EC2 P5 Instances

You can begin using P5 instances in the US East (N. Virginia) and US West (Oregon) Regions. When launching P5 instances, select AWS Deep Learning AMIs (DLAMIs) that support these instances. DLAMIs provide ML practitioners and researchers with the infrastructure and tools needed to quickly build scalable, secure applications.
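As a sketch (not official AWS sample code), a launch request might look like the following. The AMI ID, key pair, and subnet below are placeholders to replace with your own resources; the dict mirrors the parameters you would pass to EC2's RunInstances API, for example via boto3's `run_instances`:

```python
# Hypothetical RunInstances parameters for a P5 instance. The AMI ID, key
# name, and subnet ID are placeholders, not real resources.
launch_params = {
    "ImageId": "ami-0123456789abcdef0",  # a Deep Learning AMI in your Region
    "InstanceType": "p5.48xlarge",
    "MinCount": 1,
    "MaxCount": 1,
    "KeyName": "my-key-pair",
    "SubnetId": "subnet-0123456789abcdef0",
}
# With boto3, this would be passed as:
#   boto3.client("ec2").run_instances(**launch_params)
print(launch_params["InstanceType"])  # p5.48xlarge
```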


