AWS and NVIDIA Collaborate on Next-Generation Infrastructure for Training Large Machine Learning Models and Building Generative AI Applications

New Amazon EC2 P5 instances deployed in EC2 UltraClusters are fully optimized to harness NVIDIA Hopper GPUs for accelerating generative AI training and inference at massive scale


Amazon Web Services, Inc. (AWS), an Amazon.com, Inc. company (NASDAQ: AMZN), and NVIDIA (NASDAQ: NVDA) today announced a multi-faceted collaboration focused on building out the world's most scalable, on-demand artificial intelligence (AI) infrastructure optimized for training increasingly complex large language models (LLMs) and developing generative AI applications.


The collaboration will leverage next-generation Amazon Elastic Compute Cloud (Amazon EC2) P5 instances powered by NVIDIA H100 Tensor Core GPUs and AWS’s state-of-the-art networking and scalability, which will deliver up to 20 exaFLOPS of compute performance for building and training the largest deep learning models. P5 instances will be the first GPU-based instances to take advantage of AWS’s second-generation Elastic Fabric Adapter (EFA) networking, which provides 3,200 Gbps of low-latency, high-bandwidth networking throughput, enabling customers to scale up to 20,000 H100 GPUs in EC2 UltraClusters for on-demand access to supercomputer-class performance for AI. Pinterest is among the globally known companies leveraging this technology, using it to accelerate product development and introduce new empathetic, AI-based experiences to its customers.
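As a rough illustration of how a customer might request this capacity, the sketch below assembles the parameters for a boto3 `ec2.run_instances` call targeting the `p5.48xlarge` instance type with an EFA network interface. The AMI and subnet IDs are placeholders, and the exact launch configuration (placement groups, multiple EFA interfaces, capacity reservations) would vary in practice; this is a minimal sketch, not AWS's recommended setup.

```python
# Hypothetical sketch: building an EC2 RunInstances request for P5 capacity.
# The instance type (p5.48xlarge) and the "efa" interface type come from AWS
# documentation; the AMI and subnet IDs below are placeholders, not real
# resources. The resulting dict could be passed to boto3's
# ec2_client.run_instances(**params) by a caller with AWS credentials.

def build_p5_request(instance_count: int, ami_id: str, subnet_id: str) -> dict:
    """Assemble keyword arguments for an ec2.run_instances call."""
    return {
        "ImageId": ami_id,
        "InstanceType": "p5.48xlarge",  # P5 instance with NVIDIA H100 GPUs
        "MinCount": instance_count,
        "MaxCount": instance_count,
        # Request an Elastic Fabric Adapter interface so the instances can
        # use the low-latency EFA fabric within an EC2 UltraCluster.
        "NetworkInterfaces": [
            {
                "DeviceIndex": 0,
                "SubnetId": subnet_id,
                "InterfaceType": "efa",
            }
        ],
    }

# Example: request two P5 instances (placeholder IDs).
params = build_p5_request(2, "ami-0123456789abcdef0", "subnet-0123456789abcdef0")
print(params["InstanceType"])
```

Real workloads at the scales described above would typically launch such instances into a cluster placement group and use multiple EFA interfaces per instance to reach the full 3,200 Gbps of networking throughput.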


“AWS and NVIDIA have collaborated for more than 12 years to deliver large-scale, cost-effective GPU-based solutions on demand for various applications such as AI/ML, graphics, gaming, and HPC,” said Adam Selipsky, CEO at AWS. “AWS has unmatched experience delivering GPU-based instances that have pushed the scalability envelope with each successive generation, with many customers scaling machine learning training workloads to more than 10,000 GPUs today. With second-generation EFA, customers will be able to scale their P5 instances to over 20,000 NVIDIA H100 GPUs, bringing supercomputer capabilities on demand to customers ranging from startups to large enterprises.”


“The arrival of accelerated computing and AI is timely. As businesses strive to do more with less, accelerated computing provides step-function speedups while decreasing cost and power. The advent of generative AI has prompted businesses to reimagine their products and business models and become disruptors, rather than disrupted,” said Jensen Huang, founder and CEO of NVIDIA. “AWS is a long-time partner and was the first cloud service provider to offer NVIDIA GPUs. We are thrilled to combine our expertise, scale, and reach to help customers harness accelerated computing and generative AI to engage the enormous opportunities ahead.”


New Supercomputing Clusters and Server Designs


The new P5 instances powered by NVIDIA GPUs are ideal for training increasingly complex LLMs and computer vision models that power the most demanding and compute-intensive generative AI applications, such as question answering, code generation, video and image generation, and speech recognition. The new server designs target scalable, efficient AI by leveraging the thermal, electrical, and mechanical expertise of the NVIDIA and AWS engineering teams to build servers that harness GPUs to deliver AI at scale, with an emphasis on energy efficiency in AWS infrastructure. GPUs are typically 20 times more energy efficient than CPUs for specific AI workloads, and the H100 can be up to 300 times more energy efficient than CPUs for LLMs.