AWS launches Amazon EC2 Capacity Blocks for ML workloads

Amazon Web Services Inc. (AWS) has made Amazon Elastic Compute Cloud (EC2) Capacity Blocks for machine learning (ML) workloads generally available. The new offering lets customers reserve high-performance Amazon EC2 UltraClusters of NVIDIA GPUs for their generative AI development projects. Amplify Partners, Canva, Leonardo.Ai, and OctoML are among the customers planning to use Amazon EC2 Capacity Blocks for ML.

AWS and NVIDIA have been collaborating for over 12 years to provide scalable, high-performance GPU solutions, a partnership that has enabled customers to build generative AI applications now transforming a range of industries. David Brown, Vice President of Compute and Networking at AWS, stated: "AWS has unparalleled expertise in providing NVIDIA GPU-based computing in the cloud, and we also offer our own Trainium and Inferentia chips." With the introduction of Amazon EC2 Capacity Blocks, businesses and startups can predictably acquire NVIDIA GPU capacity to build, train, and deploy their generative AI applications without making long-term capital commitments. This is one of the ways AWS is working to expand access to generative AI capabilities.

AWS describes the consumption model as the first of its kind in the industry: it gives customers access to highly sought-after GPU compute capacity for short-duration ML workloads. With EC2 Capacity Blocks, customers can reserve hundreds of NVIDIA GPUs colocated in Amazon EC2 UltraClusters designed for high-performance ML workloads.

Traditional ML workloads already demanded substantial supercomputing capacity. With the advent of generative AI, even more computing capacity is required to process the vast datasets used to train foundation models (FMs) and large language models (LLMs). Clusters of GPUs, with their combined parallel processing capabilities, accelerate both training and inference. However, as more organizations recognize the transformative potential of generative AI, demand for GPUs has outpaced supply.

Customers who want to use the latest ML technologies, especially those whose capacity needs fluctuate depending on where they are in the adoption cycle, may struggle to access the GPU clusters their ML workloads require. The alternative is to commit to large amounts of GPU capacity for long durations, only to have it sit idle when it is not actively in use. EC2 Capacity Blocks are meant to give customers reliable, predictable, and uninterrupted access to the GPU compute capacity their critical ML projects need.

With EC2 Capacity Blocks, customers reserve only the GPU capacity they need, for short durations, to run their ML workloads, eliminating the need to hold onto GPU capacity when it is not in use. EC2 Capacity Blocks are deployed in EC2 UltraClusters interconnected with second-generation Elastic Fabric Adapter (EFA) petabit-scale networking, delivering low-latency, high-throughput connectivity that lets customers scale up to hundreds of GPUs. Customers can reserve EC2 UltraClusters of NVIDIA GPU-powered P5 instances for one to 14 days, with a start date up to eight weeks in advance.
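The scheduling window described above (reservations of one to 14 days, starting up to eight weeks out) can be sketched as a small validation helper. This is purely illustrative: the function and constant names are hypothetical and not part of any AWS SDK.

```python
from datetime import datetime, timedelta

# Limits as described for EC2 Capacity Blocks (illustrative only,
# not an AWS SDK interface): 1-14 day durations, start dates up to
# eight weeks in the future.
MIN_DURATION_DAYS = 1
MAX_DURATION_DAYS = 14
MAX_ADVANCE_WEEKS = 8

def is_valid_reservation(start: datetime, duration_days: int,
                         now: datetime) -> bool:
    """Check a requested Capacity Block against the published limits."""
    if not MIN_DURATION_DAYS <= duration_days <= MAX_DURATION_DAYS:
        return False
    if start < now:
        # The start date must not be in the past.
        return False
    return start - now <= timedelta(weeks=MAX_ADVANCE_WEEKS)

now = datetime(2023, 11, 1)
# A 7-day block starting four weeks out fits the window.
print(is_valid_reservation(now + timedelta(weeks=4), 7, now))   # True
# Ten weeks out exceeds the eight-week advance limit.
print(is_valid_reservation(now + timedelta(weeks=10), 7, now))  # False
```

In practice a customer would search for and purchase an offering through the EC2 API or console rather than enforce these limits client-side; the sketch only makes the article's stated constraints concrete.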

Once an EC2 Capacity Block is scheduled, customers can plan their ML workload deployments with certainty, knowing the GPU capacity will be there when they need it. Customers pay only for the time they reserve. EC2 Capacity Blocks are available in the AWS US East (Ohio) Region, with availability planned for additional AWS Regions and Local Zones.

With the new EC2 Capacity Blocks for ML, AI companies worldwide can rent GPU capacity not just one server at a time but at a dedicated scale, enabling them to train large language models and run inference in the cloud quickly and cost-efficiently, exactly when they need it.

Overall, EC2 Capacity Blocks provide predictable, timely access to GPU compute capacity at a manageable cost, and are likely to accelerate the adoption of generative AI among businesses that would otherwise struggle to access GPU-intensive supercomputing resources.