
Google Cloud expands its AI infrastructure with sixth generation TPUs

Google Cloud will enhance its AI cloud infrastructure with new TPUs and NVIDIA GPUs, the technology company announced October 30 at the App Day & Infrastructure Summit.

Now in preview for cloud customers, the sixth-generation Trillium TPU powers many of Google Cloud's most popular services, including Search and Maps.

“With these advances in AI infrastructure, Google Cloud is enabling businesses and researchers to redefine the boundaries of AI innovation,” wrote Mark Lohmeyer, vice president and general manager of compute and AI infrastructure at Google Cloud, in a press release. “We look forward to the transformative new applications of AI that will emerge from this powerful foundation.”

Trillium TPU accelerates generative AI workloads

As large language models develop, the silicon to support them must also evolve.

The sixth-generation Trillium TPU delivers training, inference, and serving of large language model applications at up to 91 exaflops in a single TPU cluster. Google Cloud reports that the sixth-generation version offers a 4.7x increase in peak compute performance per chip compared to the fifth generation. It also doubles high-bandwidth memory capacity and inter-chip interconnect bandwidth.

Trillium meets the high computational demands of large-scale diffusion models like Stable Diffusion XL. At its peak, Trillium infrastructure can connect tens of thousands of chips, creating what Google Cloud describes as “a building-scale supercomputer.”

Enterprise customers are demanding more cost-effective AI acceleration and increased inference performance, Mohan Pichika, product manager of the AI infrastructure group at Google Cloud, said in an email to TechRepublic.

In the press release, Deniz Tuna, a Google Cloud customer and development manager at mobile app development company HubX, said: “We used Trillium TPU for text-to-image creation with MaxDiffusion and FLUX.1, and the results are incredible! We were able to generate four images in 7 seconds, a 35% improvement in response latency and an approximately 45% reduction in cost per image compared to our current system!”
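Taken at face value, those percentages imply a baseline for HubX's prior system. A back-of-the-envelope sketch, assuming the 35% figure means the new latency is 35% lower than the old one and both percentages are relative to the previous setup:

```python
# Back-of-the-envelope arithmetic from the HubX figures quoted above.
# Assumption: "35% improvement in response latency" means
# new latency = old latency * (1 - 0.35).
new_latency_s = 7.0                    # 4 images in 7 seconds on Trillium
old_latency_s = new_latency_s / 0.65   # implied prior latency, ~10.8 s

# "~45% reduction in cost/image": new cost is ~55% of the old cost.
relative_cost_per_image = 1 - 0.45

print(round(old_latency_s, 1))         # ~10.8
```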

New virtual machines anticipate delivery of NVIDIA Blackwell chip

In November, Google will add A3 Ultra virtual machines powered by NVIDIA H200 Tensor Core GPUs to its cloud services. A3 Ultra VMs run AI or high-performance computing workloads on Google Cloud's data center-wide network at 3.2 Tbps of GPU-to-GPU traffic. They also offer customers:

  • Integration with NVIDIA ConnectX-7 hardware.
  • 2x the GPU-to-GPU network bandwidth compared to the previous generation, A3 Mega.
  • Up to 2x better LLM inference performance.
  • Almost double the memory capacity.
  • 1.4x more memory bandwidth.
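
The stated figures also pin down A3 Mega's implied GPU-to-GPU bandwidth. A quick sketch of the arithmetic, assuming the 2x claim applies to the 3.2 Tbps headline rate:

```python
# Derive A3 Mega's implied GPU-to-GPU bandwidth from the A3 Ultra
# figures stated above (3.2 Tbps, described as 2x the A3 Mega rate).
a3_ultra_tbps = 3.2
a3_mega_tbps = a3_ultra_tbps / 2   # implied: 1.6 Tbps

print(a3_mega_tbps)
```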

The new VMs will be available via Google Cloud or Google Kubernetes Engine.

SEE: Blackwell GPUs are sold out for the next year, Nvidia CEO Jensen Huang said at an investor meeting in October.

Additional Google Cloud Infrastructure Updates Support the Growing Enterprise LLM Sector

Naturally, Google Cloud’s infrastructure offerings interact. For example, the A3 Mega is supported by the Jupiter data center network, which will soon see its own AI workload-focused enhancement.

With its new network adapter, Titanium’s host offload capability now scales more effectively to the diverse demands of AI workloads. The Titanium ML Network Adapter uses NVIDIA ConnectX-7 hardware and Google Cloud’s data center-wide 4-way rail-aligned network to deliver 3.2 Tbps of GPU-to-GPU traffic. The benefits of this combination extend to Jupiter, Google Cloud’s optical circuit switching network fabric.

Another key part of Google Cloud’s AI infrastructure is the processing power required for AI training and inference. Hypercompute Cluster, which contains A3 Ultra virtual machines, brings together a large number of AI accelerators. Hypercompute Cluster can be configured via an API call, leverages reference libraries such as JAX or PyTorch, and supports open models such as Gemma 2 and Llama 3 for benchmarking.

Google Cloud customers can access Hypercompute Cluster with A3 Ultra VMs and Titanium ML network adapters in November.

These products meet enterprise customer demands for optimized GPU utilization and simplified access to high-performance AI infrastructure, Pichika said.

“Hypercompute Cluster provides an easy-to-use solution for businesses to harness the power of Hypercomputer AI for AI training and inference at scale,” he said via email.

Google Cloud is also preparing racks for NVIDIA’s upcoming Blackwell GB200 NVL72 GPUs, which are expected to be adopted by hyperscalers in early 2025. Once available, these GPUs will connect to Google’s Axion-based series of virtual machines, which run on Google’s custom Arm processors.

Pichika declined to directly clarify whether the timing of Hypercompute Cluster or Titanium ML was related to delays in Blackwell GPU deliveries: “We are excited to continue our work together to offer customers the best of both technologies.”

Two other services are now generally available: Hyperdisk ML, an AI/ML-focused block storage service, and Parallelstore, an AI/HPC-focused parallel file system.

Google Cloud services are accessible in many international regions.

Google Cloud Competitors for AI Hosting

Google Cloud primarily competes with Amazon Web Services and Microsoft Azure in cloud hosting of large language models. Alibaba, IBM, Oracle, VMware and others offer similar large language model resources, but not always at the same scale.

According to Statista, Google Cloud held 10% of the global cloud infrastructure services market in the first quarter of 2024, while Amazon Web Services held 34% and Microsoft Azure held 25%.
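
Those shares leave roughly a third of the market to all other providers combined:

```python
# Market-share remainder from the Statista Q1 2024 figures cited above.
google_cloud, aws, azure = 0.10, 0.34, 0.25
others = round(1 - (google_cloud + aws + azure), 2)   # 0.31

print(others)
```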