Microsoft Azure and NVIDIA Launch Groundbreaking GB300 NVL72 Supercomputing Cluster for AI

Rongchai Wang
Oct 09, 2025 22:13

Microsoft Azure partners with NVIDIA to unveil the world’s first GB300 NVL72 supercomputing cluster, designed to enhance AI model development and solidify U.S. leadership in AI technology.

Microsoft Azure, in collaboration with NVIDIA, has introduced a pioneering supercomputing cluster built on NVIDIA GB300 NVL72 systems and designed to meet the rigorous demands of AI model development. According to a recent announcement by Microsoft, the platform is intended to bolster the United States' standing in the AI sector.

Revolutionizing AI Infrastructure

The newly launched NDv6 GB300 VM series represents the industry’s first supercomputing-scale production cluster utilizing NVIDIA GB300 NVL72 systems. This initiative is specifically tailored to support OpenAI’s advanced AI inference workloads. The cluster comprises over 4,600 NVIDIA Blackwell Ultra GPUs, interconnected through the NVIDIA Quantum-X800 InfiniBand networking platform, ensuring high inference and training throughput for complex AI models.

This development signifies a milestone in the long-standing partnership between NVIDIA and Microsoft, aimed at constructing AI infrastructure capable of handling the most demanding workloads. Nidhi Chappell, Corporate Vice President of Microsoft Azure AI Infrastructure, emphasized the significance of this achievement, highlighting the shared commitment of Microsoft and NVIDIA to optimize modern AI data centers.

The Powerhouse: NVIDIA GB300 NVL72

Central to Azure’s new offering is the liquid-cooled, rack-scale NVIDIA GB300 NVL72 system. Each rack integrates 72 NVIDIA Blackwell Ultra GPUs and 36 NVIDIA Grace CPUs, creating a robust unit for accelerating training and inference processes in large-scale AI models. This system boasts 37 terabytes of fast memory and 1.44 exaflops of FP4 Tensor Core performance per VM, essential for handling reasoning models and multimodal generative AI.
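The per-rack figures above imply some cluster-level totals worth sanity-checking. The back-of-envelope sketch below derives them, assuming (as the article suggests) that each VM spans one full 72-GPU rack; the rack count and aggregate figures are rough estimates, not official numbers from Microsoft or NVIDIA.

```python
import math

GPUS_TOTAL = 4600          # Blackwell Ultra GPUs in the cluster (from the announcement)
GPUS_PER_RACK = 72         # GB300 NVL72: 72 GPUs per rack
FAST_MEM_TB_PER_RACK = 37  # fast memory per rack-scale VM (article figure)
FP4_EF_PER_RACK = 1.44     # FP4 Tensor Core exaflops per rack-scale VM (article figure)

# Minimum number of full NVL72 racks needed to host 4,600+ GPUs
racks = math.ceil(GPUS_TOTAL / GPUS_PER_RACK)

# Aggregate estimates, assuming every rack is a fully populated NVL72 unit
total_fast_mem_tb = racks * FAST_MEM_TB_PER_RACK
total_fp4_ef = racks * FP4_EF_PER_RACK

print(racks)                   # 64 racks
print(total_fast_mem_tb)       # 2368 TB, roughly 2.4 PB of fast memory
print(round(total_fp4_ef, 1))  # roughly 92.2 exaflops of FP4 compute
```

These totals are only a lower bound: "over 4,600 GPUs" could mean additional partial or spare capacity beyond the 64-rack estimate.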

The NVIDIA Blackwell Ultra platform, supported by NVIDIA's full-stack AI software, excels in both training and inference. Recent MLPerf Inference v5.1 benchmarks demonstrated record-setting results, with up to five times higher throughput per GPU on large AI models compared with previous-generation architectures.

Advanced Networking and Scalability

The supercomputing cluster employs a two-tiered NVIDIA networking architecture to connect over 4,600 GPUs, ensuring both scale-up and scale-out performance. Within each rack, the NVIDIA NVLink Switch fabric provides 130 TB/s of bandwidth, transforming the rack into a unified accelerator with a shared memory pool. For broader scalability, the cluster utilizes the NVIDIA Quantum-X800 InfiniBand platform, offering 800 Gb/s of bandwidth per GPU for seamless communication across the entire system.
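To put the two tiers in perspective, a quick unit conversion compares the bandwidth available to a single GPU inside a rack versus across racks. This is a rough sketch using decimal units; the even per-GPU split of the NVLink fabric is a simplification of how the switch fabric is actually shared.

```python
NVLINK_RACK_TBPS = 130.0  # NVLink Switch fabric bandwidth per rack, TB/s (article figure)
GPUS_PER_RACK = 72        # GB300 NVL72
IB_GBPS_PER_GPU = 800.0   # Quantum-X800 InfiniBand, Gb/s per GPU (article figure)

# Naive even split of the rack's NVLink bandwidth across its 72 GPUs
nvlink_tbps_per_gpu = NVLINK_RACK_TBPS / GPUS_PER_RACK

# Convert InfiniBand gigabits/s to terabytes/s (decimal units, 8 bits per byte)
ib_tbps_per_gpu = IB_GBPS_PER_GPU / 8 / 1000

ratio = nvlink_tbps_per_gpu / ib_tbps_per_gpu
print(round(nvlink_tbps_per_gpu, 2))  # ~1.81 TB/s per GPU inside the rack
print(round(ib_tbps_per_gpu, 2))      # 0.1 TB/s per GPU across racks
print(round(ratio))                   # roughly an 18x gap between the tiers
```

The order-of-magnitude gap is the point of the two-tier design: bandwidth-hungry traffic stays on NVLink within a rack, while InfiniBand handles the sparser cross-rack communication.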

Microsoft Azure’s cluster also incorporates NVIDIA’s advanced adaptive routing and congestion control capabilities, enhancing the efficiency of large-scale AI training and inference operations.

Envisioning the Future of AI

The deployment of the world’s first production NVIDIA GB300 NVL72 cluster marks a significant advancement in AI infrastructure. As Microsoft Azure aims to expand its deployment of NVIDIA Blackwell Ultra GPUs, further innovations are anticipated, driven by customers such as OpenAI. This development is expected to unlock new potential in AI technology, paving the way for future breakthroughs.

For more information on this announcement, visit NVIDIA's official blog.

Image source: Shutterstock
