NVIDIA and Microsoft Join Forces To Build a Scalable Generative AI Supercomputer

The partnership with NVIDIA means Microsoft is one of the first large-scale takers for H100, NVIDIA’s powerful new AI-focused GPU based on the Hopper architecture.

November 17, 2022

On Wednesday, NVIDIA and Microsoft announced an ambitious partnership to build what they claim will be “one of the most powerful AI supercomputers in the world” on the Azure cloud service. The deal follows a 2019 partnership between Microsoft and OpenAI that produced the first supercomputer on Azure.

Microsoft is one-upping itself by setting out to develop a new Azure-hosted supercomputer for AI training and deep learning applications. The partnership makes Microsoft one of the first large-scale takers for the H100, NVIDIA’s powerful new AI-focused GPU based on the Hopper architecture developed by the American chipmaker.

The H100 is NVIDIA’s flagship accelerator GPU targeted at server deployments, delivering higher power efficiency and speeds up to 6x those of the previous-generation, Ampere-based A100. The H100 is also used alongside Intel Xeon Ice Lake CPUs in Lenovo’s Henri system.

Both H100 and A100 GPUs will be at the core of the newly envisioned supercomputer, alongside NVIDIA Quantum-2 400Gb/s InfiniBand networking and the NVIDIA AI Enterprise software suite. The new undertaking will also leverage Microsoft’s Azure cloud infrastructure and ND- and NC-series virtual machines, which will be the first public cloud instances to incorporate NVIDIA Quantum-2 400Gb/s InfiniBand.

Through the collaboration, NVIDIA hopes to make strides in unsupervised (and semi-supervised) learning that allows machines to create content such as text, code, digital images, video or audio, a field broadly termed generative AI. NVIDIA will rope in Megatron-Turing NLG 530B (its answer to OpenAI’s GPT-3) to this end.

For its part, Microsoft will handle AI and deep learning workload optimization through DeepSpeed, the open-source library it developed. DeepSpeed helps minimize the infrastructure required to train large neural networks.
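DeepSpeed’s memory savings come largely from its ZeRO optimizer, which partitions optimizer state and gradients (and, at stage 3, the parameters themselves) across GPUs instead of replicating them. A minimal training configuration might look like the sketch below; the specific values are illustrative and not drawn from the article.

```json
{
  "train_batch_size": 256,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true
  }
}
```

Passing such a file to `deepspeed.initialize()` is how the library applies these optimizations to an existing PyTorch model.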

The partnership also ensures that Azure customers will have access to the cloud-native suite of enterprise-grade AI and data analytics tools, software and frameworks, collectively known as the NVIDIA AI Enterprise software suite. Manuvir Das, vice president of enterprise computing at NVIDIA, said, “Our collaboration with Microsoft will provide researchers and companies with state-of-the-art AI infrastructure and software to capitalize on the transformative power of AI.”

NVIDIA currently has Selene, a supercomputer it built during the COVID-19 pandemic. It is based on the A100 and delivers 2.8 exaflops of peak AI performance and 63.46 petaflops on HPL. It is used for machine learning, AI data analytics, and high-performance computing (HPC). It was also used to train GauGAN2, an AI model that, like OpenAI’s GLIDE and DALL-E, synthesizes sketches and words into photorealistic representations.

See More: Microsoft, GitHub and OpenAI Accused of Software Piracy, Sued for $9B in Damages

NVIDIA Eos is being built for advanced climate science research, digital biology and the future of AI. It comprises 576 DGX H100 systems totaling 4,608 H100 GPUs. Eos is expected to deliver 18.4 exaflops of AI computing performance and 275 petaflops of conventional scientific computing (HPL) performance, 4x faster than Japan’s Fugaku (currently #2 on the Top500 list).
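The headline GPU count follows from NVIDIA’s published DGX H100 spec of eight H100 GPUs per system, and a quick back-of-the-envelope check shows the quoted aggregate is consistent with per-GPU H100 throughput:

```python
# Eos: 576 DGX H100 systems, each housing 8 H100 GPUs (per NVIDIA's DGX spec)
systems = 576
gpus_per_system = 8
total_gpus = systems * gpus_per_system
print(total_gpus)  # 4608

# Implied AI throughput per GPU at the quoted 18.4 exaflops aggregate
per_gpu_petaflops = 18.4e18 / total_gpus / 1e15
print(round(per_gpu_petaflops, 1))  # 4.0 -- in line with H100's FP8 peak
```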

However, NVIDIA is passing over both Selene (#9 on the Top500 list) and the under-development Eos for generative AI. Scott Guthrie, executive vice president of the Cloud + AI Group at Microsoft, explains why. “Our collaboration with NVIDIA unlocks the world’s most scalable supercomputer platform, which delivers state-of-the-art AI capabilities for every enterprise on Microsoft Azure,” Guthrie noted.

Essentially, the Microsoft-NVIDIA collaboration is about scalability for generative AI supercomputing, not just sheer power.

“Customers can deploy thousands of GPUs in a single cluster to train even the most massive large language models, build the most complex recommender systems at scale, and enable generative AI at scale,” NVIDIA said.

NVIDIA is also scaling up 3D content, design, and simulation with the Omniverse Cloud suite of tools and services. With Omniverse Cloud, tools for developing 3D content can run even on run-of-the-mill computers that lack NVIDIA GeForce or RTX hardware or other high-performance capabilities.

“AI technology advances as well as industry adoption are accelerating. The breakthrough of foundation models has triggered a tidal wave of research, fostered new startups and enabled new enterprise applications,” Das added.

Let us know if you enjoyed reading this news on LinkedIn, Twitter, or Facebook. We would love to hear from you!

Image source: Shutterstock

Sumeet Wadhwani

Asst. Editor, Spiceworks Ziff Davis

An earnest copywriter at heart, Sumeet is what you'd call a jack of all trades, rather techs. A self-proclaimed 'half-engineer', he dropped out of Computer Engineering to answer his creative calling pertaining to all things digital. He now writes what techies engineer. As a technology editor and writer for News and Feature articles on Spiceworks (formerly Toolbox), Sumeet covers a broad range of topics from cybersecurity, cloud, AI, emerging tech innovation, hardware, semiconductors, et al. Sumeet compounds his geopolitical interests with cartophilia and antiquarianism, not to mention the economics of current world affairs. He bleeds Blue for Chelsea and Team India! To share quotes or your inputs for stories, please get in touch on sumeet_wadhwani@swzd.com