A new networking protocol developed with OpenAI and Nvidia aims to solve the biggest bottleneck in training large-scale AI models.

OpenAI, in partnership with tech giants Nvidia, Microsoft, AMD, Intel, and Broadcom, has introduced a new networking protocol designed to prevent costly delays in training advanced artificial intelligence models. The technology, called Multipath Reliable Connection (MRC), is already being deployed in some of the world's largest AI supercomputers to move massive datasets between GPUs more efficiently and reliably.
"Our goal was not just to build a fast network, but also to build one that delivers very predictable performance, even in the presence of failures, to keep training jobs moving," OpenAI said in a blog post announcing the initiative.
MRC is a remote direct memory access (RDMA) transport protocol that fundamentally changes how data travels inside an AI data center. Instead of relying on a single network path, which can create a bottleneck or halt training if it fails, MRC stripes traffic across hundreds of different paths simultaneously. The protocol is built into the latest 800Gb/s network interfaces. It is already in use in OpenAI's largest Nvidia GB200 supercomputers and is being deployed by Microsoft in its Azure data centers.
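To make the idea of striping concrete, here is a minimal Python sketch. It is not the MRC wire format or algorithm, which the article does not describe. It assumes a simple round-robin spray and uses hypothetical names (`Packet`, `stripe`, `reassemble`) purely for illustration: a message is cut into chunks, each chunk is tagged with a sequence number and assigned to one of several paths, and the receiver restores the original order no matter which path delivered each chunk first.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Packet:
    seq: int        # position of this chunk within the original message
    path: int       # hypothetical index of the network path it is sent on
    payload: bytes  # the chunk of data itself


def stripe(message: bytes, num_paths: int, chunk_size: int) -> List[Packet]:
    """Split a message into chunks and spread them round-robin over paths.

    Round-robin is an assumption made for illustration only; a real
    transport could pick paths in other ways.
    """
    if num_paths < 1 or chunk_size < 1:
        raise ValueError("num_paths and chunk_size must be positive")
    chunks = [
        message[i:i + chunk_size]
        for i in range(0, len(message), chunk_size)
    ]
    return [
        Packet(seq=seq, path=seq % num_paths, payload=chunk)
        for seq, chunk in enumerate(chunks)
    ]


def reassemble(packets: Iterable[Packet]) -> bytes:
    """Rebuild the message from packets that may arrive in any order."""
    ordered = sorted(packets, key=lambda p: p.seq)
    return b"".join(p.payload for p in ordered)
```

Because every chunk carries its own sequence number, no single path has to deliver the whole message, which is what lets a slow or failed path be bypassed without stalling the transfer.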
The move addresses a critical vulnerability in the economics of AI. When training a frontier model across tens of thousands of GPUs, even a millisecond-long network stall can leave millions of dollars of computing hardware idle. By providing multiple redundant paths and the intelligence to steer around congestion, MRC is designed to maximize the utilization of these expensive AI systems, directly impacting the return on investment for companies spending billions on AI infrastructure.
Training large AI models involves a constant, high-volume exchange of data between thousands of GPUs that must remain in lockstep. In traditional networking, if a link in the path gets congested or a switch fails, the entire job can pause while the system reroutes. Delays of this kind show up as "tail latency," the slowest few transfers that every other GPU ends up waiting on, and they are a major source of inefficiency.
MRC tackles this problem in several ways. The protocol uses real-time signals from the network fabric to detect overloaded links and steer traffic away from them. Lost data can be retransmitted quickly and precisely, minimizing the impact of faults. According to Nvidia, its Spectrum-X platform, which runs MRC, can detect a path failure and reroute traffic in hardware within microseconds. Together, these capabilities give a "smart tenant" like OpenAI greater control over routing and network behavior, even when running on a cloud provider's infrastructure such as Microsoft Azure.
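The two behaviors described above, steering around trouble and retransmitting only what was lost, can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the MRC or Spectrum-X implementation: the congestion scores, the set of failed paths, and the function names `pick_path` and `missing_sequences` are all hypothetical.

```python
from typing import Dict, List, Set


def pick_path(congestion: Dict[int, float], failed: Set[int]) -> int:
    """Choose the least congested path that has not been marked as failed.

    `congestion` maps a path index to a load score (lower is better).
    The scores stand in for whatever signals a real fabric reports.
    """
    healthy = {p: load for p, load in congestion.items() if p not in failed}
    if not healthy:
        raise RuntimeError("no healthy paths available")
    # Break ties on the path index so the choice is deterministic.
    return min(healthy, key=lambda p: (healthy[p], p))


def missing_sequences(received: Set[int], total: int) -> List[int]:
    """List only the sequence numbers that still need to be resent."""
    return [seq for seq in range(total) if seq not in received]
```

For example, if path 2 has the lowest load but has been marked as failed, `pick_path` falls back to the next best healthy path, and `missing_sequences` identifies just the lost chunks so that the rest of the transfer is not repeated.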
In a significant move to foster broad adoption, the MRC specification has been made public through the Open Compute Project (OCP), an industry body that promotes open-source hardware designs. The involvement of AMD, Intel, and Broadcom alongside Nvidia and Microsoft signals a collaborative effort to build a common standard for high-performance AI networking.
However, the open specification comes with a competitive dynamic. While anyone can implement the protocol, Nvidia is betting that its hardware-specific execution on its Spectrum-X switches and SuperNICs will deliver superior performance. This "open standards, differentiated implementation" strategy has been a hallmark of Nvidia's success. Gilad Shainer, Senior Vice President at Nvidia, noted that he expects a variety of Ethernet protocols to coexist, tailored to different customer needs, rather than a single winner-take-all standard like the one proposed by the Ultra Ethernet Consortium (UEC).
For investors, this announcement reinforces the competitive positions of the companies involved. It solidifies Nvidia's role as a provider of end-to-end AI systems, not just chips. For Microsoft, it enhances the performance and resilience of its Azure cloud, a key factor in attracting and retaining large AI customers like OpenAI. The participation of AMD and Intel ensures they remain part of the conversation, preventing a complete lock-in by a single vendor and providing the industry with multiple paths forward.
This article is for informational purposes only and does not constitute investment advice.