Key Takeaways:
- Nvidia launched Cosmos 3, the first fully open physical AI omnimodel
- The model uses a mixture-of-transformers architecture for vision, language and action
- Physical AI could unlock a $24 trillion market by 2040, ARK Invest estimates

Nvidia's Cosmos 3 marks the chipmaker's entry into foundation AI models for robotics, combining vision reasoning with multimodal generation.
Nvidia's Cosmos 3, the first fully open omnimodel for physical AI, pushes the company beyond GPU hardware into foundation model territory with a mixture-of-transformers architecture for world simulation and robotics.
"Cosmos 3 is a leaderboard-topping open physical AI foundation model built on a breakthrough mixture-of-transformers architecture that unifies vision, language and action," the company said in its June 1 announcement.
The model supports native vision reasoning and generates text, image, video, ambient sound and action outputs for synthetic data creation and physical AI policy development. Nvidia also released Alpamayo 2 Super, a 32-billion-parameter open reasoning vision-language-action model, alongside a suite of open-source physical AI agent skills spanning its Omniverse, Cosmos and Metropolis platforms.
The expansion into foundation models positions Nvidia to capture value beyond its data center GPU business, which generated $62 billion in revenue in fiscal 2025. Physical AI, which encompasses autonomous vehicles, warehouse robotics and industrial automation, represents a new addressable market that could help justify the company's 35x forward earnings multiple if Cosmos becomes the standard platform for robotics development.
The mixture-of-transformers architecture underpinning Cosmos 3 marks a technical departure from Nvidia's previous AI models. Where a conventional large language model works with text alone, Cosmos 3 handles vision, language and action data within a single model. In mixture-of-transformers designs generally, each modality gets its own set of transformer parameters while attention is shared across the combined input. This lets the model simulate physical-world interactions, a capability used to train robots and autonomous systems in simulation rather than relying solely on real-world trial and error.
The open-source release strategy mirrors Meta's approach with its Llama family of language models, positioning Cosmos 3 as a potential standard for robotics research and development. By making the model freely available, Nvidia aims to build an ecosystem of developers and companies that rely on its hardware for training and inference, creating a software moat around its GPU business.
The competitive stakes extend beyond Nvidia's immediate chip rivals. Tesla is developing its own AI models for autonomous driving and humanoid robotics, while Google DeepMind has invested heavily in physical-world simulation and robotics through the MuJoCo physics simulator and its Gemini models. Amazon, through its robotics division, represents another potential customer and competitor in warehouse automation.
For investors, the question is whether Cosmos 3 can translate developer adoption into GPU demand. Physical AI training is compute-intensive: a single robotics model training session can consume 10,000 to 25,000 H100-equivalent GPUs over weeks. If Cosmos 3 becomes the default platform for physical AI development, it could drive a new cycle of data center capital expenditure beyond the current large language model buildout.
Nvidia shares have gained 140% over the past 12 months, driven by AI infrastructure spending from Microsoft, Amazon and Google. The Cosmos 3 launch extends the narrative beyond data center GPUs into robotics and physical AI, a market that ARK Invest estimates could reach $24 trillion in global revenue by 2040.
This article is for informational purposes only and does not constitute investment advice.