Cosmos 3 is Here
Today, we're excited to release Cosmos 3, NVIDIA's next-generation family of open omnimodal world foundation models for Physical AI.
Cosmos 3 unifies language, images, video, audio, and actions within a single architecture, enabling developers to build agents that can understand, reason about, simulate, and act in the physical world. From world generation and simulation to robotics and embodied AI, Cosmos 3 serves as a general-purpose foundation model for Physical AI.
What's new:
- Unified omnimodal world model supporting text, image, video, audio, and action modalities
- Integrated Reasoner + Generator architecture for world understanding and world generation
- Flexible generation across Text-to-Image, Image-to-Video, Video-to-World, and multimodal simulation tasks
- Native support for robot action generation and policy learning through Cosmos3-Policy models
- State-of-the-art open model performance across world understanding, generation, and robotics benchmarks
- Open release of models, code, datasets, evaluation benchmarks, and inference tooling for the Physical AI community
Read the Paper | Download the Models | Explore the Cosmos Cookbook
The Cosmos 3 release includes:
- Cosmos3-Nano (16B): Compact omnimodal world foundation model optimized for efficient deployment and development.
- Cosmos3-Super (64B): High-capacity world model for advanced reasoning, generation, simulation, and Physical AI applications.
- Cosmos3-Super-Text2Image: State-of-the-art text-to-image generation model built on Cosmos 3.
- Cosmos3-Super-Image2Video: High-fidelity image-to-video generation model with strong temporal consistency and controllability.
- Cosmos3-Nano-Policy-DROID: Open robot foundation model for learning manipulation and control policies directly from demonstrations.
Cosmos 3 represents a major step toward general-purpose world models that can perceive, reason, simulate, and act, bringing us closer to a future where Physical AI can learn from both the real world and generated worlds at scale.