Cosmos 3 Launch

mingyuliutw released this 01 Jun 04:38

Cosmos 3 is Here 🚀

Today, we're excited to release Cosmos 3, NVIDIA's next-generation family of open omnimodal world foundation models for Physical AI.

Cosmos 3 unifies language, images, video, audio, and actions within a single architecture, enabling developers to build agents that can understand, reason about, simulate, and act in the physical world. From world generation and simulation to robotics and embodied AI, Cosmos 3 serves as a general-purpose foundation model for Physical AI.

What's new:

  • 🌍 Unified omnimodal world model supporting text, image, video, audio, and action modalities
  • 🧠 Integrated Reasoner + Generator architecture for world understanding and world generation
  • 🎬 Flexible generation across Text-to-Image, Image-to-Video, Video-to-World, and multimodal simulation tasks
  • πŸ€– Native support for robot action generation and policy learning through Cosmos3-Policy models
  • πŸ“ˆ State-of-the-art open model performance across world understanding, generation, and robotics benchmarks
  • πŸ”“ Open release of models, code, datasets, evaluation benchmarks, and inference tooling for the Physical AI community

📖 Read the Paper | 👉 Download the Models | 🧑‍🍳 Explore the Cosmos Cookbook

The Cosmos 3 release includes:

  • Cosmos3-Nano (16B) – Compact omnimodal world foundation model optimized for efficient deployment and development.
  • Cosmos3-Super (64B) – High-capacity world model for advanced reasoning, generation, simulation, and Physical AI applications.
  • Cosmos3-Super-Text2Image – State-of-the-art text-to-image generation model built on Cosmos 3.
  • Cosmos3-Super-Image2Video – High-fidelity image-to-video generation model with strong temporal consistency and controllability.
  • Cosmos3-Nano-Policy-DROID – Open robot foundation model for learning manipulation and control policies directly from demonstrations.
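
For deployment planning, the sizes listed above can be turned into a rough weight-memory estimate. The sketch below assumes that "16B" and "64B" are parameter counts and that weights are stored at a uniform precision; the precisions shown are illustrative and are not stated in this release. It covers weights only, so activations, caches, and runtime overhead come on top.

```python
# Back-of-the-envelope weight memory for the two base model sizes.
# Assumption: "16B" / "64B" are parameter counts (not confirmed by the release notes).
# Assumption: every parameter is stored at the same precision.
PARAM_COUNTS = {"Cosmos3-Nano": 16e9, "Cosmos3-Super": 64e9}

# Bytes per parameter for a few common storage formats (illustrative choices).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the weight-only memory footprint in GiB for a given storage format."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for name, num_params in PARAM_COUNTS.items():
        for dtype in BYTES_PER_PARAM:
            gib = weight_memory_gib(num_params, dtype)
            print(f"{name:14s} {dtype:5s} ~{gib:6.1f} GiB")
```

Under these assumptions, the 64B model needs four times the weight memory of the 16B model at any given precision, which is the main practical difference when choosing between the two for a given GPU budget.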

Cosmos 3 represents a major step toward general-purpose world models that can perceive, reason, simulate, and act, bringing us closer to a future where Physical AI can learn from both the real world and generated worlds at scale.