Skip to content

Cosmos - World Foundation Models

Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. Cosmos is purpose built for physical AI. The Cosmos repository will enable end users to run the Cosmos models, run inference scripts and generate videos.

Special thanks to Johnny NΓΊΓ±ez Cano for porting the Cosmos and Transformer Engine Jetson!
See Cosmos Official page by Nvidia. See Transformer Engine by Nvidia.

What you need

  1. One of the following Jetson devices:

    Jetson Thor (XGB) Jetson AGX Orin (64GB) Jetson AGX Orin (32GB)

  2. Running one of the following versions of JetPack :

    JetPack 6 (L4T r36.x)

  3. Sufficient storage space (preferably with NVMe SSD).

    • 12.26GB for cosmos container image
    • Space for models and datasets ( >50GB )
  4. Clone and setup jetson-containers :

    git clone https://github.com/dusty-nv/jetson-containers
    bash jetson-containers/install.sh
    

WARNING

Transformer Engine :

  • Cosmos is optimized for NVIDIA ADA GPU architecture generations and later due running in FP8.
  • Jetson AGX Orin is based on Ampere.
  • Support for optimizations across all precisions (FP16, BF16) on NVIDIA Ampere GPU architecture generations and later.

Start Container

Use this command to automatically run, build, or pull a compatible container image for cosmos:

jetson-containers run $(autotag cosmos)

To mount your own directories into the container, use the -v or --volume flags:

jetson-containers run -v /path/on/host:/path/in/container $(autotag cosmos)

Recommendation (This download all models outside docker container):

git clone --recursive https://github.com/NVIDIA/Cosmos.git
cd Cosmos
jetson-containers run -it -v $(pwd):/workspace $(autotag cosmos)

Follow the instructions from Cosmos repository.

Here is the summarized steps to run the Cosmos models:

Generate a Hugging Face access token. Set the access token to 'Read' permission (default is 'Fine-grained').

huggingface-cli login

Download Models:

PYTHONPATH=$(pwd) python3 cosmos1/scripts/download_diffusion.py --model_sizes 7B 14B --model_types Text2World Video2World

Run Demo:

PROMPT="A sleek, humanoid robot stands in a vast warehouse filled with neatly stacked cardboard boxes on industrial shelves. \
The robot's metallic body gleams under the bright, even lighting, highlighting its futuristic design and intricate joints. \
A glowing blue light emanates from its chest, adding a touch of advanced technology. The background is dominated by rows of boxes, \
suggesting a highly organized storage system. The floor is lined with wooden pallets, enhancing the industrial setting. \
The camera remains static, capturing the robot's poised stance amidst the orderly environment, with a shallow depth of \
field that keeps the focus on the robot while subtly blurring the background for a cinematic effect."
PYTHONPATH=$(pwd) python3 cosmos1/models/diffusion/inference/text2world.py \
    --checkpoint_dir checkpoints \
    --diffusion_transformer_dir Cosmos-1.0-Diffusion-7B-Text2World \
    --prompt "$PROMPT" \
    --video_save_name Cosmos-1.0-Diffusion-7B-Text2World_memory_efficient \
    --offload_tokenizer \
    --offload_diffusion_transformer \
    --offload_text_encoder_model \
    --offload_prompt_upsampler \
    --offload_guardrail_models

It will generate a video file in the outputs directory.

Another example:

PROMPT="The video showcases a vibrant, magical garden where flowers bloom dynamically, opening and moving as though responding to a gentle rhythm in nature. \
Colorful butterflies glide gracefully through the air, and a small, clear stream winds its way through the scene, reflecting the warm glow of sunlight. \
A curious rabbit hops along a winding path, leading the viewer to a hidden alcove where a tree with golden, shimmering leaves stands, its branches moving slightly as if alive with energy. \
The entire scene radiates tranquility and wonder, inviting viewers to immerse themselves in the beauty of nature and magic combined."
PYTHONPATH=$(pwd) python3 cosmos1/models/diffusion/inference/text2world.py \
    --checkpoint_dir checkpoints \
    --diffusion_transformer_dir Cosmos-1.0-Diffusion-7B-Text2World \
    --prompt "$PROMPT" \
    --video_save_name Cosmos-1.0-Diffusion-7B-Text2World_memory_efficient \
    --offload_tokenizer \
    --offload_diffusion_transformer \
    --offload_text_encoder_model \
    --offload_prompt_upsampler \
    --offload_guardrail_models