vllm-project · akshatvishu · Jun 13, 2026 · Jun 19, 2026 · Jun 19, 2026 · Jun 20, 2026
@@ -90,5 +90,6 @@ th {
 | `MiniCPMO45OmniForConditionalGeneration` | MiniCPM-o 4.5 | `openbmb/MiniCPM-o-4_5` | ✅︎ | | ✅︎ | |
 | `ErnieImagePipeline` | ERNIE-Image | `baidu/ERNIE-Image`, `baidu/ERNIE-Image-Turbo` | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
 |`HiDreamImagePipeline` | HiDream-I1-Full | `HiDream-ai/HiDream-I1-Full` | ✅︎ | ✅︎ | | |
+| `AnimaPipeline` | Anima | `circlestone-labs/Anima` | ✅︎ | ✅︎ | | |
 
 ✅︎ indicates the model is supported on that backend. Empty cells mean not listed as supported on that backend.
@@ -39,9 +39,63 @@ vllm serve "stable-diffusion-v1-5/stable-diffusion-v1-5" \
     --diffusion-load-format diffusers
 ```
 
-Users turn on the diffusers backend primarily through `--diffusion-load-format diffusers` argument.
-There are two more optional arguments, `--diffusers-load-kwargs` and `--diffusers-call-kwargs`,
-which are only valid together with `--diffusion-load-format diffusers`.
+Users turn on the diffusers backend primarily through the `--diffusion-load-format diffusers` argument.
+
+### Single-File Checkpoints
+
+Single-file checkpoints (such as `.safetensors` or `.ckpt`) can be loaded with the `--diffusion-load-format diffusers_single_file` argument, or simply by pointing `--model` at a local single-file checkpoint.
+
+If a Diffusers pipeline class is needed, specify it using `--model-class-name`:
+
+```bash
+vllm serve "/path/to/model.safetensors" \
+    --omni \
+    --diffusion-load-format diffusers_single_file \
+    --model-class-name SomeDiffusersPipeline
+```
+
+Passing `--diffusion-load-format diffusers_single_file` explicitly bypasses the standard directory-based config loading. This lets you pass a Hugging Face Hub ID (e.g. `repo/model`) or a URL as the `--model` argument to fetch a single file remotely, provided the specified Diffusers pipeline supports remote loading.
+
+### Native Anima Single-File Checkpoints
+
+Anima single-file checkpoints are served through the native `AnimaPipeline`, not through `AnimaModularPipeline.from_single_file()`. If `--model-class-name AnimaModularPipeline` is passed for a local single-file checkpoint, vLLM-Omni maps it to `AnimaPipeline`.
+
+Use `--model-class-name AnimaPipeline`. The native path reads the Anima transformer single-file checkpoint directly, converts original Cosmos transformer keys when needed, and loads the Cosmos transformer and text conditioner into vLLM-Omni native modules.
+
+The native path also needs the non-denoiser components (`text_encoder`, `tokenizer`, `t5_tokenizer`, `vae`, and optionally `scheduler`). These must be in Diffusers `from_pretrained()` layout. Raw Anima auxiliary files such as `qwen_3_06b_base.safetensors` and `qwen_image_vae.safetensors` are converter inputs; they are not accepted directly as `components_path`.
+
+Use the Anima converter from the Diffusers reference implementation to prepare the component directory:
+
+```bash
+python /path/to/convert_anima_to_diffusers.py \
+    --transformer_ckpt_path /path/to/anima-base-v1.0.safetensors \
+    --text_encoder_ckpt_path /path/to/qwen_3_06b_base.safetensors \
+    --vae_ckpt_path /path/to/qwen_image_vae.safetensors \
+    --qwen_tokenizer_path /path/to/qwen-tokenizer \
+    --t5_tokenizer_path /path/to/t5-tokenizer \
+    --output_path /path/to/anima-components \
+    --save_pipeline
+```
+
+Then point `--model` at the raw Anima transformer checkpoint and set `components_path` in `--diffusers-load-kwargs` to the converted directory:
+
+```bash
+vllm serve "/path/to/anima.safetensors" \
+    --omni \
+    --model-class-name AnimaPipeline \
+    --diffusers-load-kwargs '{
+      "components_path": "/path/to/anima-components"
+    }'
+```
+
+No deploy config is required for local Anima single-file checkpoint discovery
+when `--model-class-name AnimaPipeline` is provided.
+
+Native Anima currently supports baseline single-GPU execution. Cache-DiT,
+TeaCache, CPU offload, layer-wise offload, quantization, TP/SP, CFG parallel,
+HSDP, and step execution are not supported by `AnimaPipeline` yet.
+
+There are two more optional arguments, `--diffusers-load-kwargs` and `--diffusers-call-kwargs`, which are valid together with `--diffusion-load-format diffusers` or `diffusers_single_file`. Native Anima also accepts `--diffusers-load-kwargs` for component paths such as `components_path`, but does not delegate denoising to Diffusers.
 
 After launching the model, users send a request as usual. Refer to other documentation pages on how to request a particular input/output modality, such as `examples/online_serving/text_to_image/openai_chat_client.py`.
 

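The component requirements described above can be checked before launching the server. The following is a minimal sketch, not part of vLLM-Omni: the component names come from the documentation in this diff, while the one-subdirectory-per-component layout and the helper names `missing_components` and `looks_like_raw_checkpoint` are assumptions made for illustration.

```python
import sys
from pathlib import Path

# Component names are taken from the docs above. Treating each one as a
# subdirectory of the components directory is an assumption about what the
# Diffusers from_pretrained() layout looks like on disk.
REQUIRED_COMPONENTS = ("text_encoder", "tokenizer", "t5_tokenizer", "vae")
OPTIONAL_COMPONENTS = ("scheduler",)


def missing_components(components_path: str) -> list[str]:
    """Return the required component subdirectories that are absent."""
    root = Path(components_path)
    return [name for name in REQUIRED_COMPONENTS if not (root / name).is_dir()]


def looks_like_raw_checkpoint(components_path: str) -> bool:
    """True if the path is a single file (e.g. a raw .safetensors file)."""
    return Path(components_path).is_file()


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: check_components.py /path/to/anima-components")
    path = sys.argv[1]
    if looks_like_raw_checkpoint(path):
        sys.exit(f"{path} is a file; components_path expects a directory")
    missing = missing_components(path)
    if missing:
        sys.exit(f"missing component directories: {', '.join(missing)}")
    present_optional = [n for n in OPTIONAL_COMPONENTS if (Path(path) / n).is_dir()]
    print(f"required components found; optional present: {present_optional}")
```

A check like this only confirms that the directories exist. It does not validate their contents, so a passing result is a necessary condition rather than a guarantee that loading will succeed.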
@@ -196,6 +196,23 @@ python examples/offline_inference/text_to_image/text_to_image.py \
   --auxiliary-text-encoder meta-llama/Meta-Llama-3.1-8B-Instruct \
   --output /output.png
 ```
+### Anima Single-File Checkpoints
+
+To load Anima, point `--model` to the single-file checkpoint path, pass the native pipeline class name using `--model-class-name`, and supply the converted components directory using `--diffusers-load-kwargs`:
+
+```bash
+python examples/offline_inference/text_to_image/text_to_image.py \
+  --model /path/to/models/anima-official/split_files/diffusion_models/anima-base-v1.0.safetensors \
+  --model-class-name AnimaPipeline \
+  --diffusers-load-kwargs '{"components_path": "/path/to/models/anima-components"}' \
+  --prompt "A cinematic close-up of a glass teapot on a wooden table." \
+  --seed 42 \
+  --guidance-scale 4.0 \
+  --num-inference-steps 50 \
+  --height 1024 \
+  --width 1024 \
+  --output anima_output.png
+```
 
 ### Batch Requests (Multiple Prompts)
 

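Hand-writing the JSON for `--diffusers-load-kwargs` is easy to get wrong when paths contain spaces or quotes. The sketch below builds the same offline command from Python, using `json.dumps` for the value and `shlex.join` for shell quoting. `build_anima_command` is a hypothetical helper name, and the paths are placeholders.

```python
import json
import shlex


def build_anima_command(checkpoint: str, components_path: str) -> list[str]:
    """Build the argv for the offline Anima example shown in the README."""
    # json.dumps handles escaping, so the value is always a valid JSON object.
    load_kwargs = json.dumps({"components_path": components_path})
    return [
        "python",
        "examples/offline_inference/text_to_image/text_to_image.py",
        "--model", checkpoint,
        "--model-class-name", "AnimaPipeline",
        "--diffusers-load-kwargs", load_kwargs,
    ]


if __name__ == "__main__":
    cmd = build_anima_command(
        "/path/to/anima-base-v1.0.safetensors",
        "/path/to/anima-components",
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Passing the list directly to `subprocess.run(cmd)` avoids shell quoting altogether, which may be preferable in scripts.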
@@ -48,6 +48,16 @@ def parse_profiler_config(value: str) -> dict[str, Any]:
     return config
 
 
+def parse_json_dict(value: str) -> dict[str, Any]:
+    try:
+        config = json.loads(value)
+    except json.JSONDecodeError as e:
+        raise argparse.ArgumentTypeError(f"Must be a valid JSON object: {e}") from e
+    if not isinstance(config, dict):
+        raise argparse.ArgumentTypeError("Must be a JSON object (dict)")
+    return config
+
+
 def parse_args() -> argparse.Namespace:
     parser = argparse.ArgumentParser(description="Generate an image with supported diffusion models.")
     parser.add_argument(
@@ -327,6 +337,18 @@ def parse_args() -> argparse.Namespace:
         default=None,
         help="Supplementary auxiliary text encoder parameters model name or path (especially for Hidream-l1-full).",
     )
+    parser.add_argument(
+        "--model-class-name",
+        type=str,
+        default=None,
+        help="Override the diffusion pipeline class name (e.g. AnimaPipeline).",
+    )
+    parser.add_argument(
+        "--diffusers-load-kwargs",
+        type=parse_json_dict,
+        default=None,
+        help='JSON object passed to model loader (e.g. \'{"components_path": "/path"}\').',
+    )
     current_omni_platform.pre_register_and_update(parser)
     return parser.parse_args()
 
@@ -422,9 +444,13 @@ def main():
     }
     if args.stage_configs_path:
         omni_kwargs["stage_configs_path"] = args.stage_configs_path
-    if use_nextstep:
+    if args.model_class_name:
+        omni_kwargs["model_class_name"] = args.model_class_name
+    elif use_nextstep:
         # NextStep-1.1 requires explicit pipeline class
         omni_kwargs["model_class_name"] = "NextStep11Pipeline"
+    if args.diffusers_load_kwargs is not None:
+        omni_kwargs["diffusers_load_kwargs"] = args.diffusers_load_kwargs
     omni = Omni(**omni_kwargs)
     model_class_name = get_model_class_name(omni)
     declared_extra_body_params = get_extra_body_params(model_class_name)
@@ -455,6 +481,10 @@ def main():
         print(f"  LoRA: scale={args.lora_scale}")
     if args.stage_configs_path:
         print(f"  stage-configs-path: {args.stage_configs_path}")
+    if args.model_class_name:
+        print(f"  Model class name: {args.model_class_name}")
+    if args.diffusers_load_kwargs is not None:
+        print(f"  Diffusers load kwargs: {args.diffusers_load_kwargs}")
     print(f"{'=' * 60}\n")
 
     # Build LoRA request when --lora-path is set

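The `parse_json_dict` validator added in this diff can be exercised in isolation. The sketch below repeats the function as it appears above and wires it into a throwaway parser. `demo_parser` and `describe` are hypothetical names used only for illustration.

```python
import argparse
import json
from typing import Any


def parse_json_dict(value: str) -> dict[str, Any]:
    # Same logic as the helper added in the diff above.
    try:
        config = json.loads(value)
    except json.JSONDecodeError as e:
        raise argparse.ArgumentTypeError(f"Must be a valid JSON object: {e}") from e
    if not isinstance(config, dict):
        raise argparse.ArgumentTypeError("Must be a JSON object (dict)")
    return config


def demo_parser() -> argparse.ArgumentParser:
    """Throwaway parser exposing only the new flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--diffusers-load-kwargs", type=parse_json_dict, default=None)
    return parser


def describe(value: str) -> str:
    """Report whether a raw string would be accepted by the validator."""
    try:
        return f"accepted: {parse_json_dict(value)}"
    except argparse.ArgumentTypeError as e:
        return f"rejected: {e}"


if __name__ == "__main__":
    args = demo_parser().parse_args(
        ["--diffusers-load-kwargs", '{"components_path": "/path"}']
    )
    print(args.diffusers_load_kwargs)
    # A JSON array is valid JSON but not an object, so it is rejected.
    print(describe('["components_path"]'))
    # Malformed input is rejected with the decoder's message attached.
    print(describe("{components_path: /path}"))
```

When the validator raises `argparse.ArgumentTypeError` inside a real `parse_args()` call, argparse reports the message as a usage error and exits, which is why the invalid cases here call the function directly.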
@@ -38,7 +38,31 @@ vllm serve "stable-diffusion-v1-5/stable-diffusion-v1-5" \
 
 Users turn on the diffusers backend primarily through `--diffusion-load-format diffusers` argument.
 There are two more optional arguments, `--diffusers-load-kwargs` and `--diffusers-call-kwargs`,
-which are only valid together with `--diffusion-load-format diffusers`.
+which are valid together with `--diffusion-load-format diffusers` or `diffusers_single_file`.
+Native Anima also accepts `--diffusers-load-kwargs` for component paths such as `components_path`,
+but does not delegate denoising to Diffusers.
+
+### Native Anima Single-File Checkpoints
+
+Anima single-file checkpoints are served through the native `AnimaPipeline`, not through
+`AnimaModularPipeline.from_single_file()`. If `--model-class-name AnimaModularPipeline`
+is passed for a local single-file checkpoint, vLLM-Omni maps it to `AnimaPipeline`.
+
+```bash
+vllm serve "/path/to/anima-base-v1.0.safetensors" \
+    --omni \
+    --model-class-name AnimaPipeline \
+    --diffusers-load-kwargs '{"components_path": "/path/to/anima-components"}'
+```
+
+No deploy config is required for local Anima single-file checkpoint discovery
+when `--model-class-name AnimaPipeline` is provided.
+
+The native path needs the non-denoiser components (`text_encoder`, `tokenizer`,
+`t5_tokenizer`, `vae`, and optionally `scheduler`) in Diffusers `from_pretrained()`
+layout. Native Anima currently supports baseline single-GPU execution.
+Cache-DiT, TeaCache, CPU offload, layer-wise offload, quantization, TP/SP, CFG
+parallel, HSDP, and step execution are not supported by `AnimaPipeline` yet.
 
 After launching the model, users send a request as usual. Refer to other documentation pages on how to request a particular input/output modality, such as `examples/online_serving/text_to_image/openai_chat_client.py`.
 

@@ -39,6 +39,7 @@ recipes/
 | [`Bagel/BAGEL-7B-MoT.md`](./Bagel/BAGEL-7B-MoT.md) | Text-to-image with shared online/offline examples | 1x A100 80GB / 2x CUDA GPUs |
 | [`BosonAI/Higgs-Audio-V3-TTS.md`](./BosonAI/Higgs-Audio-V3-TTS.md) | Online + offline multilingual TTS with voice cloning | 1x H100 80GB |
 | [`ByteDance/Lance.md`](./ByteDance/Lance.md) | Unified AR+diffusion: text/img/video gen + understanding (Lance 3B) | 1x B300 / A100 80GB |
+| [`circlestone-labs/Anima.md`](./circlestone-labs/Anima.md) | Native single-file text-to-image serving | 1x AMD MI300X |
 | [`fishaudio/Fish-Speech-S2-Pro.md`](./fishaudio/Fish-Speech-S2-Pro.md) | Online serving for TTS | 1x A800 80GB |
 | [`Helios/Helios.md`](./Helios/Helios.md) | Text-to-video, image-to-video, and video-to-video generation | 1x NVIDIA H20 |
 | [`inclusionAI/Ming-flash-omni-2.0.md`](./inclusionAI/Ming-flash-omni-2.0.md) | Online serving for multimodal chat + standalone TTS | 4x H100 / 1x H100 80GB |

@@ -0,0 +1,160 @@
+# Anima
+
+> Native single-file diffusion text-to-image
+
+## Summary
+
+- Vendor: circlestone-labs
+- Model: [`circlestone-labs/Anima`](https://huggingface.co/circlestone-labs/Anima)
+- Task: text2img
+- Mode: Offline inference, Online serving (OpenAI-compatible API)
+- Maintainer: Community
+
+## When to use this recipe
+
+Use this recipe to run Anima via vLLM-Omni's native `AnimaPipeline`. Anima is a
+Cosmos-style diffusion transformer text-to-image model distributed as a
+single-file transformer checkpoint, not as a normal Hugging Face model
+directory.
+
+The native path reads the Anima transformer checkpoint directly, converts
+original Cosmos transformer keys when needed, and loads the Cosmos transformer
+and text conditioner into native vLLM-Omni modules. Non-denoiser components
+such as `text_encoder`, `tokenizer`, `t5_tokenizer`, `vae`, and optionally
+`scheduler` must be supplied through a Diffusers-layout components directory.
+
+Native Anima currently supports baseline single-GPU execution. Cache-DiT,
+TeaCache, CPU offload, layer-wise offload, quantization, TP/SP, CFG parallel,
+HSDP, and step execution are not supported by `AnimaPipeline` yet.
+
+## References
+
+- Offline example:
+  [`examples/offline_inference/text_to_image/README.md`](../../examples/offline_inference/text_to_image/README.md)
+- Online serving notes:
+  [Diffusers pipeline adapter docs](../../docs/user_guide/examples/online_serving/diffusers_pipeline_adapter.md)
+- Supported model entry:
+  [`docs/models/supported_models.md`](../../docs/models/supported_models.md)
+- Hugging Face model page:
+  [circlestone-labs/Anima](https://huggingface.co/circlestone-labs/Anima)
+- Diffusers-layout components:
+  [circlestone-labs/Anima-Base-v1.0-Diffusers](https://huggingface.co/circlestone-labs/Anima-Base-v1.0-Diffusers)
+
+## Hardware Support
+
+## ROCm
+
+### 1x AMD MI300X
+
+#### Environment
+
+- OS: Ubuntu 22.04.5 LTS (x86_64)
+- Python: 3.12.13
+- Driver / runtime: ROCm 7.2.53211, HIP runtime 7.2.53211, MIOpen 3.5.1
+- PyTorch: 2.10.0+git8514f05 built with ROCm 7.2.53211
+- GPU: 1x AMD MI300X
+- vLLM version: 0.23.0
+- vLLM-Omni version or commit: 0.1.dev2002+g704724675.rocm
+
+Recommended environment variables:
+
+```bash
+export VLLM_WORKER_MULTIPROC_METHOD=spawn
+export VLLM_ROCM_USE_AITER=0
+export PYTORCH_ROCM_ARCH=gfx942
+export PYTORCH_NVML_BASED_CUDA_CHECK=1
+```
+
+#### Command - prepare assets
+
+Download the official transformer checkpoint and the Diffusers-layout component
+directory:
+
+```bash
+mkdir -p /path/to/models/anima-official
+mkdir -p /path/to/models/anima-components
+
+hf download circlestone-labs/Anima \
+    split_files/diffusion_models/anima-base-v1.0.safetensors \
+    --local-dir /path/to/models/anima-official
+
+hf download circlestone-labs/Anima-Base-v1.0-Diffusers \
+    --local-dir /path/to/models/anima-components
+
+export ANIMA_CHECKPOINT=/path/to/models/anima-official/split_files/diffusion_models/anima-base-v1.0.safetensors
+export ANIMA_COMPONENTS=/path/to/models/anima-components
+```
+
+The `ANIMA_COMPONENTS` directory must be in Diffusers `from_pretrained()`
+layout. Raw auxiliary files such as `qwen_3_06b_base.safetensors` and
+`qwen_image_vae.safetensors` are converter inputs; they are not accepted
+directly as `components_path`.
+
+#### Command - text-to-image
+
+```bash
+python examples/offline_inference/text_to_image/text_to_image.py \
+    --model "$ANIMA_CHECKPOINT" \
+    --model-class-name AnimaPipeline \
+    --diffusers-load-kwargs "{\"components_path\":\"$ANIMA_COMPONENTS\"}" \
+    --prompt "A cinematic close-up of a glass teapot on a wooden table." \
+    --seed 42 \
+    --guidance-scale 4.0 \
+    --num-inference-steps 50 \
+    --height 1024 \
+    --width 1024 \
+    --output /tmp/anima_output.png
+```
+
+To match Anima's default Diffusers settings, use 1024x1024 resolution, 50
+denoising steps, `max_sequence_length=512`, one image per prompt, an empty
+negative prompt, and a CFG scale of 4.0.
+
+#### Verification
+
+Check that `/tmp/anima_output.png` exists and contains a generated image.
+
+#### Notes
+
+- Key flags: `--model-class-name AnimaPipeline` selects the native Anima path;
+  `--diffusers-load-kwargs` supplies `components_path`.
+- No deploy config is required for local single-file checkpoint discovery when
+  `--model-class-name AnimaPipeline` is provided.
+- Start with `max-concurrency=1` for correctness and latency validation.
+- Keep requests at the same resolution when comparing runs.
+- Do not enable parallelism, cache acceleration, offload, or quantized
+  checkpoint flags for Anima until support is added to `AnimaPipeline`.
+
+## Online Serving
+
+Anima supports text-to-image generation through the OpenAI-compatible image
+generation API.
+
+### Launch
+
+```bash
+vllm serve "$ANIMA_CHECKPOINT" \
+    --omni \
+    --port 8099 \
+    --model-class-name AnimaPipeline \
+    --diffusers-load-kwargs "{\"components_path\":\"$ANIMA_COMPONENTS\"}"
+```
+
+### Send requests
+
+```bash
+curl http://localhost:8099/v1/images/generations \
+    -H "Content-Type: application/json" \
+    -d "{
+      \"model\": \"$ANIMA_CHECKPOINT\",
+      \"prompt\": \"A cinematic close-up of a glass teapot on a wooden table.\",
+      \"size\": \"1024x1024\",
+      \"num_inference_steps\": 50,
+      \"guidance_scale\": 4.0,
+      \"seed\": 42
+    }"
+```
+
+The same generation parameters used by other text-to-image recipes apply:
+`num_inference_steps`, `seed`, `size` (which sets height and width), and
+optional negative prompting.
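The curl request above can also be sent from Python using only the standard library. This is a minimal sketch that mirrors the fields and port from the recipe. `build_image_request` and `post_image_request` are hypothetical helper names, and the response is printed only as top-level keys because its exact shape is not described here.

```python
import json
import os
import urllib.request


def build_image_request(
    model: str,
    prompt: str,
    *,
    size: str = "1024x1024",
    num_inference_steps: int = 50,
    guidance_scale: float = 4.0,
    seed: int = 42,
) -> dict:
    """Build the JSON body used in the curl example above."""
    return {
        "model": model,
        "prompt": prompt,
        "size": size,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
        "seed": seed,
    }


def post_image_request(base_url: str, payload: dict, timeout: float = 600.0) -> dict:
    """POST the payload to the image generation endpoint and decode the JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/v1/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Assumes the server from the Launch step is running on port 8099 and that
    # ANIMA_CHECKPOINT is exported as in the asset preparation step.
    payload = build_image_request(
        os.environ["ANIMA_CHECKPOINT"],
        "A cinematic close-up of a glass teapot on a wooden table.",
    )
    result = post_image_request("http://localhost:8099", payload)
    print(sorted(result.keys()))
```

Keeping payload construction separate from the network call makes it straightforward to compare runs with a fixed seed and resolution, as the Notes section recommends.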