vllm-project · akshatvishu · Jun 13, 2026 · Jun 19, 2026 · Jun 19, 2026 · Jun 20, 2026
diff --git a/benchmarks/diffusion/README.md b/benchmarks/diffusion/README.md
@@ -149,3 +149,74 @@ batch may still pay compile or CUDA-graph capture cost.
 
 For a Qwen-Image continuous-batching replay example, see
 [`performance_dashboard/qwen_image_serving_performance.md`](./performance_dashboard/qwen_image_serving_performance.md).
+
+## 4. Anima Native Single-File Benchmarking
+
+Native Anima is benchmarked as a text-to-image model through the same serving
+benchmark entrypoint. Unlike models served from a standard Hugging Face model
+ID, Anima is served from the raw single-file transformer checkpoint, and its
+non-denoiser components are loaded from a Diffusers-layout component directory.
+
+Download the official Anima checkpoint and components first. The commands below
+use `/path/to/models` as a placeholder; replace it with any local directory that
+has enough space for the checkpoint and component files.
+
+```bash
+mkdir -p /path/to/models/anima-official
+mkdir -p /path/to/models/anima-components
+
+hf download circlestone-labs/Anima \
+    split_files/diffusion_models/anima-base-v1.0.safetensors \
+    --local-dir /path/to/models/anima-official
+
+hf download circlestone-labs/Anima-Base-v1.0-Diffusers \
+    --local-dir /path/to/models/anima-components
+
+CHECKPOINT=/path/to/models/anima-official/split_files/diffusion_models/anima-base-v1.0.safetensors
+COMPONENTS=/path/to/models/anima-components
+```
+
+Run the commands below from the root of the vLLM-Omni repository, in the Python
+environment or container where vLLM-Omni is installed.
+
+Start the server with the checkpoint as `--model` and pass the component
+directory through `--diffusers-load-kwargs`:
+
+```bash
+vllm serve "$CHECKPOINT" \
+    --omni \
+    --port 8099 \
+    --model-class-name AnimaPipeline \
+    --diffusers-load-kwargs "{\"components_path\":\"$COMPONENTS\"}"
+```
+
+Then run the standard diffusion serving benchmark:
+
+```bash
+python3 benchmarks/diffusion/diffusion_benchmark_serving.py \
+    --base-url http://localhost:8099 \
+    --endpoint /v1/chat/completions \
+    --model "$CHECKPOINT" \
+    --task t2i \
+    --dataset random \
+    --num-prompts 10 \
+    --max-concurrency 1 \
+    --warmup-requests 1 \
+    --warmup-concurrency 1 \
+    --width 1024 \
+    --height 1024 \
+    --num-inference-steps 50
+```
+
+This matches the Diffusers baseline defaults for Anima: 1024x1024, 50 denoising
+steps, `max_sequence_length=512`, one image per prompt, empty negative prompt,
+and CFG scale 4.0 from the default guider. Do not pass `guidance_scale` through
+the benchmark unless you are intentionally measuring a non-default CFG setting.
+
+Native Anima currently supports baseline single-GPU execution. Cache-DiT,
+TeaCache, CPU offload, layer-wise offload, quantization, TP/SP, CFG parallel,
+HSDP, and step execution are not supported by `AnimaPipeline` yet.
+
+When `--model-class-name AnimaPipeline` is provided, a local single-file Anima
+checkpoint is discovered and served with the default single diffusion stage; no
+deploy config is required.
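
The server command above builds the `--diffusers-load-kwargs` value by hand-escaping the quotes around `$COMPONENTS`, which yields invalid JSON if the path itself contains a double quote or a backslash. One way to sidestep that is to generate the argument programmatically. The following is a minimal sketch, not part of vLLM-Omni: the helper name `build_load_kwargs_arg` is hypothetical, and only the `components_path` key is taken from this README.

```python
import json
import shlex


def build_load_kwargs_arg(components_path: str) -> str:
    """Return a shell-safe JSON string for --diffusers-load-kwargs.

    Hypothetical helper for illustration; not part of vLLM-Omni.
    """
    # json.dumps escapes any quotes or backslashes inside the path.
    payload = json.dumps({"components_path": components_path})
    # shlex.quote wraps the JSON so a POSIX shell passes it as one argument.
    return shlex.quote(payload)


if __name__ == "__main__":
    # Placeholder path taken from the README examples.
    print(build_load_kwargs_arg("/path/to/models/anima-components"))
```

Running the script prints a single-quoted JSON object that can be pasted after `--diffusers-load-kwargs` in the `vllm serve` command.
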
@@ -89,5 +89,6 @@ th {
 | `MiniCPMO45OmniForConditionalGeneration` | MiniCPM-o 4.5 | `openbmb/MiniCPM-o-4_5` | ✅︎ | | ✅︎ | |
 | `ErnieImagePipeline` | ERNIE-Image | `baidu/ERNIE-Image`, `baidu/ERNIE-Image-Turbo` | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
 |`HiDreamImagePipeline` | HiDream-I1-Full | `HiDream-ai/HiDream-I1-Full` | ✅︎ | ✅︎ | | |
+| `AnimaPipeline` | Anima | `circlestone-labs/Anima` | ✅︎ | ✅︎ | | |
 
 ✅︎ indicates the model is supported on that backend. Empty cells mean not listed as supported on that backend.
@@ -39,9 +39,63 @@ vllm serve "stable-diffusion-v1-5/stable-diffusion-v1-5" \
     --diffusion-load-format diffusers
 ```
 
-Users turn on the diffusers backend primarily through `--diffusion-load-format diffusers` argument.
-There are two more optional arguments, `--diffusers-load-kwargs` and `--diffusers-call-kwargs`,
-which are only valid together with `--diffusion-load-format diffusers`.
+Users turn on the diffusers backend primarily through the `--diffusion-load-format diffusers` argument.
+
+### Single-File Checkpoints
+
+Single-file checkpoints (such as `.safetensors` or `.ckpt`) can be loaded with the `--diffusion-load-format diffusers_single_file` argument, or simply by pointing `--model` at a local single-file checkpoint.
+
+If a Diffusers pipeline class is needed, specify it using `--model-class-name`:
+
+```bash
+vllm serve "/path/to/model.safetensors" \
+    --omni \
+    --diffusion-load-format diffusers_single_file \
+    --model-class-name SomeDiffusersPipeline
+```
+
+Setting `--diffusion-load-format diffusers_single_file` explicitly bypasses the standard directory-based config loading. `--model` can then also be a Hugging Face Hub ID (e.g. `repo/model`) or a URL, so that the single file is fetched remotely, provided the specified Diffusers pipeline supports remote loading.
+
+### Native Anima Single-File Checkpoints
+
+Anima single-file checkpoints are served through the native `AnimaPipeline`, not through `AnimaModularPipeline.from_single_file()`. If `--model-class-name AnimaModularPipeline` is passed for a local single-file checkpoint, vLLM-Omni maps it to `AnimaPipeline`.
+
+Use `--model-class-name AnimaPipeline`. The native path reads the Anima transformer single-file checkpoint directly, converts original Cosmos transformer keys when needed, and loads the Cosmos transformer and text conditioner into vLLM-Omni native modules.
+
+The native path also needs the non-denoiser components (`text_encoder`, `tokenizer`, `t5_tokenizer`, `vae`, and optionally `scheduler`). These must be in Diffusers `from_pretrained()` layout. Raw Anima auxiliary files such as `qwen_3_06b_base.safetensors` and `qwen_image_vae.safetensors` are converter inputs; they are not accepted directly as `components_path`.
+
+Use the Anima converter from the Diffusers reference implementation to prepare the component directory:
+
+```bash
+python /path/to/convert_anima_to_diffusers.py \
+    --transformer_ckpt_path /path/to/anima-base-v1.0.safetensors \
+    --text_encoder_ckpt_path /path/to/qwen_3_06b_base.safetensors \
+    --vae_ckpt_path /path/to/qwen_image_vae.safetensors \
+    --qwen_tokenizer_path /path/to/qwen-tokenizer \
+    --t5_tokenizer_path /path/to/t5-tokenizer \
+    --output_path /path/to/anima-components \
+    --save_pipeline
+```
+
+Then point `--model` at the raw Anima transformer checkpoint and `components_path` at the converted directory:
+
+```bash
+vllm serve "/path/to/anima.safetensors" \
+    --omni \
+    --model-class-name AnimaPipeline \
+    --diffusers-load-kwargs '{
+      "components_path": "/path/to/anima-components"
+    }'
+```
+
+No deploy config is required for local Anima single-file checkpoint discovery
+when `--model-class-name AnimaPipeline` is provided.
+
+Native Anima currently supports baseline single-GPU execution. Cache-DiT,
+TeaCache, CPU offload, layer-wise offload, quantization, TP/SP, CFG parallel,
+HSDP, and step execution are not supported by `AnimaPipeline` yet.
+
+There are two more optional arguments, `--diffusers-load-kwargs` and `--diffusers-call-kwargs`, which are valid together with `--diffusion-load-format diffusers` or `diffusers_single_file`. Native Anima also accepts `--diffusers-load-kwargs` for component paths such as `components_path`, but does not delegate denoising to Diffusers.
 
 After launching the model, users send a request as usual. Refer to other documentation pages on how to request a particular input/output modality, such as `examples/online_serving/text_to_image/openai_chat_client.py`.
 

@@ -196,6 +196,23 @@ python examples/offline_inference/text_to_image/text_to_image.py \
   --auxiliary-text-encoder meta-llama/Meta-Llama-3.1-8B-Instruct \
   --output /output.png
 ```
+### Anima Single-File Checkpoints
+
+To load Anima, point `--model` to the single-file checkpoint path, pass the native pipeline class name using `--model-class-name`, and supply the converted components directory using `--diffusers-load-kwargs`:
+
+```bash
+python examples/offline_inference/text_to_image/text_to_image.py \
+  --model /path/to/models/anima-official/split_files/diffusion_models/anima-base-v1.0.safetensors \
+  --model-class-name AnimaPipeline \
+  --diffusers-load-kwargs '{"components_path": "/path/to/models/anima-components"}' \
+  --prompt "A cinematic close-up of a glass teapot on a wooden table." \
+  --seed 42 \
+  --guidance-scale 4.0 \
+  --num-inference-steps 50 \
+  --height 1024 \
+  --width 1024 \
+  --output anima_output.png
+```
 
 ### Batch Requests (Multiple Prompts)
 

@@ -48,6 +48,16 @@ def parse_profiler_config(value: str) -> dict[str, Any]:
     return config
 
 
+def parse_json_dict(value: str) -> dict[str, Any]:
+    try:
+        config = json.loads(value)
+    except json.JSONDecodeError as e:
+        raise argparse.ArgumentTypeError(f"Must be a valid JSON object: {e}") from e
+    if not isinstance(config, dict):
+        raise argparse.ArgumentTypeError("Must be a JSON object (dict)")
+    return config
+
+
 def parse_args() -> argparse.Namespace:
     parser = argparse.ArgumentParser(description="Generate an image with supported diffusion models.")
     parser.add_argument(
@@ -327,6 +337,18 @@ def parse_args() -> argparse.Namespace:
         default=None,
         help="Supplementary auxiliary text encoder parameters model name or path (especially for Hidream-l1-full).",
     )
+    parser.add_argument(
+        "--model-class-name",
+        type=str,
+        default=None,
+        help="Override the diffusion pipeline class name (e.g. AnimaPipeline).",
+    )
+    parser.add_argument(
+        "--diffusers-load-kwargs",
+        type=parse_json_dict,
+        default=None,
+        help='JSON object passed to model loader (e.g. \'{"components_path": "/path"}\').',
+    )
     current_omni_platform.pre_register_and_update(parser)
     return parser.parse_args()
 
@@ -422,9 +444,13 @@ def main():
     }
     if args.stage_configs_path:
         omni_kwargs["stage_configs_path"] = args.stage_configs_path
-    if use_nextstep:
+    if args.model_class_name:
+        omni_kwargs["model_class_name"] = args.model_class_name
+    elif use_nextstep:
         # NextStep-1.1 requires explicit pipeline class
         omni_kwargs["model_class_name"] = "NextStep11Pipeline"
+    if args.diffusers_load_kwargs is not None:
+        omni_kwargs["diffusers_load_kwargs"] = args.diffusers_load_kwargs
     omni = Omni(**omni_kwargs)
     model_class_name = get_model_class_name(omni)
     declared_extra_body_params = get_extra_body_params(model_class_name)
@@ -455,6 +481,10 @@ def main():
         print(f"  LoRA: scale={args.lora_scale}")
     if args.stage_configs_path:
         print(f"  stage-configs-path: {args.stage_configs_path}")
+    if args.model_class_name:
+        print(f"  Model class name: {args.model_class_name}")
+    if args.diffusers_load_kwargs is not None:
+        print(f"  Diffusers load kwargs: {args.diffusers_load_kwargs}")
     print(f"{'=' * 60}\n")
 
     # Build LoRA request when --lora-path is set
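
The behavior of the two new flags can be exercised in isolation. The sketch below is a minimal, self-contained illustration under stated assumptions: `parse_json_dict` is copied from this diff, while `build_parser` and `resolve_model_class_name` are hypothetical stand-ins for the relevant parts of `parse_args()` and `main()`, with `use_nextstep` treated as a plain boolean input.

```python
import argparse
import json
from typing import Any, Optional


def parse_json_dict(value: str) -> dict[str, Any]:
    # Same logic as the helper added in this diff.
    try:
        config = json.loads(value)
    except json.JSONDecodeError as e:
        raise argparse.ArgumentTypeError(f"Must be a valid JSON object: {e}") from e
    if not isinstance(config, dict):
        raise argparse.ArgumentTypeError("Must be a JSON object (dict)")
    return config


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical throwaway parser holding only the two flags added here.
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-class-name", type=str, default=None)
    parser.add_argument("--diffusers-load-kwargs", type=parse_json_dict, default=None)
    return parser


def resolve_model_class_name(explicit: Optional[str], use_nextstep: bool) -> Optional[str]:
    # Mirrors the if/elif in main(): an explicit override wins over the
    # NextStep default. None stands for "key not set in omni_kwargs".
    if explicit:
        return explicit
    if use_nextstep:
        return "NextStep11Pipeline"
    return None


if __name__ == "__main__":
    args = build_parser().parse_args(
        [
            "--model-class-name",
            "AnimaPipeline",
            "--diffusers-load-kwargs",
            '{"components_path": "/path/to/anima-components"}',
        ]
    )
    print(resolve_model_class_name(args.model_class_name, use_nextstep=False))
    print(args.diffusers_load_kwargs)
```

Malformed JSON or a non-object value such as a JSON array raises `argparse.ArgumentTypeError`, which `argparse` reports as a usage error before any model is loaded.
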

@@ -38,7 +38,31 @@ vllm serve "stable-diffusion-v1-5/stable-diffusion-v1-5" \
 
 Users turn on the diffusers backend primarily through `--diffusion-load-format diffusers` argument.
 There are two more optional arguments, `--diffusers-load-kwargs` and `--diffusers-call-kwargs`,
-which are only valid together with `--diffusion-load-format diffusers`.
+which are valid together with `--diffusion-load-format diffusers` or `diffusers_single_file`.
+Native Anima also accepts `--diffusers-load-kwargs` for component paths such as `components_path`,
+but does not delegate denoising to Diffusers.
+
+### Native Anima Single-File Checkpoints
+
+Anima single-file checkpoints are served through the native `AnimaPipeline`, not through
+`AnimaModularPipeline.from_single_file()`. If `--model-class-name AnimaModularPipeline`
+is passed for a local single-file checkpoint, vLLM-Omni maps it to `AnimaPipeline`.
+
+```bash
+vllm serve "/path/to/anima-base-v1.0.safetensors" \
+    --omni \
+    --model-class-name AnimaPipeline \
+    --diffusers-load-kwargs '{"components_path": "/path/to/anima-components"}'
+```
+
+No deploy config is required for local Anima single-file checkpoint discovery
+when `--model-class-name AnimaPipeline` is provided.
+
+The native path needs the non-denoiser components (`text_encoder`, `tokenizer`,
+`t5_tokenizer`, `vae`, and optionally `scheduler`) in Diffusers `from_pretrained()`
+layout. Native Anima currently supports baseline single-GPU execution.
+Cache-DiT, TeaCache, CPU offload, layer-wise offload, quantization, TP/SP, CFG
+parallel, HSDP, and step execution are not supported by `AnimaPipeline` yet.
 
 After launching the model, users send a request as usual. Refer to other documentation pages on how to request a particular input/output modality, such as `examples/online_serving/text_to_image/openai_chat_client.py`.
 

@@ -39,6 +39,7 @@ recipes/
 | [`Bagel/BAGEL-7B-MoT.md`](./Bagel/BAGEL-7B-MoT.md) | Text-to-image with shared online/offline examples | 1x A100 80GB / 2x CUDA GPUs |
 | [`BosonAI/Higgs-Audio-V3-TTS.md`](./BosonAI/Higgs-Audio-V3-TTS.md) | Online + offline multilingual TTS with voice cloning | 1x H100 80GB |
 | [`ByteDance/Lance.md`](./ByteDance/Lance.md) | Unified AR+diffusion: text/img/video gen + understanding (Lance 3B) | 1x B300 / A100 80GB |
+| [`circlestone-labs/Anima.md`](./circlestone-labs/Anima.md) | Native single-file text-to-image serving | 1x AMD MI300X |
 | [`fishaudio/Fish-Speech-S2-Pro.md`](./fishaudio/Fish-Speech-S2-Pro.md) | Online serving for TTS | 1x A800 80GB |
 | [`Helios/Helios.md`](./Helios/Helios.md) | Text-to-video, image-to-video, and video-to-video generation | 1x NVIDIA H20 |
 | [`inclusionAI/Ming-flash-omni-2.0.md`](./inclusionAI/Ming-flash-omni-2.0.md) | Online serving for multimodal chat + standalone TTS | 4x H100 / 1x H100 80GB |