71 changes: 71 additions & 0 deletions benchmarks/diffusion/README.md
@@ -149,3 +149,74 @@ batch may still pay compile or CUDA-graph capture cost.

For a Qwen-Image continuous-batching replay example, see
[`performance_dashboard/qwen_image_serving_performance.md`](./performance_dashboard/qwen_image_serving_performance.md).

## 4. Anima Native Single-File Benchmarking

Native Anima is benchmarked as a text-to-image model through the same serving
benchmark entrypoint. Unlike models loaded from standard Hugging Face model IDs,
Anima is served from the raw single-file transformer checkpoint and loads its
non-denoiser components from a Diffusers-layout component directory.

Download the official Anima checkpoint and components first. The commands below
use `/path/to/models` as a placeholder; replace it with any local directory that
has enough space for the checkpoint and component files.

```bash
mkdir -p /path/to/models/anima-official
mkdir -p /path/to/models/anima-components

hf download circlestone-labs/Anima \
split_files/diffusion_models/anima-base-v1.0.safetensors \
--local-dir /path/to/models/anima-official

hf download circlestone-labs/Anima-Base-v1.0-Diffusers \
--local-dir /path/to/models/anima-components

CHECKPOINT=/path/to/models/anima-official/split_files/diffusion_models/anima-base-v1.0.safetensors
COMPONENTS=/path/to/models/anima-components
```

Run these commands from the vLLM-Omni repository in the Python environment or
container where vLLM-Omni is installed.

Start the server with the checkpoint as `--model` and pass the component
directory through `--diffusers-load-kwargs`:

```bash
vllm serve "$CHECKPOINT" \
--omni \
--port 8099 \
--model-class-name AnimaPipeline \
--diffusers-load-kwargs "{\"components_path\":\"$COMPONENTS\"}"
```
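The JSON passed to `--diffusers-load-kwargs` above is hand-escaped inside double quotes. If you script the launch, the following sketch builds the same command with `json.dumps` and `shlex.quote`, which also copes with component paths that contain spaces or quotes. The helper names are hypothetical; only the flags are taken from the command above.

```python
import json
import shlex


def diffusers_load_kwargs_arg(components_path: str) -> str:
    """Serialize the loader kwargs as JSON, then quote them for a POSIX shell."""
    return shlex.quote(json.dumps({"components_path": components_path}))


def anima_serve_command(checkpoint: str, components_path: str, port: int = 8099) -> str:
    """Build the `vllm serve` command shown above as one shell-safe string."""
    parts = [
        "vllm", "serve", shlex.quote(checkpoint),
        "--omni",
        "--port", str(port),
        "--model-class-name", "AnimaPipeline",
        "--diffusers-load-kwargs", diffusers_load_kwargs_arg(components_path),
    ]
    return " ".join(parts)


if __name__ == "__main__":
    print(anima_serve_command(
        "/path/to/models/anima-official/split_files/diffusion_models/anima-base-v1.0.safetensors",
        "/path/to/models/anima-components",
    ))
```

Quoting the whole JSON payload once with `shlex.quote` avoids the nested `\"` escaping that is easy to get wrong when the command is edited by hand.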

Then run the standard diffusion serving benchmark:

```bash
python3 benchmarks/diffusion/diffusion_benchmark_serving.py \
--base-url http://localhost:8099 \
--endpoint /v1/chat/completions \
--model "$CHECKPOINT" \
--task t2i \
--dataset random \
--num-prompts 10 \
--max-concurrency 1 \
--warmup-requests 1 \
--warmup-concurrency 1 \
--width 1024 \
--height 1024 \
--num-inference-steps 50
```

This matches the Diffusers baseline defaults for Anima: 1024x1024, 50 denoising
steps, `max_sequence_length=512`, one image per prompt, empty negative prompt,
and CFG scale 4.0 from the default guider. Do not pass `guidance_scale` through
the benchmark unless you are intentionally measuring a non-default CFG setting.
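For context on why the CFG setting matters for benchmark numbers, the sketch below shows the standard classifier-free guidance combination and counts denoiser predictions per image. It is the generic formula, not code taken from `AnimaPipeline`. It assumes guidance is skipped when the scale is at or below 1.0 (as in many Diffusers pipelines); a pipeline may also batch the conditional and unconditional predictions into a single forward call.

```python
from typing import List, Sequence


def cfg_combine(cond: Sequence[float], uncond: Sequence[float], scale: float = 4.0) -> List[float]:
    """Classifier-free guidance: push the unconditional prediction toward the conditional one."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def denoiser_evaluations(num_steps: int = 50, scale: float = 4.0) -> int:
    """Count conditional + unconditional predictions needed for one image."""
    # Assumption: CFG is disabled for scale <= 1.0, leaving one prediction per step.
    per_step = 2 if scale > 1.0 else 1
    return num_steps * per_step


print(cfg_combine([1.0, 0.0], [0.5, 0.5]))  # [2.5, -1.5]
print(denoiser_evaluations())  # 100
print(denoiser_evaluations(scale=1.0))  # 50
```

Under these assumptions, any scale above 1.0 costs the same amount of denoiser work as the default 4.0, while disabling CFG roughly halves it, which is why a non-default `guidance_scale` can make results incomparable with the baseline.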

Native Anima currently supports baseline single-GPU execution. Cache-DiT,
TeaCache, CPU offload, layer-wise offload, quantization, TP/SP, CFG parallel,
HSDP, and step execution are not supported by `AnimaPipeline` yet.

When `--model-class-name AnimaPipeline` is provided, Anima uses the default single
diffusion stage for local single-file checkpoint discovery, so no deploy config
is required.
1 change: 1 addition & 0 deletions docs/models/supported_models.md
@@ -89,5 +89,6 @@ th {
| `MiniCPMO45OmniForConditionalGeneration` | MiniCPM-o 4.5 | `openbmb/MiniCPM-o-4_5` | ✅︎ | | ✅︎ | |
| `ErnieImagePipeline` | ERNIE-Image | `baidu/ERNIE-Image`, `baidu/ERNIE-Image-Turbo` | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
|`HiDreamImagePipeline` | HiDream-I1-Full | `HiDream-ai/HiDream-I1-Full` | ✅︎ | ✅︎ | | |
| `AnimaPipeline` | Anima | `circlestone-labs/Anima` | ✅︎ | ✅︎ | | |

✅︎ indicates the model is supported on that backend. Empty cells mean not listed as supported on that backend.
@@ -39,9 +39,63 @@ vllm serve "stable-diffusion-v1-5/stable-diffusion-v1-5" \
--diffusion-load-format diffusers
```

Users turn on the diffusers backend primarily through `--diffusion-load-format diffusers` argument.
There are two more optional arguments, `--diffusers-load-kwargs` and `--diffusers-call-kwargs`,
which are only valid together with `--diffusion-load-format diffusers`.
Users turn on the diffusers backend primarily through the `--diffusion-load-format diffusers` argument.

### Single-File Checkpoints

Single-file checkpoints (such as `.safetensors` or `.ckpt` files) can be loaded with the `--diffusion-load-format diffusers_single_file` argument, or by pointing `--model` at a local single checkpoint file.

If a Diffusers pipeline class is needed, specify it using `--model-class-name`:

```bash
vllm serve "/path/to/model.safetensors" \
--omni \
--diffusion-load-format diffusers_single_file \
--model-class-name SomeDiffusersPipeline
```

Using `--diffusion-load-format diffusers_single_file` explicitly bypasses standard directory-based config loading. This allows you to pass a Hugging Face Hub ID (e.g. `repo/model`) or URL as the `--model` argument to fetch single files remotely, provided the specified Diffusers pipeline supports remote loading.

### Native Anima Single-File Checkpoints

Anima single-file checkpoints are served through the native `AnimaPipeline`, not through `AnimaModularPipeline.from_single_file()`. If `--model-class-name AnimaModularPipeline` is passed for a local single-file checkpoint, vLLM-Omni maps it to `AnimaPipeline`.

Use `--model-class-name AnimaPipeline`. The native path reads the Anima transformer single-file checkpoint directly, converts original Cosmos transformer keys when needed, and loads the Cosmos transformer and text conditioner into vLLM-Omni native modules.
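To see which key layout a transformer checkpoint uses before serving it, you can list its tensor names from the `.safetensors` header. That format begins with an 8-byte little-endian header length followed by a UTF-8 JSON header, so this stdlib-only sketch reads only the header and never loads weights. Which prefixes correspond to the original Cosmos layout is not specified in this guide; the helper names are hypothetical.

```python
import json
import struct
import sys
from collections import Counter
from typing import Dict


def parse_safetensors_header(buf: bytes) -> Dict[str, dict]:
    """Parse the JSON header at the start of a .safetensors file."""
    # Layout: u64 little-endian header length, then that many bytes of UTF-8 JSON.
    (header_len,) = struct.unpack_from("<Q", buf, 0)
    header = json.loads(buf[8:8 + header_len].decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return header


def read_safetensors_header(path: str) -> Dict[str, dict]:
    """Read only the header bytes from disk, without touching tensor data."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        (header_len,) = struct.unpack("<Q", prefix)
        return parse_safetensors_header(prefix + f.read(header_len))


def key_prefixes(header: Dict[str, dict], depth: int = 2) -> Counter:
    """Count tensor names by their first `depth` dot-separated components."""
    return Counter(".".join(name.split(".")[:depth]) for name in header)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python inspect_keys.py /path/to/anima-base-v1.0.safetensors
    for prefix, count in key_prefixes(read_safetensors_header(sys.argv[1])).most_common(20):
        print(f"{count:6d}  {prefix}")
```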

The native path also needs the non-denoiser components (`text_encoder`, `tokenizer`, `t5_tokenizer`, `vae`, and optionally `scheduler`). These must be in Diffusers `from_pretrained()` layout. Raw Anima auxiliary files such as `qwen_3_06b_base.safetensors` and `qwen_image_vae.safetensors` are converter inputs; they are not accepted directly as `components_path`.
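A quick pre-flight check on the directory you plan to pass as `components_path` can catch the common mistake of pointing it at a raw auxiliary file. This sketch assumes the usual Diffusers convention of one subfolder per component name; `missing_components` is a hypothetical helper, not part of vLLM-Omni, and it does not validate the files inside each subfolder.

```python
import tempfile
from pathlib import Path
from typing import List

# Component subfolders named in this guide; `scheduler` is optional and not checked.
REQUIRED_COMPONENTS = ("text_encoder", "tokenizer", "t5_tokenizer", "vae")


def missing_components(components_path: str) -> List[str]:
    """Return the required component subfolders absent from components_path."""
    root = Path(components_path)
    if root.is_file():
        # e.g. qwen_3_06b_base.safetensors is a converter input, not a component directory
        raise ValueError(f"{root} is a file; run the converter and pass its output directory")
    return [name for name in REQUIRED_COMPONENTS if not (root / name).is_dir()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("text_encoder", "tokenizer", "vae"):
            (Path(tmp) / name).mkdir()
        print(missing_components(tmp))  # ['t5_tokenizer']
```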

Use the Anima converter from the Diffusers reference implementation to prepare the component directory:

```bash
python /path/to/convert_anima_to_diffusers.py \
--transformer_ckpt_path /path/to/anima-base-v1.0.safetensors \
--text_encoder_ckpt_path /path/to/qwen_3_06b_base.safetensors \
--vae_ckpt_path /path/to/qwen_image_vae.safetensors \
--qwen_tokenizer_path /path/to/qwen-tokenizer \
--t5_tokenizer_path /path/to/t5-tokenizer \
--output_path /path/to/anima-components \
--save_pipeline
```

Then point `--model` at the raw Anima transformer checkpoint and `components_path` at the converted directory:

```bash
vllm serve "/path/to/anima-base-v1.0.safetensors" \
--omni \
--model-class-name AnimaPipeline \
--diffusers-load-kwargs '{
"components_path": "/path/to/anima-components"
}'
```

No deploy config is required for local Anima single-file checkpoint discovery
when `--model-class-name AnimaPipeline` is provided.

Native Anima currently supports baseline single-GPU execution. Cache-DiT,
TeaCache, CPU offload, layer-wise offload, quantization, TP/SP, CFG parallel,
HSDP, and step execution are not supported by `AnimaPipeline` yet.

Two optional arguments, `--diffusers-load-kwargs` and `--diffusers-call-kwargs`, are valid together with `--diffusion-load-format diffusers` or `diffusers_single_file`. Native Anima also accepts `--diffusers-load-kwargs` for component paths such as `components_path`, but it does not delegate denoising to Diffusers.

After launching the model, users send a request as usual. Refer to other documentation pages on how to request a particular input/output modality, such as `examples/online_serving/text_to_image/openai_chat_client.py`.

17 changes: 17 additions & 0 deletions examples/offline_inference/text_to_image/README.md
@@ -196,6 +196,23 @@ python examples/offline_inference/text_to_image/text_to_image.py \
--auxiliary-text-encoder meta-llama/Meta-Llama-3.1-8B-Instruct \
--output /output.png
```

### Anima Single-File Checkpoints

To load Anima, point `--model` at the single-file checkpoint, pass the native pipeline class name with `--model-class-name`, and supply the Diffusers-layout components directory through `--diffusers-load-kwargs`:

```bash
python examples/offline_inference/text_to_image/text_to_image.py \
--model /path/to/models/anima-official/split_files/diffusion_models/anima-base-v1.0.safetensors \
--model-class-name AnimaPipeline \
--diffusers-load-kwargs '{"components_path": "/path/to/models/anima-components"}' \
--prompt "A cinematic close-up of a glass teapot on a wooden table." \
--seed 42 \
--guidance-scale 4.0 \
--num-inference-steps 50 \
--height 1024 \
--width 1024 \
--output anima_output.png
```
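`text_to_image.py` validates `--diffusers-load-kwargs` with a `parse_json_dict` argparse type function, so the value must be a single JSON object. The standalone sketch below mirrors that function and exercises it outside the script; the parser here is a minimal stand-in with one flag, not the script's full argument set.

```python
import argparse
import json
from typing import Any, Dict


def parse_json_dict(value: str) -> Dict[str, Any]:
    # Same checks as the example script: valid JSON, and an object at the top level.
    try:
        config = json.loads(value)
    except json.JSONDecodeError as e:
        raise argparse.ArgumentTypeError(f"Must be a valid JSON object: {e}") from e
    if not isinstance(config, dict):
        raise argparse.ArgumentTypeError("Must be a JSON object (dict)")
    return config


# Minimal stand-in parser with only the flag under test.
parser = argparse.ArgumentParser()
parser.add_argument("--diffusers-load-kwargs", type=parse_json_dict, default=None)

args = parser.parse_args(
    ["--diffusers-load-kwargs", '{"components_path": "/path/to/models/anima-components"}']
)
print(args.diffusers_load_kwargs)
```

Passing a JSON array such as `'[1, 2]'` or malformed JSON makes argparse print the `ArgumentTypeError` message and exit with status 2.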

### Batch Requests (Multiple Prompts)

32 changes: 31 additions & 1 deletion examples/offline_inference/text_to_image/text_to_image.py
@@ -48,6 +48,16 @@ def parse_profiler_config(value: str) -> dict[str, Any]:
    return config


def parse_json_dict(value: str) -> dict[str, Any]:
    try:
        config = json.loads(value)
    except json.JSONDecodeError as e:
        raise argparse.ArgumentTypeError(f"Must be a valid JSON object: {e}") from e
    if not isinstance(config, dict):
        raise argparse.ArgumentTypeError("Must be a JSON object (dict)")
    return config


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Generate an image with supported diffusion models.")
    parser.add_argument(
@@ -327,6 +337,18 @@ def parse_args() -> argparse.Namespace:
        default=None,
        help="Supplementary auxiliary text encoder parameters model name or path (especially for Hidream-l1-full).",
    )
    parser.add_argument(
        "--model-class-name",
        type=str,
        default=None,
        help="Override the diffusion pipeline class name (e.g. AnimaPipeline).",
    )
    parser.add_argument(
        "--diffusers-load-kwargs",
        type=parse_json_dict,
        default=None,
        help='JSON object passed to model loader (e.g. \'{"components_path": "/path"}\').',
    )
    current_omni_platform.pre_register_and_update(parser)
    return parser.parse_args()

@@ -422,9 +444,13 @@ def main():
    }
    if args.stage_configs_path:
        omni_kwargs["stage_configs_path"] = args.stage_configs_path
    if use_nextstep:
    if args.model_class_name:
        omni_kwargs["model_class_name"] = args.model_class_name
    elif use_nextstep:
        # NextStep-1.1 requires explicit pipeline class
        omni_kwargs["model_class_name"] = "NextStep11Pipeline"
    if args.diffusers_load_kwargs is not None:
        omni_kwargs["diffusers_load_kwargs"] = args.diffusers_load_kwargs
    omni = Omni(**omni_kwargs)
    model_class_name = get_model_class_name(omni)
    declared_extra_body_params = get_extra_body_params(model_class_name)
@@ -455,6 +481,10 @@ def main():
        print(f" LoRA: scale={args.lora_scale}")
    if args.stage_configs_path:
        print(f" stage-configs-path: {args.stage_configs_path}")
    if args.model_class_name:
        print(f" Model class name: {args.model_class_name}")
    if args.diffusers_load_kwargs is not None:
        print(f" Diffusers load kwargs: {args.diffusers_load_kwargs}")
    print(f"{'=' * 60}\n")

    # Build LoRA request when --lora-path is set
26 changes: 25 additions & 1 deletion examples/online_serving/diffusers_pipeline_adapter/README.md
@@ -38,7 +38,31 @@ vllm serve "stable-diffusion-v1-5/stable-diffusion-v1-5" \

Users turn on the diffusers backend primarily through `--diffusion-load-format diffusers` argument.
There are two more optional arguments, `--diffusers-load-kwargs` and `--diffusers-call-kwargs`,
which are only valid together with `--diffusion-load-format diffusers`.
which are valid together with `--diffusion-load-format diffusers` or `diffusers_single_file`.
Native Anima also accepts `--diffusers-load-kwargs` for component paths such as `components_path`,
but does not delegate denoising to Diffusers.

### Native Anima Single-File Checkpoints

Anima single-file checkpoints are served through the native `AnimaPipeline`, not through
`AnimaModularPipeline.from_single_file()`. If `--model-class-name AnimaModularPipeline`
is passed for a local single-file checkpoint, vLLM-Omni maps it to `AnimaPipeline`.

```bash
vllm serve "/path/to/anima-base-v1.0.safetensors" \
--omni \
--model-class-name AnimaPipeline \
--diffusers-load-kwargs '{"components_path": "/path/to/anima-components"}'
```

No deploy config is required for local Anima single-file checkpoint discovery
when `--model-class-name AnimaPipeline` is provided.

The native path needs the non-denoiser components (`text_encoder`, `tokenizer`,
`t5_tokenizer`, `vae`, and optionally `scheduler`) in Diffusers `from_pretrained()`
layout. Native Anima currently supports baseline single-GPU execution.
Cache-DiT, TeaCache, CPU offload, layer-wise offload, quantization, TP/SP, CFG
parallel, HSDP, and step execution are not supported by `AnimaPipeline` yet.

After launching the model, users send a request as usual. Refer to other documentation pages on how to request a particular input/output modality, such as `examples/online_serving/text_to_image/openai_chat_client.py`.

1 change: 1 addition & 0 deletions recipes/README.md
@@ -39,6 +39,7 @@ recipes/
| [`Bagel/BAGEL-7B-MoT.md`](./Bagel/BAGEL-7B-MoT.md) | Text-to-image with shared online/offline examples | 1x A100 80GB / 2x CUDA GPUs |
| [`BosonAI/Higgs-Audio-V3-TTS.md`](./BosonAI/Higgs-Audio-V3-TTS.md) | Online + offline multilingual TTS with voice cloning | 1x H100 80GB |
| [`ByteDance/Lance.md`](./ByteDance/Lance.md) | Unified AR+diffusion: text/img/video gen + understanding (Lance 3B) | 1x B300 / A100 80GB |
| [`circlestone-labs/Anima.md`](./circlestone-labs/Anima.md) | Native single-file text-to-image serving | 1x AMD MI300X |
| [`fishaudio/Fish-Speech-S2-Pro.md`](./fishaudio/Fish-Speech-S2-Pro.md) | Online serving for TTS | 1x A800 80GB |
| [`Helios/Helios.md`](./Helios/Helios.md) | Text-to-video, image-to-video, and video-to-video generation | 1x NVIDIA H20 |
| [`inclusionAI/Ming-flash-omni-2.0.md`](./inclusionAI/Ming-flash-omni-2.0.md) | Online serving for multimodal chat + standalone TTS | 4x H100 / 1x H100 80GB |