1 change: 1 addition & 0 deletions docs/models/supported_models.md
Original file line number Diff line number Diff line change
@@ -90,5 +90,6 @@ th {
| `MiniCPMO45OmniForConditionalGeneration` | MiniCPM-o 4.5 | `openbmb/MiniCPM-o-4_5` | ✅︎ | | ✅︎ | |
| `ErnieImagePipeline` | ERNIE-Image | `baidu/ERNIE-Image`, `baidu/ERNIE-Image-Turbo` | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
| `HiDreamImagePipeline` | HiDream-I1-Full | `HiDream-ai/HiDream-I1-Full` | ✅︎ | ✅︎ | | |
| `AnimaPipeline` | Anima | `circlestone-labs/Anima` | ✅︎ | ✅︎ | | |

✅︎ indicates the model is supported on that backend. Empty cells mean not listed as supported on that backend.
@@ -39,9 +39,63 @@ vllm serve "stable-diffusion-v1-5/stable-diffusion-v1-5" \
--diffusion-load-format diffusers
```

Users turn on the diffusers backend primarily through `--diffusion-load-format diffusers` argument.
There are two more optional arguments, `--diffusers-load-kwargs` and `--diffusers-call-kwargs`,
which are only valid together with `--diffusion-load-format diffusers`.
Users turn on the diffusers backend primarily through the `--diffusion-load-format diffusers` argument.

### Single-File Checkpoints

Single-file checkpoints (such as `.safetensors` or `.ckpt`) can be loaded with the `--diffusion-load-format diffusers_single_file` argument, or by simply pointing `--model` at a local checkpoint file.

If a Diffusers pipeline class is needed, specify it using `--model-class-name`:

```bash
vllm serve "/path/to/model.safetensors" \
--omni \
--diffusion-load-format diffusers_single_file \
--model-class-name SomeDiffusersPipeline
```

Passing `--diffusion-load-format diffusers_single_file` explicitly bypasses the standard directory-based config loading. This lets you pass a Hugging Face Hub ID (e.g. `repo/model`) or a URL as the `--model` argument to fetch a single file remotely, provided the specified Diffusers pipeline supports remote loading.

### Native Anima Single-File Checkpoints

Anima single-file checkpoints are served through the native `AnimaPipeline`, not through `AnimaModularPipeline.from_single_file()`. If `--model-class-name AnimaModularPipeline` is passed for a local single-file checkpoint, vLLM-Omni maps it to `AnimaPipeline`.

Use `--model-class-name AnimaPipeline`. The native path reads the Anima transformer single-file checkpoint directly, converts original Cosmos transformer keys when needed, and loads the Cosmos transformer and text conditioner into vLLM-Omni native modules.

The native path also needs the non-denoiser components (`text_encoder`, `tokenizer`, `t5_tokenizer`, `vae`, and optionally `scheduler`). These must be in Diffusers `from_pretrained()` layout. Raw Anima auxiliary files such as `qwen_3_06b_base.safetensors` and `qwen_image_vae.safetensors` are converter inputs; they are not accepted directly as `components_path`.

Use the Anima converter from the Diffusers reference implementation to prepare the component directory:

```bash
python /path/to/convert_anima_to_diffusers.py \
--transformer_ckpt_path /path/to/anima-base-v1.0.safetensors \
--text_encoder_ckpt_path /path/to/qwen_3_06b_base.safetensors \
--vae_ckpt_path /path/to/qwen_image_vae.safetensors \
--qwen_tokenizer_path /path/to/qwen-tokenizer \
--t5_tokenizer_path /path/to/t5-tokenizer \
--output_path /path/to/anima-components \
--save_pipeline
```

Then point `--model` at the raw Anima transformer checkpoint and `components_path` at the converted directory:

```bash
vllm serve "/path/to/anima.safetensors" \
--omni \
--model-class-name AnimaPipeline \
--diffusers-load-kwargs '{
"components_path": "/path/to/anima-components"
}'
```
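
The JSON value for `--diffusers-load-kwargs` is easy to mis-quote when the paths come from variables. As a minimal sketch (not part of vLLM-Omni; the helper name `build_serve_command` is hypothetical), the same command can be assembled in Python so that `json.dumps` produces the JSON and `shlex.join` takes care of shell quoting:

```python
import json
import shlex


def build_serve_command(checkpoint: str, components_path: str) -> list[str]:
    """Assemble the `vllm serve` argv shown above (hypothetical helper)."""
    # json.dumps guarantees a well-formed JSON object regardless of the path contents.
    load_kwargs = json.dumps({"components_path": components_path})
    return [
        "vllm", "serve", checkpoint,
        "--omni",
        "--model-class-name", "AnimaPipeline",
        "--diffusers-load-kwargs", load_kwargs,
    ]


cmd = build_serve_command("/path/to/anima.safetensors", "/path/to/anima-components")
# Print a copy-pasteable shell line; pass `cmd` to subprocess.run() to launch instead.
print(shlex.join(cmd))
```

Passing the argument list directly to `subprocess.run(cmd)` avoids the shell entirely, which sidesteps nested-quote problems.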

No deploy config is required for local Anima single-file checkpoint discovery
when `--model-class-name AnimaPipeline` is provided.

Native Anima currently supports baseline single-GPU execution. Cache-DiT,
TeaCache, CPU offload, layer-wise offload, quantization, TP/SP, CFG parallel,
HSDP, and step execution are not supported by `AnimaPipeline` yet.

There are two more optional arguments, `--diffusers-load-kwargs` and `--diffusers-call-kwargs`, which are valid together with `--diffusion-load-format diffusers` or `diffusers_single_file`. Native Anima also accepts `--diffusers-load-kwargs` for component paths such as `components_path`, but does not delegate denoising to Diffusers.

After launching the model, users send a request as usual. Refer to other documentation pages on how to request a particular input/output modality, such as `examples/online_serving/text_to_image/openai_chat_client.py`.

17 changes: 17 additions & 0 deletions examples/offline_inference/text_to_image/README.md
@@ -196,6 +196,23 @@ python examples/offline_inference/text_to_image/text_to_image.py \
--auxiliary-text-encoder meta-llama/Meta-Llama-3.1-8B-Instruct \
--output /output.png
```

### Anima Single-File Checkpoints

To load Anima, point `--model` to the single-file checkpoint path, pass the native pipeline class name using `--model-class-name`, and supply the converted components directory using `--diffusers-load-kwargs`:

```bash
python examples/offline_inference/text_to_image/text_to_image.py \
--model /path/to/models/anima-official/split_files/diffusion_models/anima-base-v1.0.safetensors \
--model-class-name AnimaPipeline \
--diffusers-load-kwargs '{"components_path": "/path/to/models/anima-components"}' \
--prompt "A cinematic close-up of a glass teapot on a wooden table." \
--seed 42 \
--guidance-scale 4.0 \
--num-inference-steps 50 \
--height 1024 \
--width 1024 \
--output anima_output.png
```
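
The example script validates `--diffusers-load-kwargs` with an `argparse` type function, so the value must be a JSON object rather than a list or a bare string. The sketch below reproduces that validation in isolation; the `parse_json_dict` body follows the function added to `text_to_image.py`, while the small parser around it is illustrative only:

```python
import argparse
import json
from typing import Any


def parse_json_dict(value: str) -> dict[str, Any]:
    """Parse a CLI string as JSON and require the top-level value to be an object."""
    try:
        config = json.loads(value)
    except json.JSONDecodeError as e:
        raise argparse.ArgumentTypeError(f"Must be a valid JSON object: {e}") from e
    if not isinstance(config, dict):
        raise argparse.ArgumentTypeError("Must be a JSON object (dict)")
    return config


# Illustrative stand-alone parser; the real script defines many more arguments.
parser = argparse.ArgumentParser()
parser.add_argument("--diffusers-load-kwargs", type=parse_json_dict, default=None)

args = parser.parse_args(
    ["--diffusers-load-kwargs", '{"components_path": "/path/to/models/anima-components"}']
)
print(args.diffusers_load_kwargs["components_path"])
```

A value such as `'["a", "b"]'` parses as JSON but is not an object, so it is rejected during argument parsing instead of failing later in the model loader.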

### Batch Requests (Multiple Prompts)

32 changes: 31 additions & 1 deletion examples/offline_inference/text_to_image/text_to_image.py
@@ -48,6 +48,16 @@ def parse_profiler_config(value: str) -> dict[str, Any]:
    return config


def parse_json_dict(value: str) -> dict[str, Any]:
    try:
        config = json.loads(value)
    except json.JSONDecodeError as e:
        raise argparse.ArgumentTypeError(f"Must be a valid JSON object: {e}") from e
    if not isinstance(config, dict):
        raise argparse.ArgumentTypeError("Must be a JSON object (dict)")
    return config


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Generate an image with supported diffusion models.")
    parser.add_argument(
@@ -327,6 +337,18 @@ def parse_args() -> argparse.Namespace:
        default=None,
        help="Supplementary auxiliary text encoder parameters model name or path (especially for HiDream-I1-Full).",
    )
    parser.add_argument(
        "--model-class-name",
        type=str,
        default=None,
        help="Override the diffusion pipeline class name (e.g. AnimaPipeline).",
    )
    parser.add_argument(
        "--diffusers-load-kwargs",
        type=parse_json_dict,
        default=None,
        help='JSON object passed to model loader (e.g. \'{"components_path": "/path"}\').',
    )
    current_omni_platform.pre_register_and_update(parser)
    return parser.parse_args()

@@ -422,9 +444,13 @@ def main():
    }
    if args.stage_configs_path:
        omni_kwargs["stage_configs_path"] = args.stage_configs_path
    if use_nextstep:
    if args.model_class_name:
        omni_kwargs["model_class_name"] = args.model_class_name
    elif use_nextstep:
        # NextStep-1.1 requires explicit pipeline class
        omni_kwargs["model_class_name"] = "NextStep11Pipeline"
    if args.diffusers_load_kwargs is not None:
        omni_kwargs["diffusers_load_kwargs"] = args.diffusers_load_kwargs
    omni = Omni(**omni_kwargs)
    model_class_name = get_model_class_name(omni)
    declared_extra_body_params = get_extra_body_params(model_class_name)
@@ -455,6 +481,10 @@ def main():
        print(f" LoRA: scale={args.lora_scale}")
    if args.stage_configs_path:
        print(f" stage-configs-path: {args.stage_configs_path}")
    if args.model_class_name:
        print(f" Model class name: {args.model_class_name}")
    if args.diffusers_load_kwargs is not None:
        print(f" Diffusers load kwargs: {args.diffusers_load_kwargs}")
    print(f"{'=' * 60}\n")

    # Build LoRA request when --lora-path is set
26 changes: 25 additions & 1 deletion examples/online_serving/diffusers_pipeline_adapter/README.md
@@ -38,7 +38,31 @@ vllm serve "stable-diffusion-v1-5/stable-diffusion-v1-5" \

Users turn on the diffusers backend primarily through the `--diffusion-load-format diffusers` argument.
There are two more optional arguments, `--diffusers-load-kwargs` and `--diffusers-call-kwargs`,
which are only valid together with `--diffusion-load-format diffusers`.
which are valid together with `--diffusion-load-format diffusers` or `diffusers_single_file`.
Native Anima also accepts `--diffusers-load-kwargs` for component paths such as `components_path`,
but does not delegate denoising to Diffusers.

### Native Anima Single-File Checkpoints

Anima single-file checkpoints are served through the native `AnimaPipeline`, not through
`AnimaModularPipeline.from_single_file()`. If `--model-class-name AnimaModularPipeline`
is passed for a local single-file checkpoint, vLLM-Omni maps it to `AnimaPipeline`.

```bash
vllm serve "/path/to/anima-base-v1.0.safetensors" \
--omni \
--model-class-name AnimaPipeline \
--diffusers-load-kwargs '{"components_path": "/path/to/anima-components"}'
```

No deploy config is required for local Anima single-file checkpoint discovery
when `--model-class-name AnimaPipeline` is provided.

The native path needs the non-denoiser components (`text_encoder`, `tokenizer`,
`t5_tokenizer`, `vae`, and optionally `scheduler`) in Diffusers `from_pretrained()`
layout. Native Anima currently supports baseline single-GPU execution.
Cache-DiT, TeaCache, CPU offload, layer-wise offload, quantization, TP/SP, CFG
parallel, HSDP, and step execution are not supported by `AnimaPipeline` yet.

After launching the model, users send a request as usual. Refer to other documentation pages on how to request a particular input/output modality, such as `examples/online_serving/text_to_image/openai_chat_client.py`.

1 change: 1 addition & 0 deletions recipes/README.md
@@ -39,6 +39,7 @@ recipes/
| [`Bagel/BAGEL-7B-MoT.md`](./Bagel/BAGEL-7B-MoT.md) | Text-to-image with shared online/offline examples | 1x A100 80GB / 2x CUDA GPUs |
| [`BosonAI/Higgs-Audio-V3-TTS.md`](./BosonAI/Higgs-Audio-V3-TTS.md) | Online + offline multilingual TTS with voice cloning | 1x H100 80GB |
| [`ByteDance/Lance.md`](./ByteDance/Lance.md) | Unified AR+diffusion: text/img/video gen + understanding (Lance 3B) | 1x B300 / A100 80GB |
| [`circlestone-labs/Anima.md`](./circlestone-labs/Anima.md) | Native single-file text-to-image serving | 1x AMD MI300X |
| [`fishaudio/Fish-Speech-S2-Pro.md`](./fishaudio/Fish-Speech-S2-Pro.md) | Online serving for TTS | 1x A800 80GB |
| [`Helios/Helios.md`](./Helios/Helios.md) | Text-to-video, image-to-video, and video-to-video generation | 1x NVIDIA H20 |
| [`inclusionAI/Ming-flash-omni-2.0.md`](./inclusionAI/Ming-flash-omni-2.0.md) | Online serving for multimodal chat + standalone TTS | 4x H100 / 1x H100 80GB |
160 changes: 160 additions & 0 deletions recipes/circlestone-labs/Anima.md
@@ -0,0 +1,160 @@
# Anima

> Native single-file diffusion text-to-image

## Summary

- Vendor: circlestone-labs
- Model: [`circlestone-labs/Anima`](https://huggingface.co/circlestone-labs/Anima)
- Task: text2img
- Mode: Offline inference, Online serving (OpenAI-compatible API)
- Maintainer: Community

## When to use this recipe

Use this recipe to run Anima via vLLM-Omni's native `AnimaPipeline`. Anima is a
Cosmos-style diffusion transformer text-to-image model distributed as a
single-file transformer checkpoint, not as a normal Hugging Face model
directory.

The native path reads the Anima transformer checkpoint directly, converts
original Cosmos transformer keys when needed, and loads the Cosmos transformer
and text conditioner into native vLLM-Omni modules. Non-denoiser components
such as `text_encoder`, `tokenizer`, `t5_tokenizer`, `vae`, and optionally
`scheduler` must be supplied through a Diffusers-layout components directory.

Native Anima currently supports baseline single-GPU execution. Cache-DiT,
TeaCache, CPU offload, layer-wise offload, quantization, TP/SP, CFG parallel,
HSDP, and step execution are not supported by `AnimaPipeline` yet.

## References

- Offline example:
[`examples/offline_inference/text_to_image/README.md`](../../examples/offline_inference/text_to_image/README.md)
- Online serving notes:
[Diffusers pipeline adapter docs](../../docs/user_guide/examples/online_serving/diffusers_pipeline_adapter.md)
- Supported model entry:
[`docs/models/supported_models.md`](../../docs/models/supported_models.md)
- HuggingFace model page:
[circlestone-labs/Anima](https://huggingface.co/circlestone-labs/Anima)
- Diffusers-layout components:
[circlestone-labs/Anima-Base-v1.0-Diffusers](https://huggingface.co/circlestone-labs/Anima-Base-v1.0-Diffusers)

## Hardware Support

## ROCm

### 1x AMD MI300X

#### Environment

- OS: Ubuntu 22.04.5 LTS (x86_64)
- Python: 3.12.13
- Driver / runtime: ROCm 7.2.53211, HIP runtime 7.2.53211, MIOpen 3.5.1
- PyTorch: 2.10.0+git8514f05 built with ROCm 7.2.53211
- GPU: 1x AMD MI300X
- vLLM version: 0.23.0
- vLLM-Omni version or commit: 0.1.dev2002+g704724675.rocm

Recommended environment variables:

```bash
export VLLM_WORKER_MULTIPROC_METHOD=spawn
export VLLM_ROCM_USE_AITER=0
export PYTORCH_ROCM_ARCH=gfx942
export PYTORCH_NVML_BASED_CUDA_CHECK=1
```

#### Command - prepare assets

Download the official transformer checkpoint and the Diffusers-layout component
directory:

```bash
mkdir -p /path/to/models/anima-official
mkdir -p /path/to/models/anima-components

hf download circlestone-labs/Anima \
split_files/diffusion_models/anima-base-v1.0.safetensors \
--local-dir /path/to/models/anima-official

hf download circlestone-labs/Anima-Base-v1.0-Diffusers \
--local-dir /path/to/models/anima-components

export ANIMA_CHECKPOINT=/path/to/models/anima-official/split_files/diffusion_models/anima-base-v1.0.safetensors
export ANIMA_COMPONENTS=/path/to/models/anima-components
```

The `ANIMA_COMPONENTS` directory must be in Diffusers `from_pretrained()`
layout. Raw auxiliary files such as `qwen_3_06b_base.safetensors` and
`qwen_image_vae.safetensors` are converter inputs; they are not accepted
directly as `components_path`.
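
Before launching, it can help to sanity-check the components directory. The following is a rough sketch, not a vLLM-Omni utility: it assumes the usual Diffusers `save_pretrained()` layout, i.e. a `model_index.json` at the top level and one subdirectory per component, named after the components this recipe lists. The function name `check_anima_components` is hypothetical.

```python
from pathlib import Path

# Component names taken from this recipe; `scheduler` is optional and not checked.
REQUIRED = ("text_encoder", "tokenizer", "t5_tokenizer", "vae")


def check_anima_components(components_path: str) -> list[str]:
    """Return a list of problems found; an empty list means the layout looks plausible."""
    root = Path(components_path)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    # Assumption: pipelines saved with save_pretrained() carry a model_index.json.
    if not (root / "model_index.json").is_file():
        problems.append("missing model_index.json")
    # Assumption: each component lives in a subdirectory named after the component.
    for name in REQUIRED:
        if not (root / name).is_dir():
            problems.append(f"missing required component directory: {name}")
    return problems
```

Calling `check_anima_components(os.environ["ANIMA_COMPONENTS"])` and printing the result gives a quick hint if, for example, the raw auxiliary `.safetensors` files were placed in the directory instead of converted components. It only inspects the directory structure and does not confirm that the weights themselves load.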

#### Command - text-to-image

```bash
python examples/offline_inference/text_to_image/text_to_image.py \
--model "$ANIMA_CHECKPOINT" \
--model-class-name AnimaPipeline \
--diffusers-load-kwargs "{\"components_path\":\"$ANIMA_COMPONENTS\"}" \
--prompt "A cinematic close-up of a glass teapot on a wooden table." \
--seed 42 \
--guidance-scale 4.0 \
--num-inference-steps 50 \
--height 1024 \
--width 1024 \
--output /tmp/anima_output.png
```

To match Anima's default Diffusers settings, use a 1024x1024 resolution, 50
denoising steps, `max_sequence_length=512`, one image per prompt, an empty
negative prompt, and a CFG scale of 4.0.

#### Verification

Check that `/tmp/anima_output.png` exists and contains a generated image.
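
One way to script this check without extra dependencies is to read the PNG header. This is a minimal sketch using only the standard library; the function name `png_size` is hypothetical, and it inspects the file signature and declared dimensions only, not the image content.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path: str) -> tuple[int, int]:
    """Return (width, height) read from the PNG header, or raise ValueError."""
    header = Path(path).read_bytes()[:24]
    # A PNG starts with an 8-byte signature followed by the IHDR chunk.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} does not look like a PNG file")
    # IHDR stores width and height as big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", header[16:24])
    return width, height
```

For the command above, `png_size("/tmp/anima_output.png")` would be expected to report the requested width and height if generation succeeded; a visual inspection is still needed to judge image quality.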

#### Notes

- Key flags: `--model-class-name AnimaPipeline` selects the native Anima path;
`--diffusers-load-kwargs` supplies `components_path`.
- No deploy config is required for local single-file checkpoint discovery when
`--model-class-name AnimaPipeline` is provided.
- Start with `max-concurrency=1` for correctness and latency validation.
- Keep requests at the same resolution when comparing runs.
- Do not enable parallelism, cache acceleration, offload, or quantized
checkpoint flags for Anima until support is added to `AnimaPipeline`.

## Online Serving

Anima supports text-to-image generation through the OpenAI-compatible image
generation API.

### Launch

```bash
vllm serve "$ANIMA_CHECKPOINT" \
--omni \
--port 8099 \
--model-class-name AnimaPipeline \
--diffusers-load-kwargs "{\"components_path\":\"$ANIMA_COMPONENTS\"}"
```

### Send requests

```bash
curl http://localhost:8099/v1/images/generations \
-H "Content-Type: application/json" \
-d "{
\"model\": \"$ANIMA_CHECKPOINT\",
\"prompt\": \"A cinematic close-up of a glass teapot on a wooden table.\",
\"size\": \"1024x1024\",
\"num_inference_steps\": 50,
\"guidance_scale\": 4.0,
\"seed\": 42
}"
```
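
The same request can be sent from Python. The sketch below mirrors the `curl` payload using only the standard library; the helper names `build_image_request` and `generate` are hypothetical, and the response is returned as a plain dictionary because this recipe does not describe the response schema.

```python
import json
import urllib.request


def build_image_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the same POST request as the curl command above."""
    payload = {
        "model": model,
        "prompt": prompt,
        "size": "1024x1024",
        "num_inference_steps": 50,
        "guidance_scale": 4.0,
        "seed": 42,
    }
    return urllib.request.Request(
        f"{base_url}/v1/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, model: str, prompt: str) -> dict:
    """Send the request and return the decoded JSON response body."""
    request = build_image_request(base_url, model, prompt)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server from the Launch step running, `generate("http://localhost:8099", os.environ["ANIMA_CHECKPOINT"], "A cinematic close-up of a glass teapot on a wooden table.")` returns the parsed response; inspect the returned dictionary rather than assuming particular fields.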

The same generation knobs used by other text-to-image recipes apply:
`num_inference_steps`, `seed`, `height` and `width` (set through `size`), and
optional negative prompting.