Commit 77d77f4

add Anima pipeline
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
1 parent bdfc771 commit 77d77f4

20 files changed

Lines changed: 3114 additions & 14 deletions

docs/user_guide/examples/online_serving/diffusers_pipeline_adapter.md

Lines changed: 73 additions & 3 deletions
@@ -39,9 +39,79 @@ vllm serve "stable-diffusion-v1-5/stable-diffusion-v1-5" \
     --diffusion-load-format diffusers
 ```

-Users turn on the diffusers backend primarily through `--diffusion-load-format diffusers` argument.
-There are two more optional arguments, `--diffusers-load-kwargs` and `--diffusers-call-kwargs`,
-which are only valid together with `--diffusion-load-format diffusers`.
+Users turn on the diffusers backend primarily through the `--diffusion-load-format diffusers` argument.
+
+### Single-File Checkpoints (e.g. Anima)
+
+For single-file checkpoints (such as `.safetensors` or `.ckpt`), users can load them via the `--diffusion-load-format diffusers_single_file` argument (or simply point `--model` to a single checkpoint file).
+
+If a custom pipeline class (such as `AnimaModularPipeline`) is needed, specify it using `--model-class-name`:
+
+```bash
+vllm serve "/path/to/anima.safetensors" \
+    --omni \
+    --diffusion-load-format diffusers_single_file \
+    --model-class-name AnimaModularPipeline
+```
+
+For the native Anima path, use `--model-class-name AnimaPipeline` without `--diffusion-load-format diffusers_single_file`. The native path reads the Anima transformer single-file checkpoint directly, converts original Cosmos transformer keys when needed, and loads the Cosmos transformer and text conditioner into vLLM-Omni native modules.
+
+The native path also needs the non-denoiser components (`text_encoder`, `tokenizer`, `t5_tokenizer`, `vae`, and optionally `scheduler`). These must be in Diffusers `from_pretrained()` layout. Raw Anima auxiliary files such as `qwen_3_06b_base.safetensors` and `qwen_image_vae.safetensors` are converter inputs; they are not accepted directly as `components_path`.
+
+Use the Anima converter from the Diffusers reference implementation to prepare the component directory:
+
+```bash
+python /path/to/convert_anima_to_diffusers.py \
+    --transformer_ckpt_path /path/to/anima-preview3-base.safetensors \
+    --text_encoder_ckpt_path /path/to/qwen_3_06b_base.safetensors \
+    --vae_ckpt_path /path/to/qwen_image_vae.safetensors \
+    --qwen_tokenizer_path /path/to/qwen-tokenizer \
+    --t5_tokenizer_path /path/to/t5-tokenizer \
+    --output_path /path/to/anima-components \
+    --save_pipeline
+```
+
+Then point `--model` at the raw Anima transformer checkpoint and `components_path` at the converted directory:
+
+```bash
+vllm serve "/path/to/anima.safetensors" \
+    --omni \
+    --model-class-name AnimaPipeline \
+    --diffusers-load-kwargs '{
+        "components_path": "/path/to/anima-components"
+    }'
+```
+
+Advanced execution features such as step execution, TP/SP, CFG parallel, HSDP, and Cache-DiT must stay disabled until CUDA parity is validated.
+
+Before enabling the native Anima path in a PR, record CUDA validation evidence on the target GPU:
+
+```bash
+# Native vLLM-Omni path.
+vllm serve "/path/to/anima-preview3-base.safetensors" \
+    --omni \
+    --model-class-name AnimaPipeline \
+    --enable-diffusion-pipeline-profiler \
+    --diffusers-load-kwargs '{"components_path": "/path/to/anima-components"}'
+
+# Diffusers adapter baseline using the same checkpoint, prompt, size, seed, and step count.
+vllm serve "/path/to/anima-preview3-base.safetensors" \
+    --omni \
+    --diffusion-load-format diffusers_single_file \
+    --model-class-name AnimaModularPipeline \
+    --enable-diffusion-pipeline-profiler
+```
+
+The validation report should include:
+
+| Item | Required evidence |
+|------|-------------------|
+| Output parity | Native and Diffusers-adapter samples from the same prompt, resolution, step count, and seed |
+| Latency | End-to-end time for native and Diffusers-adapter runs on the same GPU |
+| Memory | Peak VRAM for native and Diffusers-adapter runs |
+| Profiler | `torch.profiler` or `nsys` summary showing no unexpected CPU fallback, host/device copy storm, or synchronization hotspot in the native transformer/text-conditioner path |
+
+There are two more optional arguments, `--diffusers-load-kwargs` and `--diffusers-call-kwargs`, which are valid together with `--diffusion-load-format diffusers` or `diffusers_single_file`. Native Anima also accepts `--diffusers-load-kwargs` for component paths such as `components_path`, but does not delegate denoising to Diffusers.

 After launching the model, users send a request as usual. Refer to other documentation pages on how to request a particular input/output modality, such as `examples/online_serving/text_to_image/openai_chat_client.py`.

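Reviewer note: the docs pass `--diffusers-load-kwargs` as an inline JSON string, which is easy to mis-quote in a shell. A minimal sketch of building the native Anima command programmatically follows. It uses only flags that appear in the docs above; the helper name `build_native_anima_command` is hypothetical and not part of vLLM-Omni.

```python
import json
import shlex


def build_native_anima_command(checkpoint: str, components_path: str) -> str:
    """Return a shell-safe `vllm serve` command line (hypothetical helper)."""
    # json.dumps yields valid JSON even when the path contains spaces or quotes.
    load_kwargs = json.dumps({"components_path": components_path})
    argv = [
        "vllm", "serve", checkpoint,
        "--omni",
        "--model-class-name", "AnimaPipeline",
        "--diffusers-load-kwargs", load_kwargs,
    ]
    # shlex.join quotes each argument for a POSIX shell.
    return shlex.join(argv)


command = build_native_anima_command(
    "/path/to/anima.safetensors", "/path/to/anima components"
)
print(command)
```

Splitting the result with `shlex.split` should round-trip to the original argument list, which is a cheap way to check the quoting before launching the server.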
Lines changed: 182 additions & 0 deletions
@@ -0,0 +1,182 @@
+# SPDX-License-Identifier: Apache-2.0
+# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
+
+from types import SimpleNamespace
+
+import diffusers
+import torch
+from safetensors.torch import save_file
+
+from vllm_omni.diffusion.data import OmniDiffusionConfig
+
+
+def test_anima_registration():
+    # Verify that Anima modules were dynamically injected into the diffusers package
+    import vllm_omni.diffusion.models.anima  # noqa: F401
+    from vllm_omni.diffusion.registry import DiffusionModelRegistry
+
+    assert hasattr(diffusers, "AnimaModularPipeline")
+    assert hasattr(diffusers.modular_pipelines, "AnimaModularPipeline")
+    assert hasattr(diffusers.models, "AnimaTextConditioner")
+    assert hasattr(diffusers.models.condition_embedders, "AnimaTextConditioner")
+    assert DiffusionModelRegistry._try_load_model_cls("AnimaPipeline") is not None
+
+
+def test_enrich_config_single_file(tmp_path):
+    # Verify single-file config enrichment path
+    dummy_checkpoint = tmp_path / "model.safetensors"
+    dummy_checkpoint.write_text("dummy")
+
+    config = OmniDiffusionConfig(
+        model=str(dummy_checkpoint),
+        diffusion_load_format="diffusers_single_file",
+        model_class_name="AnimaModularPipeline",
+    )
+    config.enrich_config()
+
+    assert config.model_class_name == "DiffusersAdapterPipeline"
+    assert config.diffusers_pipeline_cls is diffusers.AnimaModularPipeline
+
+
+def test_enrich_config_single_file_autodetects_local_file(tmp_path):
+    dummy_checkpoint = tmp_path / "model.safetensors"
+    dummy_checkpoint.write_text("dummy")
+
+    config = OmniDiffusionConfig(
+        model=str(dummy_checkpoint),
+        model_class_name="AnimaModularPipeline",
+    )
+    config.enrich_config()
+
+    assert config.diffusion_load_format == "diffusers_single_file"
+    assert config.model_class_name == "DiffusersAdapterPipeline"
+    assert config.diffusers_pipeline_cls is diffusers.AnimaModularPipeline
+
+
+def test_enrich_config_native_anima_single_file_stays_native(tmp_path):
+    dummy_checkpoint = tmp_path / "model.safetensors"
+    dummy_checkpoint.write_text("dummy")
+
+    config = OmniDiffusionConfig(
+        model=str(dummy_checkpoint),
+        model_class_name="AnimaPipeline",
+    )
+    config.enrich_config()
+
+    assert config.diffusion_load_format == "default"
+    assert config.model_class_name == "AnimaPipeline"
+    assert config.diffusers_pipeline_cls is None
+
+
+def test_native_anima_single_file_allows_load_kwargs(tmp_path):
+    dummy_checkpoint = tmp_path / "model.safetensors"
+    dummy_checkpoint.write_text("dummy")
+
+    config = OmniDiffusionConfig(
+        model=str(dummy_checkpoint),
+        model_class_name="AnimaPipeline",
+        diffusers_load_kwargs={"local_files_only": True},
+    )
+    config.enrich_config()
+
+    assert config.diffusion_load_format == "default"
+    assert config.diffusers_load_kwargs == {"local_files_only": True}
+
+
+def test_native_anima_converts_original_cosmos_transformer_keys():
+    from vllm_omni.diffusion.models.anima.pipeline_anima import AnimaPipeline
+
+    converted = AnimaPipeline._convert_cosmos_2_transformer_state_dict(
+        {
+            "net.x_embedder.proj.1.weight": "patch",
+            "net.blocks.0.self_attn.q_proj.weight": "q",
+            "net.blocks.0.self_attn.q_norm.weight": "q_norm",
+            "net.blocks.0.mlp.layer1.weight": "mlp",
+            "net.final_layer.linear.weight": "out",
+            "net.accum_iteration": "drop",
+        }
+    )
+
+    assert converted == {
+        "patch_embed.proj.weight": "patch",
+        "transformer_blocks.0.attn1.to_q.weight": "q",
+        "transformer_blocks.0.attn1.norm_q.weight": "q_norm",
+        "transformer_blocks.0.ff.net.0.proj.weight": "mlp",
+        "proj_out.weight": "out",
+    }
+
+
+def test_native_anima_loads_synthetic_single_file(tmp_path, monkeypatch):
+    import vllm_omni.diffusion.models.anima.pipeline_anima as pipeline_anima
+    from vllm_omni.diffusion.models.anima.native_cosmos_transformer import NativeCosmosTransformer3DModel
+    from vllm_omni.diffusion.models.anima.native_text_conditioner import NativeAnimaTextConditioner
+
+    tiny_transformer_config = {
+        "in_channels": 1,
+        "out_channels": 1,
+        "num_attention_heads": 1,
+        "attention_head_dim": 12,
+        "num_layers": 1,
+        "mlp_ratio": 1.0,
+        "text_embed_dim": 4,
+        "adaln_lora_dim": 3,
+        "max_size": (1, 2, 2),
+        "patch_size": (1, 1, 1),
+        "rope_scale": (1.0, 1.0, 1.0),
+        "concat_padding_mask": True,
+        "extra_pos_embed_type": None,
+    }
+    monkeypatch.setattr(pipeline_anima, "_ANIMA_TRANSFORMER_CONFIG", tiny_transformer_config)
+
+    transformer = NativeCosmosTransformer3DModel(**tiny_transformer_config)
+    text_conditioner = NativeAnimaTextConditioner(
+        source_dim=4,
+        target_dim=4,
+        model_dim=4,
+        num_layers=1,
+        num_attention_heads=1,
+        target_vocab_size=8,
+        min_sequence_length=4,
+    )
+    transformer_state = {name: tensor.detach().clone() for name, tensor in transformer.state_dict().items()}
+    text_conditioner_state = {name: tensor.detach().clone() for name, tensor in text_conditioner.state_dict().items()}
+    checkpoint_state = {
+        **{f"transformer.{name}": tensor for name, tensor in transformer_state.items()},
+        **{f"text_conditioner.{name}": tensor for name, tensor in text_conditioner_state.items()},
+    }
+
+    checkpoint_path = tmp_path / "anima.safetensors"
+    save_file(checkpoint_state, str(checkpoint_path))
+
+    pipeline = pipeline_anima.AnimaPipeline.__new__(pipeline_anima.AnimaPipeline)
+    pipeline.od_config = SimpleNamespace(model=str(checkpoint_path), dtype=torch.float32)
+    pipeline.device = torch.device("cpu")
+
+    def assert_loaded(loaded_transformer, loaded_text_conditioner):
+        for name, tensor in transformer_state.items():
+            assert torch.equal(loaded_transformer.state_dict()[name], tensor)
+        for name, tensor in text_conditioner_state.items():
+            assert torch.equal(loaded_text_conditioner.state_dict()[name], tensor)
+
+    loaded_transformer, loaded_text_conditioner = pipeline._load_native_denoiser_components(dict(checkpoint_state))
+    assert_loaded(loaded_transformer, loaded_text_conditioner)
+
+    loaded_transformer, loaded_text_conditioner = pipeline._load_native_denoiser_components()
+    assert_loaded(loaded_transformer, loaded_text_conditioner)
+
+
+def test_enrich_config_single_file_rejects_unknown_pipeline(tmp_path):
+    dummy_checkpoint = tmp_path / "model.safetensors"
+    dummy_checkpoint.write_text("dummy")
+
+    config = OmniDiffusionConfig(
+        model=str(dummy_checkpoint),
+        diffusion_load_format="diffusers_single_file",
+        model_class_name="MissingPipeline",
+    )
+    try:
+        config.enrich_config()
+    except ValueError as exc:
+        assert "Could not find diffusers pipeline class MissingPipeline" in str(exc)
+    else:
+        raise AssertionError("Expected unknown single-file pipeline class to fail.")

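Reviewer note: the key-conversion test pins down six input keys and five outputs, but the diff shown here does not include `_convert_cosmos_2_transformer_state_dict` itself. The sketch below is a hypothetical rename table that reproduces exactly those test expectations. The names `_RENAMES`, `_DROPPED`, and `convert_keys` are illustrative assumptions, and the real converter may cover many more keys or work differently.

```python
# Hypothetical rename rules; they cover only the keys exercised by the test.
_RENAMES = [
    ("x_embedder.proj.1", "patch_embed.proj"),
    ("blocks.", "transformer_blocks."),
    ("self_attn", "attn1"),
    ("q_proj", "to_q"),
    ("q_norm", "norm_q"),
    ("mlp.layer1", "ff.net.0.proj"),
    ("final_layer.linear", "proj_out"),
]
# Bookkeeping entries that have no counterpart in the target model.
_DROPPED = ("accum_iteration",)


def convert_keys(state_dict: dict) -> dict:
    """Map original Cosmos-style keys to Diffusers-style keys (sketch)."""
    converted = {}
    for key, value in state_dict.items():
        new_key = key.removeprefix("net.")  # strip the wrapper prefix first
        if any(marker in new_key for marker in _DROPPED):
            continue
        for old, new in _RENAMES:
            new_key = new_key.replace(old, new)
        converted[new_key] = value
    return converted


converted = convert_keys(
    {
        "net.x_embedder.proj.1.weight": "patch",
        "net.blocks.0.self_attn.q_proj.weight": "q",
        "net.blocks.0.self_attn.q_norm.weight": "q_norm",
        "net.blocks.0.mlp.layer1.weight": "mlp",
        "net.final_layer.linear.weight": "out",
        "net.accum_iteration": "drop",
    }
)
print(converted)
```

Stripping the `net.` prefix before applying the table matters here, because one replacement target (`ff.net.0.proj`) itself contains `net.`.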
vllm_omni/diffusion/data.py

Lines changed: 79 additions & 3 deletions
@@ -2,6 +2,7 @@
 # SPDX-License-Identifier: Apache-2.0
 # SPDX-FileCopyrightText: Copyright contributors to the vLLM project
 import copy
+import importlib
 import os
 import random
 from collections.abc import Callable, Mapping
@@ -31,6 +32,43 @@

 logger = init_logger(__name__)

+_NATIVE_SINGLE_FILE_DIFFUSION_MODELS = {"AnimaPipeline"}
+
+
+def _diffusers_pipeline_module_name(model_class_name):
+    base_name = model_class_name
+    for suffix in ("ModularPipeline", "Pipeline"):
+        if base_name.endswith(suffix):
+            base_name = base_name[: -len(suffix)]
+            break
+    if not base_name:
+        return None
+
+    chars = []
+    for index, char in enumerate(base_name):
+        if char.isupper() and index > 0:
+            chars.append("_")
+        chars.append(char.lower())
+    return "vllm_omni.diffusion.models." + "".join(chars)
+
+
+def _resolve_diffusers_pipeline_cls(model_class_name):
+    if hasattr(diffusers, model_class_name):
+        return getattr(diffusers, model_class_name)
+
+    module_name = _diffusers_pipeline_module_name(model_class_name)
+    if module_name is not None:
+        try:
+            importlib.import_module(module_name)
+        except ModuleNotFoundError as exc:
+            if exc.name != module_name:
+                raise
+        else:
+            if hasattr(diffusers, model_class_name):
+                return getattr(diffusers, model_class_name)
+
+    raise ValueError(f"Could not find diffusers pipeline class {model_class_name} in diffusers namespace.")
+


 def parse_kv_cache_skip_selector(
     selector: str | list[int] | tuple[int, ...] | set[int] | None,
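Reviewer note: `_diffusers_pipeline_module_name` derives a module path from the class name by stripping a `ModularPipeline` or `Pipeline` suffix and snake-casing the remainder. The standalone copy below mirrors the logic in the hunk above so the mapping can be checked without importing vLLM-Omni. The function name `pipeline_module_name` and the sample class names other than the Anima ones are illustrative.

```python
from typing import Optional


def pipeline_module_name(model_class_name: str) -> Optional[str]:
    """Standalone copy of the naming rule in the hunk above."""
    base_name = model_class_name
    # "ModularPipeline" is checked first because it also ends with "Pipeline".
    for suffix in ("ModularPipeline", "Pipeline"):
        if base_name.endswith(suffix):
            base_name = base_name[: -len(suffix)]
            break
    if not base_name:
        return None

    chars = []
    for index, char in enumerate(base_name):
        # Insert "_" before every uppercase letter except the first character.
        if char.isupper() and index > 0:
            chars.append("_")
        chars.append(char.lower())
    return "vllm_omni.diffusion.models." + "".join(chars)


for name in ("AnimaModularPipeline", "AnimaPipeline", "StableDiffusionPipeline", "SD3Pipeline", "Pipeline"):
    print(name, "->", pipeline_module_name(name))
```

Note that consecutive capitals are split one by one, so an acronym-style name such as `SD3Pipeline` maps to `...models.s_d3`. That may or may not match an actual package name, in which case `_resolve_diffusers_pipeline_cls` falls through to its `ValueError`.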
@@ -525,7 +563,8 @@ class OmniDiffusionConfig:
     custom_pipeline_args: dict[str, Any] | None = None

     # Diffusion model loading format
-    # "default", "custom_pipeline", "dummy", "diffusers" (HF diffusers adapter)
+    # "default", "custom_pipeline", "dummy", "diffusers" (HF diffusers adapter),
+    # or "diffusers_single_file" (HF diffusers adapter via from_single_file).
     diffusion_load_format: str = "default"

     # Diffusers adapter kwargs
@@ -777,10 +816,24 @@ def __post_init__(self):
         elif self.max_cpu_loras < 1:
             raise ValueError("max_cpu_loras must be >= 1 for diffusion LoRA")

-        if self.diffusion_load_format != "diffusers" and (self.diffusers_load_kwargs or self.diffusers_call_kwargs):
+        is_single_file = self.diffusion_load_format == "diffusers_single_file" or (
+            isinstance(self.model, str) and os.path.isfile(self.model)
+        )
+        if (
+            is_single_file
+            and self.model_class_name not in _NATIVE_SINGLE_FILE_DIFFUSION_MODELS
+            and self.diffusion_load_format in (None, "default", "diffusers")
+        ):
+            self.diffusion_load_format = "diffusers_single_file"
+
+        if (
+            self.diffusion_load_format not in ("diffusers", "diffusers_single_file")
+            and self.model_class_name not in _NATIVE_SINGLE_FILE_DIFFUSION_MODELS
+            and (self.diffusers_load_kwargs or self.diffusers_call_kwargs)
+        ):
             raise ValueError(
                 "diffusers_load_kwargs and diffusers_call_kwargs are only "
-                "valid together with diffusion_load_format=diffusers"
+                "valid together with diffusion_load_format=diffusers or diffusers_single_file"
             )

     def _propagate_quantization_from_tf_config(self, tf_config: "TransformerConfig") -> None:
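Reviewer note: the `__post_init__` change does two things. It promotes a local checkpoint file to `diffusers_single_file` unless the class is a native single-file model, and it relaxes the kwargs check for both adapter formats and for native models. A minimal sketch of those two rules as pure functions follows. The function names are hypothetical, and `model_is_file` stands in for the `os.path.isfile(self.model)` check.

```python
NATIVE_SINGLE_FILE_MODELS = {"AnimaPipeline"}
ADAPTER_FORMATS = ("diffusers", "diffusers_single_file")


def resolve_load_format(load_format, model_class_name, model_is_file):
    """Mirror the format promotion added to __post_init__ (sketch)."""
    is_single_file = load_format == "diffusers_single_file" or model_is_file
    if (
        is_single_file
        and model_class_name not in NATIVE_SINGLE_FILE_MODELS
        and load_format in (None, "default", "diffusers")
    ):
        return "diffusers_single_file"
    return load_format


def check_adapter_kwargs(load_format, model_class_name, load_kwargs=None, call_kwargs=None):
    """Mirror the relaxed kwargs validation (sketch); raises on misuse."""
    if (
        load_format not in ADAPTER_FORMATS
        and model_class_name not in NATIVE_SINGLE_FILE_MODELS
        and (load_kwargs or call_kwargs)
    ):
        raise ValueError(
            "diffusers_load_kwargs and diffusers_call_kwargs are only "
            "valid together with diffusion_load_format=diffusers or diffusers_single_file"
        )


print(resolve_load_format("default", "AnimaModularPipeline", model_is_file=True))
print(resolve_load_format("default", "AnimaPipeline", model_is_file=True))
check_adapter_kwargs("default", "AnimaPipeline", load_kwargs={"local_files_only": True})
```

One consequence worth calling out in review: passing `--diffusion-load-format diffusers` together with a local file path is silently rewritten to `diffusers_single_file`.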
@@ -875,6 +928,29 @@ def enrich_config(self) -> None:
         if self.model_class_name is None and self.diffusion_load_format == "diffusers":
             self.model_class_name = "DiffusersAdapterPipeline"

+        if (
+            self.diffusion_load_format == "diffusers_single_file"
+            or (isinstance(self.model, str) and os.path.isfile(self.model))
+        ) and self.diffusion_load_format in (
+            None,
+            "default",
+            "diffusers",
+            "diffusers_single_file",
+        ):
+            if self.model_class_name in _NATIVE_SINGLE_FILE_DIFFUSION_MODELS:
+                self.diffusion_load_format = "default"
+                self.set_tf_model_config(TransformerConfig())
+                return
+
+            if self.diffusion_load_format in (None, "default"):
+                self.diffusion_load_format = "diffusers_single_file"
+
+            if self.model_class_name is not None and self.model_class_name != "DiffusersAdapterPipeline":
+                self.diffusers_pipeline_cls = _resolve_diffusers_pipeline_cls(self.model_class_name)
+            self.model_class_name = "DiffusersAdapterPipeline"
+            self.set_tf_model_config(TransformerConfig())
+            return
+
         try:
             config_dict = get_hf_file_to_dict("model_index.json", self.model)
             if config_dict is not None:

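Reviewer note: the new `enrich_config` branch returns early for single-file inputs, with two outcomes. Native classes stay on the `default` format, and everything else is routed through `DiffusersAdapterPipeline` with the requested class resolved separately. The sketch below models that branch as a pure function returning the three fields the tests inspect. The function name is hypothetical, and it assumes the adapter class name is assigned even when no class name was given.

```python
NATIVE_SINGLE_FILE_MODELS = {"AnimaPipeline"}


def enrich_single_file(load_format, model_class_name):
    """Return (load_format, model_class_name, pipeline_class_to_resolve)."""
    if model_class_name in NATIVE_SINGLE_FILE_MODELS:
        # Native path: no Diffusers pipeline class is resolved.
        return "default", model_class_name, None

    if load_format in (None, "default"):
        load_format = "diffusers_single_file"

    to_resolve = None
    if model_class_name is not None and model_class_name != "DiffusersAdapterPipeline":
        # The user-supplied name becomes the wrapped Diffusers pipeline class.
        to_resolve = model_class_name
    # Assumption: the adapter name is set unconditionally on this path.
    return load_format, "DiffusersAdapterPipeline", to_resolve


print(enrich_single_file("default", "AnimaPipeline"))
print(enrich_single_file("diffusers_single_file", "AnimaModularPipeline"))
```

Both printed tuples correspond to the expectations in `test_enrich_config_native_anima_single_file_stays_native` and `test_enrich_config_single_file`.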
vllm_omni/diffusion/diffusion_engine.py

Lines changed: 4 additions & 1 deletion
@@ -53,7 +53,10 @@ class _RpcTask:


 def supports_multimodal_input(od_config: OmniDiffusionConfig) -> tuple[bool, bool]:
-    if od_config.diffusion_load_format == "diffusers" and (pipe_cls := od_config.diffusers_pipeline_cls) is not None:
+    if (
+        od_config.diffusion_load_format in ("diffusers", "diffusers_single_file")
+        and (pipe_cls := od_config.diffusers_pipeline_cls) is not None
+    ):
         signature = inspect.signature(pipe_cls.__call__)
         support_image_input = "image" in signature.parameters
         support_audio_input = (

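Reviewer note: `supports_multimodal_input` decides image support by looking for an `image` parameter on the pipeline's `__call__`. The sketch below shows that check in isolation on two toy classes. Both classes and the helper `accepts_image` are illustrative, and the audio check is omitted because the hunk above is truncated before it.

```python
import inspect


class TextOnlyPipeline:
    """Toy pipeline whose __call__ takes no image input."""

    def __call__(self, prompt, num_inference_steps=20):
        return prompt


class ImageConditionedPipeline:
    """Toy pipeline whose __call__ accepts an optional image."""

    def __call__(self, prompt, image=None):
        return prompt, image


def accepts_image(pipe_cls) -> bool:
    # Inspect the unbound __call__ so no pipeline instance has to be built.
    signature = inspect.signature(pipe_cls.__call__)
    return "image" in signature.parameters


print(accepts_image(TextOnlyPipeline))
print(accepts_image(ImageConditionedPipeline))
```

Because the check is purely name-based, a pipeline that takes images under a different parameter name, or only through `**kwargs`, would be reported as not supporting image input.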