[Model] Add circlestone-labs/Anima by akshatvishu · Pull Request #4083 · vllm-project/vllm-omni

akshatvishu · 2026-06-02T22:25:25Z

Resolves #3658

Adds native diffusion support for circlestone-labs/Anima, a Cosmos-style text-to-image model distributed as a single-file safetensors checkpoint. The new AnimaPipeline loads the transformer/text-conditioner weights directly from the checkpoint, converts original Cosmos-style transformer keys when needed, and loads non-denoiser components such as the text encoder, tokenizers, VAE and scheduler from a Diffusers-layout components directory.

Native Anima currently targets baseline single-GPU execution. TP, SP, CFG-parallel, HSDP, Cache-DiT/TeaCache, quantization, CPU/layerwise offload and step execution are not supported yet.

Key Changes

Native Anima Pipeline
- Adds vllm_omni/diffusion/models/anima/ with AnimaPipeline,
  AnimaTransformer3DModel, and AnimaTextConditioner.
- Supports direct local single-file safetensors loading and strict native
  module weight loading.
- Implements prompt encoding, true CFG handling, denoising, VAE decode, and
  Anima-specific post-processing.

Single-file Diffusion Loading
- Adds diffusers_single_file handling for Diffusers adapter pipelines.
- Auto-detects local .safetensors/.ckpt single-file checkpoints.
- Maps Anima single-file aliases, including AnimaModularPipeline, to the
  native AnimaPipeline.
- Allows default single-stage config selection for local single-file Anima
  serving without requiring a deploy config.
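The auto-detection rule in the list above can be pictured as a suffix check on a local path. This is a minimal sketch, assuming detection only needs the path to be an existing local file with one of the two listed suffixes. `is_local_single_file_checkpoint` is a hypothetical name, not the PR's function.

```python
from pathlib import Path
from typing import Union

# Suffixes named in the PR description; case-insensitive matching is an assumption.
SINGLE_FILE_SUFFIXES = (".safetensors", ".ckpt")


def is_local_single_file_checkpoint(model: Union[str, Path]) -> bool:
    """Hypothetical helper: True only for an existing local single-file checkpoint.

    Hub repo ids and Diffusers-layout directories return False, so they would
    keep going through the regular loading path.
    """
    path = Path(model)
    return path.is_file() and path.suffix.lower() in SINGLE_FILE_SUFFIXES
```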

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1e64bd1902


chatgpt-codex-connector · 2026-06-02T22:30:32Z

+            pipeline.enable_vae_tiling()
+
+        self._pipeline = pipeline
+        self._accept_call_kwargs = set(inspect.signature(pipeline.__call__).parameters.keys())


Preserve ModularPipeline runtime kwargs

When the native Anima path is used, pipeline is a Diffusers ModularPipeline, whose __call__ signature is generic (state, output, **kwargs) rather than listing model inputs like prompt, height, or num_inference_steps. Caching that signature here makes _build_call_kwargs() later reject and drop the actual request fields, so a normal text-to-image request reaches the modular blocks without the required prompt and fails before generation. For modular pipelines this needs to allow block input names (or accept all kwargs) instead of using inspect.signature(pipeline.__call__) directly.
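The failure mode can be reproduced with plain `inspect`: filtering request fields by the parameter names of a generic `(state, output, **kwargs)` signature drops everything. The sketch below shows one possible fix, assuming the caller can optionally pass the modular blocks' input names. `build_call_kwargs` and both toy classes are hypothetical stand-ins, not the PR's `_build_call_kwargs()` or the Diffusers `ModularPipeline` API.

```python
import inspect
from typing import Any, Callable, Dict, Iterable, Optional


def build_call_kwargs(
    call: Callable[..., Any],
    request: Dict[str, Any],
    block_inputs: Optional[Iterable[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical filtering step that keeps the inputs a modular pipeline needs."""
    if block_inputs is not None:
        # Prefer declared block input names when the caller can supply them.
        allowed = set(block_inputs)
        return {k: v for k, v in request.items() if k in allowed}
    params = inspect.signature(call).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        # A generic **kwargs signature says nothing about accepted inputs: forward all.
        return dict(request)
    # Explicit signature: keep only the parameters it names.
    return {k: v for k, v in request.items() if k in params}


class GenericModularCall:
    # Mimics the generic (state, output, **kwargs) shape described in the review.
    def __call__(self, state=None, output=None, **kwargs):
        return kwargs


class ExplicitCall:
    # Mimics a classic pipeline whose __call__ lists its inputs explicitly.
    def __call__(self, prompt, height=1024):
        return prompt
```

With `GenericModularCall`, filtering on `inspect.signature(...).parameters` alone drops both `prompt` and `num_inference_steps`; the `VAR_KEYWORD` check forwards them instead.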


akshatvishu · 2026-06-04T21:41:07Z

Doing a major refactor; not ready for review yet!

timzsu · 2026-06-13T08:23:00Z

Hi @akshatvishu, may I ask when this PR will be ready?

akshatvishu · 2026-06-13T11:57:54Z

@timzsu It's ready for review!

The benchmarking code is included temporarily to validate this port. Once we're happy with the implementation, I'll run the benchmarks against the native Diffusers implementation and remove the benchmarking code before merging!

timzsu · 2026-06-13T12:14:08Z

Hi @akshatvishu, is it possible to split the performance optimizations from the model support? The current PR is too big (>3k lines) and hard to review. I suggest keeping the first PR as an integration with no extra optimizations. Then you can create separate PRs for offloading, quantization, and cache based on it.

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

akshatvishu · 2026-06-13T13:02:09Z

@timzsu Done. I’ve split the performance optimizations out of this PR; it now contains only the baseline model integration. All the optimizations (offloading, quantization, and cache-related changes) have been removed and will be raised separately as follow-up PRs.

I’ve also squashed the remaining changes into a single commit.

akshatvishu · 2026-06-13T13:33:18Z

Baseline validation

The baseline run completed successfully on a single MI300X (ROCm, official Docker image) with the following configuration:

Prompt:
official art, 2girls, hatsune miku, kasane teto, metal gear (series), @ shinkawa youji, twintails, blue hair, drill hair, red hair, fighting stance, kneeling, aiming, handgun, holding gun, suppressor, sneaking suit, profile, projected inset

Negative prompt:
worst quality, low quality, score_1, score_2, score_3, artist name

- Precision: BF16
- Inference steps: 50
- Image size: 1024x1024

Results

- Engine initialization: 28.58 s
- Model loading and initialization: 13.18 s
- End-to-end generation latency: 5.55 s
- Diffusion time: 5.24 s
- Post-processing time: 24.31 ms
- Peak GPU memory (reserved): 10.52 GB
- Peak GPU memory (allocated): 9.50 GB
- Throughput: approximately 11.6 steps/s

```python
prompt = (
    "masterpiece, best quality, very aesthetic, absurdres, 1girl, solo, silver hair, blue eyes, "
    "long hair, school uniform, sailor collar, cherry blossoms, petals, spring, soft lighting, "
    "looking at viewer, upper body, detailed background"
)
negative_prompt = (
    "worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, "
    "sepia, signature, artist name"
)
```

- Precision: BF16
- Inference steps: 25
- Image size: 1024x1024

akshatvishu · 2026-06-19T08:52:39Z

Here is where we currently stand with the Anima integration:

1. Checkpoint Loading & Architecture Alignment

Single-file checkpoints currently bypass stage-config discovery, so the caller must explicitly provide --model-class-name and --diffusion-load-format diffusers_single_file (or the equivalent configuration). This is intentional, but it means vLLM-Omni cannot infer the model type from the checkpoint alone.

Since Anima is the first native diffusers_single_file integration in Omni:
Should we keep this as an explicit loading contract? Or should we introduce a registry-based detection path for known single-file models? I’d like to align on the intended long-term design before finalizing this path.
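For comparison, a registry-based path could inspect tensor names in the safetensors header (an 8-byte little-endian length followed by a JSON table, per the safetensors format) without loading any weights. This is a sketch under stated assumptions: `SINGLE_FILE_REGISTRY`, its key prefixes, and both helper names are hypothetical placeholders, not Anima's real tensor names or an existing vLLM-Omni API. A `None` result would fall back to the explicit contract.

```python
import json
import struct
from typing import Dict, List, Optional, Tuple

# Hypothetical registry: pipeline class -> key prefixes that must ALL appear.
# The prefixes are placeholders, not Anima's actual checkpoint key names.
SINGLE_FILE_REGISTRY: Dict[str, Tuple[str, ...]] = {
    "AnimaPipeline": ("denoiser.blocks.", "text_conditioner."),
}


def read_safetensors_keys(path: str) -> List[str]:
    """Read tensor names from a .safetensors header without touching the weights."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64 little-endian length
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return list(header)


def detect_single_file_class(
    path: str, registry: Dict[str, Tuple[str, ...]] = SINGLE_FILE_REGISTRY
) -> Optional[str]:
    """Return the first registered class whose prefixes all match, else None."""
    keys = read_safetensors_keys(path)
    for class_name, prefixes in registry.items():
        if all(any(key.startswith(prefix) for key in keys) for prefix in prefixes):
            return class_name
    return None  # unknown checkpoint: require --model-class-name explicitly
```

The trade-off is that every new single-file model needs a registry entry with stable key signatures, whereas the explicit contract needs nothing beyond the CLI flags.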

2. Module Sharing & Code Reuse

I reviewed the Cosmos3 integration to see if we could share modules, similar to how Ming-TTS and Ming-Omni align.

While Anima is based on nvidia/Cosmos-Predict2-2B-Text2Image, its transformer structure, text-conditioning path, checkpoint conversion and execution assumptions are not directly compatible with the existing Cosmos3 implementation.

Unlike Ming-TTS and Ming-Omni (which share clear boundaries through common audio components and utilities), I don't see an equivalent reusable boundary for Anima right now. I've therefore kept the implementation separate and plan to extract smaller shared utilities only when a concrete second consumer appears.

Next Steps

Optimization support is already tracked as follow-up work. I’ll run the parity and performance benchmarks as soon as we align on the integration design above.


yuanheng-zhao · 2026-06-20T14:52:18Z

+        else:
+            if hasattr(diffusers, model_class_name):
+                return getattr(diffusers, model_class_name)


This branch seems to be dead code

yuanheng-zhao · 2026-06-20T14:52:37Z

+_NATIVE_SINGLE_FILE_DIFFUSION_MODELS = {"AnimaPipeline"}
+_ANIMA_SINGLE_FILE_ALIASES = {"AnimaPipeline", "AnimaModularPipeline"}
+
+
+def _diffusers_pipeline_module_name(model_class_name):
+    base_name = model_class_name
+    for suffix in ("ModularPipeline", "Pipeline"):
+        if base_name.endswith(suffix):
+            base_name = base_name[: -len(suffix)]
+            break
+    if not base_name:
+        return None
+
+    chars = []
+    for index, char in enumerate(base_name):
+        if char.isupper() and index > 0:
+            chars.append("_")
+        chars.append(char.lower())
+    return "vllm_omni.diffusion.models." + "".join(chars)
+
+
+def _resolve_diffusers_pipeline_cls(model_class_name):
+    if hasattr(diffusers, model_class_name):


Why do we need these changes specifically for AnimaPipeline?

Done! Kept only the Anima single-file alias handling; Anima's HF single-file checkpoint uses AnimaModularPipeline but vLLM-Omni loads it through the native AnimaPipeline because the denoiser and text-conditioner weights need custom splitting and key conversion.
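The remaining alias handling amounts to a small name lookup. A minimal sketch, reusing the alias set from the diff above; `resolve_native_single_file_class` is a hypothetical helper name, and returning `None` is assumed to mean "use the generic Diffusers path".

```python
from typing import Optional

# Alias set copied from the diff above: names an Anima single-file checkpoint may declare.
_ANIMA_SINGLE_FILE_ALIASES = {"AnimaPipeline", "AnimaModularPipeline"}


def resolve_native_single_file_class(model_class_name: str) -> Optional[str]:
    """Hypothetical helper: map a declared class name onto the native pipeline.

    AnimaModularPipeline (as declared by the HF checkpoint) is routed to the native
    AnimaPipeline, which performs the custom weight splitting and key conversion.
    """
    if model_class_name in _ANIMA_SINGLE_FILE_ALIASES:
        return "AnimaPipeline"
    return None
```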

yuanheng-zhao · 2026-06-20T14:54:37Z

+
+    def _load_native_denoiser_components(self, state_dict=None):
+        if state_dict is None:
+            import os


Redundant `import os`. We should also organize the imports: move them to module-level imports unless that would trigger circular imports or some other special case.

yuanheng-zhao · 2026-06-20T14:56:18Z

+    "extra_pos_embed_type": None,
+}
+
+_COSMOS_2_TRANSFORMER_RENAMES = {


Why do we need a Cosmos 2 rename mapping to load the model's components?

The raw Anima checkpoint uses the original training names for a Cosmos-style denoiser. I renamed the mapping to reflect that it converts original Anima checkpoint keys, and added a short comment explaining this.
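Conceptually, this kind of conversion is an ordered substring-rename table applied to every checkpoint key. The sketch below assumes that form; the entries in `EXAMPLE_RENAMES` are illustrative placeholders, not the PR's actual table, and `convert_original_keys` is a hypothetical name.

```python
from typing import Dict, Mapping, TypeVar

V = TypeVar("V")

# Placeholder entries for illustration only; NOT the PR's rename table.
# Order matters: earlier renames run first on every key.
EXAMPLE_RENAMES: Dict[str, str] = {
    "net.": "",
    "blocks.": "transformer_blocks.",
    "self_attn.q_proj": "attn1.to_q",
    "mlp.layer1": "ff.net.0.proj",
}


def convert_original_keys(
    state_dict: Mapping[str, V], renames: Mapping[str, str] = EXAMPLE_RENAMES
) -> Dict[str, V]:
    """Rename original training keys into the module's parameter names."""
    converted: Dict[str, V] = {}
    for key, value in state_dict.items():
        new_key = key
        for old, new in renames.items():
            new_key = new_key.replace(old, new)
        if new_key in converted:
            # Two original keys collapsing onto one name would silently drop weights.
            raise ValueError(f"key collision after conversion: {new_key!r}")
        converted[new_key] = value
    return converted
```

Because `ff.net.0.proj` itself contains `net.`, running the `net.` strip after that rename would corrupt the key; careful ordering (or anchoring prefix renames with `str.startswith`) avoids this.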

yuanheng-zhao · 2026-06-20T14:57:46Z

        for key, value in sampling.__dict__.items():
            if value is None:
                continue
+            if key == "guidance_scale" and not getattr(sampling, "guidance_scale_provided", False):


This affects other diffusers-adapter models, not just Anima. Please check whether it's necessary, and whether there is a better way to handle it within the model-specific scope.

Removed! Anima now handles guidance_scale only inside AnimaPipeline.
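Keeping the default inside the model's own scope can be as small as the sketch below. `ANIMA_DEFAULT_GUIDANCE_SCALE` and its value are hypothetical (the thread does not state Anima's default). The combine step is the standard classifier-free guidance formula, uncond + scale * (cond - uncond), written over plain lists to stay self-contained rather than over real noise-prediction tensors.

```python
from typing import List, Optional, Sequence

# Hypothetical default; not taken from the PR.
ANIMA_DEFAULT_GUIDANCE_SCALE = 4.0


def resolve_guidance_scale(requested: Optional[float]) -> float:
    """Apply the model's own default only when the request left guidance_scale unset."""
    return ANIMA_DEFAULT_GUIDANCE_SCALE if requested is None else requested


def apply_cfg(cond: Sequence[float], uncond: Sequence[float], scale: float) -> List[float]:
    """Classifier-free guidance, element-wise: uncond + scale * (cond - uncond)."""
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond predictions must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]
```

Because the default lives next to the pipeline, other diffusers-adapter models never see it.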

yuanheng-zhao · 2026-06-20T14:58:33Z

+_ANIMA_TRANSFORMER_CONFIG = {
+    "in_channels": 16,
+    "out_channels": 16,
+    "num_attention_heads": 16,
+    "attention_head_dim": 128,
+    "num_layers": 28,
+    "mlp_ratio": 4.0,
+    "text_embed_dim": 1024,
+    "adaln_lora_dim": 256,
+    "max_size": (128, 240, 240),
+    "patch_size": (1, 2, 2),
+    "rope_scale": (1.0, 4.0, 4.0),
+    "concat_padding_mask": True,
+    "extra_pos_embed_type": None,
+}


Check whether this is suitable to migrate into vllm_omni/transformers_utils/configs/

Moved the Anima transformer config next to AnimaTransformer3DModel instead. I did not move it to transformers_utils/configs/ because it is not a HF AutoConfig config.
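Since the values form a plain dict rather than an HF `AutoConfig`, one option is a frozen dataclass beside the model class. This is a hypothetical shape (`AnimaTransformerConfigSketch` is not the PR's name) holding the values from the diff. The helper methods compute derived sizes, assuming an 8x spatial VAE downsampling factor and one latent frame for text-to-image.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AnimaTransformerConfigSketch:
    # Values copied from _ANIMA_TRANSFORMER_CONFIG in the diff above.
    in_channels: int = 16
    out_channels: int = 16
    num_attention_heads: int = 16
    attention_head_dim: int = 128
    num_layers: int = 28
    mlp_ratio: float = 4.0
    text_embed_dim: int = 1024
    adaln_lora_dim: int = 256
    max_size: Tuple[int, int, int] = (128, 240, 240)
    patch_size: Tuple[int, int, int] = (1, 2, 2)
    rope_scale: Tuple[float, float, float] = (1.0, 4.0, 4.0)
    concat_padding_mask: bool = True
    extra_pos_embed_type: Optional[str] = None

    @property
    def hidden_size(self) -> int:
        return self.num_attention_heads * self.attention_head_dim

    @property
    def mlp_hidden_size(self) -> int:
        return int(self.hidden_size * self.mlp_ratio)

    def num_tokens(self, height: int, width: int, latent_frames: int = 1, vae_scale_factor: int = 8) -> int:
        """Patchified token count for one image (assumes an 8x spatial VAE by default)."""
        pt, ph, pw = self.patch_size
        latent_h, latent_w = height // vae_scale_factor, width // vae_scale_factor
        return (latent_frames // pt) * (latent_h // ph) * (latent_w // pw)
```

For a 1024x1024 image this gives a hidden size of 16 * 128 = 2048, an MLP width of 8192, and (1024 / 8 / 2)^2 = 4096 image tokens.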

yuanheng-zhao · 2026-06-20T14:59:51Z

+    def _infer_text_conditioner_config(state_dict):
+        model_dim = state_dict["blocks.0.self_attn.q_proj.weight"].shape[0]
+        source_dim = state_dict["blocks.0.cross_attn.k_proj.weight"].shape[1]
+        target_vocab_size, target_dim = state_dict["embed.weight"].shape
+        attention_head_dim = state_dict["blocks.0.self_attn.q_norm.weight"].shape[0]
+        num_layers = 1 + max(int(key.split(".")[1]) for key in state_dict if key.startswith("blocks."))
+
+        return {
+            "source_dim": source_dim,
+            "target_dim": target_dim,
+            "model_dim": model_dim,
+            "num_layers": num_layers,
+            "num_attention_heads": model_dim // attention_head_dim,
+            "target_vocab_size": target_vocab_size,
+        }


Can we use a unified, consistent config rather than inferring it? Right now we have both approaches, which is somewhat inconsistent.

Removed! The text-conditioner now uses a fixed ANIMA_TEXT_CONDITIONER_CONFIG next to its component class.
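The two approaches can also coexist without inconsistency: the fixed config remains the source of truth, and the checkpoint shapes serve only as a load-time consistency check. A sketch under that assumption, reusing the shape derivations from the removed helper on plain shape tuples; `TextConditionerConfigSketch`, `dims_from_shapes` and `config_mismatches` are hypothetical names, and no real config values are assumed.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Mapping, Tuple


@dataclass(frozen=True)
class TextConditionerConfigSketch:
    # Hypothetical stand-in for ANIMA_TEXT_CONDITIONER_CONFIG; field names follow the diff.
    source_dim: int
    target_dim: int
    model_dim: int
    num_layers: int
    num_attention_heads: int
    target_vocab_size: int


def dims_from_shapes(shapes: Mapping[str, Tuple[int, ...]]) -> Dict[str, int]:
    """Same derivations as the removed _infer_text_conditioner_config."""
    model_dim = shapes["blocks.0.self_attn.q_proj.weight"][0]
    target_vocab_size, target_dim = shapes["embed.weight"]
    head_dim = shapes["blocks.0.self_attn.q_norm.weight"][0]
    return {
        "source_dim": shapes["blocks.0.cross_attn.k_proj.weight"][1],
        "target_dim": target_dim,
        "model_dim": model_dim,
        "num_layers": 1 + max(int(k.split(".")[1]) for k in shapes if k.startswith("blocks.")),
        "num_attention_heads": model_dim // head_dim,
        "target_vocab_size": target_vocab_size,
    }


def config_mismatches(cfg: TextConditionerConfigSketch, shapes: Mapping[str, Tuple[int, ...]]) -> List[str]:
    """List every field where the fixed config disagrees with the checkpoint."""
    observed = dims_from_shapes(shapes)
    return [
        f"{f.name}: config={getattr(cfg, f.name)} checkpoint={observed[f.name]}"
        for f in fields(cfg)
        if getattr(cfg, f.name) != observed[f.name]
    ]
```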

yuanheng-zhao · 2026-06-20T15:01:07Z

+        self.vae_scale_factor = (
+            2 ** len(self.vae.temperal_downsample) if hasattr(self.vae, "temperal_downsample") else 8
+        )


In get_anima_post_process_func we hardcode vae_scale_factor = 8. Consider assigning it consistently in both places.

Added a hardcoded ANIMA_VAE_SCALE_FACTOR = 8 constant, since the postprocess API has no access to the VAE. The runtime still uses the loaded VAE's scale factor when available and falls back to 8 otherwise.
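The resulting contract can be sketched as one shared constant plus a runtime resolver that mirrors the `hasattr` check in the diff (including the `temperal_downsample` attribute spelling used there). `resolve_vae_scale_factor` and `latent_hw` are hypothetical helper names.

```python
from typing import Any, Tuple

ANIMA_VAE_SCALE_FACTOR = 8  # constant quoted in the reply above


def resolve_vae_scale_factor(vae: Any) -> int:
    """Runtime path: derive the factor from the loaded VAE, else use the shared constant."""
    if not hasattr(vae, "temperal_downsample"):  # attribute spelling as in the diff
        return ANIMA_VAE_SCALE_FACTOR
    return 2 ** len(vae.temperal_downsample)


def latent_hw(height: int, width: int, scale: int = ANIMA_VAE_SCALE_FACTOR) -> Tuple[int, int]:
    """Pixel size -> latent size; the postprocess path can only use the constant."""
    return height // scale, width // scale
```

With a VAE that has three downsampling stages, both paths agree on 8; they would only diverge if a loaded VAE reported a different number of stages.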

Address Anima review feedback by removing dead Diffusers class resolution code, keeping native Anima single-file routing explicit, moving Anima component configs next to their model classes and making VAE scale-factor handling consistent between postprocess and runtime. Signed-off-by: akshatvishu <akshatnayak197@gmail.com>


akshatvishu requested review from Gaohan123, Isotr0py, RuixiangMa, SamitHuang, ZJY0516, david6666666, hsliuustc0106, princepride, tzhouam, wtomin and yenuo26 as code owners June 2, 2026 22:25

chatgpt-codex-connector Bot reviewed Jun 2, 2026


akshatvishu force-pushed the anima branch from 1e64bd1 to 77d77f4 June 2, 2026 22:33

akshatvishu requested review from yuanheng-zhao and ywang96 as code owners June 4, 2026 22:02

akshatvishu requested review from ZeldaHuang, linyueqian and lishunyang12 as code owners June 5, 2026 12:33

akshatvishu force-pushed the anima branch from 1fda9a7 to e6cde7b June 5, 2026 19:34

akshatvishu force-pushed the anima branch from 403b9fa to 17d6815 June 13, 2026 09:54

akshatvishu requested review from congw729 and gcanlin as code owners June 13, 2026 09:54

akshatvishu force-pushed the anima branch from 17d6815 to f6cfe9d June 13, 2026 09:59

akshatvishu changed the title from [WIP] Add circlestone-labs/Anima to [Model] Add circlestone-labs/Anima Jun 13, 2026

Add Anima Pipeline

223e441

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

akshatvishu force-pushed the anima branch from 164f3b5 to 223e441 June 13, 2026 12:58

akshatvishu added 2 commits June 19, 2026 14:26

Merge remote-tracking branch 'upstream/main' into anima

4757705

Signed-off-by: akshatvishu <akshatnayak197@gmail.com> # Conflicts: # vllm_omni/diffusion/diffusion_engine.py # vllm_omni/diffusion/registry.py

[Recipe] Add Anima recipe

7aa9f62

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

yuanheng-zhao reviewed Jun 20, 2026


akshatvishu added 3 commits June 20, 2026 21:26

[Diffusion] Simplify Anima VAE scale handling

ec5817f

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

Merge remote-tracking branch 'upstream/main' into anima

88b340b

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

akshatvishu force-pushed the anima branch from 18fcada to 3002a5a June 22, 2026 13:07

akshatvishu added 4 commits June 23, 2026 01:12

[Diffusion] Run Anima denoising steps in float32 precision

0410a8f

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

[Diffusion] Match Anima text conditioner mask handling

e9da79d

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

[Diffusion] Restore Anima text conditioner mask contract

5bf86ba

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

[Diffusion] Preserve Anima scheduler dtype

98379bc

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

akshatvishu force-pushed the anima branch from e474cc2 to f32de2e June 22, 2026 19:42

[Diffusion] Resolve Anima precision drift via diffusers' apply_rotary_emb

9c638d1

Uses diffusers' apply_rotary_emb to upcast RoPE calculations to float32, resolving the bfloat16 numerical drift vs the reference pipeline. Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

akshatvishu force-pushed the anima branch from f32de2e to 9c638d1 June 22, 2026 19:53
