[Model] Add circlestone-labs/Anima #4083
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1e64bd1902
| pipeline.enable_vae_tiling() | ||
|
|
||
| self._pipeline = pipeline | ||
| self._accept_call_kwargs = set(inspect.signature(pipeline.__call__).parameters.keys()) |
Preserve ModularPipeline runtime kwargs
When the native Anima path is used, pipeline is a Diffusers ModularPipeline, whose __call__ signature is generic (state, output, **kwargs) rather than listing model inputs like prompt, height, or num_inference_steps. Caching that signature here makes _build_call_kwargs() later reject and drop the actual request fields, so a normal text-to-image request reaches the modular blocks without the required prompt and fails before generation. For modular pipelines this needs to allow block input names (or accept all kwargs) instead of using inspect.signature(pipeline.__call__) directly.
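The failure mode described above can be reproduced with a stand-in class. This is a minimal sketch, not the adapter's actual code: `FakeModularPipeline`, `accepted_kwargs`, and `build_call_kwargs` are hypothetical names, and the fix shown (skip name filtering whenever `__call__` declares `**kwargs`) is only one of the two options the review mentions.

```python
import inspect


class FakeModularPipeline:
    """Stand-in with a generic __call__, like the signature described above."""

    def __call__(self, state=None, output=None, **kwargs):
        return kwargs


def accepted_kwargs(pipeline):
    """Return the accepted keyword names, or None if any name is accepted."""
    params = inspect.signature(pipeline.__call__).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return None  # **kwargs present: filtering by name would drop real inputs
    return set(params)


def build_call_kwargs(pipeline, request):
    """Filter request fields by name only when the signature is explicit."""
    allowed = accepted_kwargs(pipeline)
    if allowed is None:
        return dict(request)
    return {k: v for k, v in request.items() if k in allowed}


request = {"prompt": "a cat", "height": 512, "num_inference_steps": 20}
pipe = FakeModularPipeline()

# Naive filtering against the cached signature keeps none of the request fields.
naive = set(inspect.signature(pipe.__call__).parameters)
dropped = {k: v for k, v in request.items() if k in naive}

# The VAR_KEYWORD-aware version passes the request through unchanged.
kept = build_call_kwargs(pipe, request)
```

Pipelines with an explicit signature still get name-based filtering, so existing adapter behavior would be unchanged for them under this approach.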
|
Doing a major refactoring, not ready for review yet! |
|
Hi @akshatvishu, may I ask when this PR will be ready? |
|
@timzsu It's ready for review! The benchmarking code is included temporarily to validate this port. Once we're happy with the implementation, I'll run the benchmarks against the native Diffusers implementation and remove the benchmarking code before merging! |
|
Hi @akshatvishu, is it possible to split the performance optimizations from the model support? The current PR is too big (>3k lines) and hard to review. I suggest keeping the first PR as an integration with no extra optimizations. Then you can create separate PRs for offloading, quantization, and cache based on it. |
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
|
@timzsu Done. I’ve split the performance optimizations out of this PR. It now contains only the baseline model integration! All the optimizations (offloading, quantization, and cache-related changes) have been removed and will be raised separately as follow-up PRs! I’ve also squashed the remaining changes into a single commit. |
|
Here is where we currently stand with the Anima integration:
1. Checkpoint Loading & Architecture Alignment
Single-file checkpoints currently bypass stage-config discovery, meaning the caller must explicitly provide
Since Anima is the first native
2. Module Sharing & Code Reuse
I reviewed the Cosmos3 integration to see if we could share modules, similar to how Ming-TTS and Ming-Omni align. While Anima is based on
Unlike Ming-TTS and Ming-Omni (which share clear boundaries through common audio components/utilities), I don't see an equivalent reusable boundary for Anima right now. Therefore, I've kept the implementation separate for now and plan to extract smaller utilities only when a concrete second consumer arrives.
Next Steps
Optimization support is already tracked as follow-up work. I’ll run the parity and performance benchmarks as soon as we align on the integration design above. |
Signed-off-by: akshatvishu <akshatnayak197@gmail.com> # Conflicts: # vllm_omni/diffusion/diffusion_engine.py # vllm_omni/diffusion/registry.py
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
| else: | ||
|     if hasattr(diffusers, model_class_name): | ||
|         return getattr(diffusers, model_class_name) |
This branch seems to be dead code
| _NATIVE_SINGLE_FILE_DIFFUSION_MODELS = {"AnimaPipeline"} | ||
| _ANIMA_SINGLE_FILE_ALIASES = {"AnimaPipeline", "AnimaModularPipeline"} | ||
|
|
||
|
|
||
| def _diffusers_pipeline_module_name(model_class_name): | ||
|     base_name = model_class_name | ||
|     for suffix in ("ModularPipeline", "Pipeline"): | ||
|         if base_name.endswith(suffix): | ||
|             base_name = base_name[: -len(suffix)] | ||
|             break | ||
|     if not base_name: | ||
|         return None | ||
|
|
||
|     chars = [] | ||
|     for index, char in enumerate(base_name): | ||
|         if char.isupper() and index > 0: | ||
|             chars.append("_") | ||
|         chars.append(char.lower()) | ||
|     return "vllm_omni.diffusion.models." + "".join(chars) | ||
|
|
||
|
|
||
| def _resolve_diffusers_pipeline_cls(model_class_name): | ||
|     if hasattr(diffusers, model_class_name): |
Why do we need these changes specifically for AnimaPipeline?
Done! Kept only the Anima single-file alias handling; Anima's HF single-file checkpoint uses AnimaModularPipeline but vLLM-Omni loads it through the native AnimaPipeline because the denoiser and text-conditioner weights need custom splitting and key conversion.
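The alias handling described in this reply could look roughly like the sketch below. The alias set mirrors `_ANIMA_SINGLE_FILE_ALIASES` from the diff, but `resolve_native_pipeline_name` is a hypothetical helper introduced only for illustration; the PR's actual routing code is not shown on this page.

```python
# Class names a single-file Anima checkpoint may declare (taken from the diff).
_ANIMA_SINGLE_FILE_ALIASES = {"AnimaPipeline", "AnimaModularPipeline"}

# Name of the native pipeline that should load any of those checkpoints.
_NATIVE_ANIMA_PIPELINE = "AnimaPipeline"


def resolve_native_pipeline_name(declared_name):
    """Map a checkpoint-declared class name to the pipeline that loads it.

    Hypothetical helper: Anima aliases are routed to the native pipeline,
    every other name is returned unchanged.
    """
    if declared_name in _ANIMA_SINGLE_FILE_ALIASES:
        return _NATIVE_ANIMA_PIPELINE
    return declared_name
```

Keeping the alias set explicit limits the special case to Anima, so other Diffusers adapter pipelines resolve exactly as before.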
|
|
||
| def _load_native_denoiser_components(self, state_dict=None): | ||
|     if state_dict is None: | ||
|         import os |
Redundant import os. We should also organize the imports: move them to the top of the module unless that triggers circular imports or there is some other special case.
| "extra_pos_embed_type": None, | ||
| } | ||
|
|
||
| _COSMOS_2_TRANSFORMER_RENAMES = { |
Why do we need a Cosmos 2 rename mapping to load components of this model?
The raw Anima checkpoint uses the original training names for a Cosmos-style denoiser. Renamed this to Anima original-checkpoint conversion and added a short comment explaining it!
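For readers unfamiliar with this kind of conversion, here is a minimal sketch of renaming checkpoint keys with a substring table. The entries in `EXAMPLE_RENAMES` and the function name `convert_checkpoint_keys` are invented for illustration; the real Anima mapping is not shown on this page and may work differently.

```python
# Hypothetical rename table: original training name -> name expected by the model.
EXAMPLE_RENAMES = {
    "t_embedder.1": "time_embed.t_embedder",
    "final_layer.linear": "proj_out",
}


def convert_checkpoint_keys(state_dict, renames):
    """Return a new dict whose keys have every matching substring renamed."""
    converted = {}
    for key, value in state_dict.items():
        new_key = key
        for old, new in renames.items():
            new_key = new_key.replace(old, new)
        if new_key in converted:
            # Two source keys mapping to one target would silently lose a tensor.
            raise ValueError(f"rename collision on {new_key!r}")
        converted[new_key] = value
    return converted


# Values stand in for tensors; only the keys matter for this sketch.
raw = {
    "net.t_embedder.1.weight": "w0",
    "net.final_layer.linear.bias": "b0",
    "blocks.0.attn.weight": "w1",
}
converted = convert_checkpoint_keys(raw, EXAMPLE_RENAMES)
```

Keys that match no entry pass through untouched, which is why the PR description can say the conversion applies only "when needed".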
| for key, value in sampling.__dict__.items(): | ||
|     if value is None: | ||
|         continue | ||
|     if key == "guidance_scale" and not getattr(sampling, "guidance_scale_provided", False): |
This affects other diffusers-adapter models, not just Anima. Please check whether it's necessary, and whether there is a better way to handle it within the model-specific scope.
Removed! Anima now handles guidance_scale only inside AnimaPipeline
| _ANIMA_TRANSFORMER_CONFIG = { | ||
|     "in_channels": 16, | ||
|     "out_channels": 16, | ||
|     "num_attention_heads": 16, | ||
|     "attention_head_dim": 128, | ||
|     "num_layers": 28, | ||
|     "mlp_ratio": 4.0, | ||
|     "text_embed_dim": 1024, | ||
|     "adaln_lora_dim": 256, | ||
|     "max_size": (128, 240, 240), | ||
|     "patch_size": (1, 2, 2), | ||
|     "rope_scale": (1.0, 4.0, 4.0), | ||
|     "concat_padding_mask": True, | ||
|     "extra_pos_embed_type": None, | ||
| } |
Check if it's suitable to be migrated into vllm_omni/transformers_utils/configs/
Moved the Anima transformer config next to AnimaTransformer3DModel instead. I did not move it to transformers_utils/configs/ because it is not a HF AutoConfig config.
| def _infer_text_conditioner_config(state_dict): | ||
|     model_dim = state_dict["blocks.0.self_attn.q_proj.weight"].shape[0] | ||
|     source_dim = state_dict["blocks.0.cross_attn.k_proj.weight"].shape[1] | ||
|     target_vocab_size, target_dim = state_dict["embed.weight"].shape | ||
|     attention_head_dim = state_dict["blocks.0.self_attn.q_norm.weight"].shape[0] | ||
|     num_layers = 1 + max(int(key.split(".")[1]) for key in state_dict if key.startswith("blocks.")) | ||
|
|
||
|     return { | ||
|         "source_dim": source_dim, | ||
|         "target_dim": target_dim, | ||
|         "model_dim": model_dim, | ||
|         "num_layers": num_layers, | ||
|         "num_attention_heads": model_dim // attention_head_dim, | ||
|         "target_vocab_size": target_vocab_size, | ||
|     } |
Can we use unified, consistent configs rather than inferring them? (For now we have both approaches, which is somewhat inconsistent.)
Removed! The text-conditioner now uses a fixed ANIMA_TEXT_CONDITIONER_CONFIG next to its component class.
| self.vae_scale_factor = ( | ||
|     2 ** len(self.vae.temperal_downsample) if hasattr(self.vae, "temperal_downsample") else 8 | ||
| ) |
In get_anima_post_process_func we have hardcoded vae_scale_factor = 8. Consider assigning the value consistently in both places.
Hardcoded ANIMA_VAE_SCALE_FACTOR = 8 to match the postprocess API, which lacks VAE access. The runtime still uses the loaded VAE's scale factor if available, falling back to 8.
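The arrangement described in this reply can be sketched as a single shared constant plus one resolver. `ANIMA_VAE_SCALE_FACTOR` and the `temperal_downsample` attribute come from the thread; `resolve_vae_scale_factor` is a hypothetical helper name, and the `SimpleNamespace` objects are stand-ins for a loaded VAE.

```python
from types import SimpleNamespace

# Shared default, also usable by post-processing code that has no VAE handle.
ANIMA_VAE_SCALE_FACTOR = 8


def resolve_vae_scale_factor(vae=None):
    """Derive the spatial scale factor from the VAE, falling back to the default.

    Each entry in `temperal_downsample` is treated as one 2x downsampling
    stage, matching the expression in the diff above.
    """
    downsample = getattr(vae, "temperal_downsample", None)
    if downsample is None:
        return ANIMA_VAE_SCALE_FACTOR
    return 2 ** len(downsample)


# Stand-in VAEs: three stages give 2**3, two stages give 2**2.
three_stage_vae = SimpleNamespace(temperal_downsample=[False, True, True])
two_stage_vae = SimpleNamespace(temperal_downsample=[True, True])
```

With this shape, the runtime and the post-process function agree whenever the loaded VAE has three downsampling stages, and the fallback covers the case where no VAE is available.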
Address Anima review feedback by removing dead Diffusers class resolution code, keeping native Anima single-file routing explicit, moving Anima component configs next to their model classes and making VAE scale-factor handling consistent between postprocess and runtime. Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
…_emb Uses diffusers' apply_rotary_emb to upcast RoPE calculations to float32, resolving the bfloat16 numerical drift vs the reference pipeline. Signed-off-by: akshatvishu <akshatnayak197@gmail.com>


Resolves #3658
Adds native diffusion support for circlestone-labs/Anima, a Cosmos-style text-to-image model distributed as a single-file safetensors checkpoint. The new AnimaPipeline loads the transformer/text-conditioner weights directly from the checkpoint, converts original Cosmos-style transformer keys when needed, and loads non-denoiser components such as the text encoder, tokenizers, VAE and scheduler from a Diffusers-layout components directory.

Native Anima currently targets baseline single-GPU execution. TP, SP, CFG-parallel, HSDP, Cache-DiT/TeaCache, quantization, CPU/layerwise offload and step execution are not supported yet.
Key Changes
Native Anima Pipeline
- vllm_omni/diffusion/models/anima/ with AnimaPipeline, AnimaTransformer3DModel, and AnimaTextConditioner.
- module weight loading.
- Anima-specific post-processing.
Single-file Diffusion Loading
- diffusers_single_file handling for Diffusers adapter pipelines.
- .safetensors/.ckpt single-file checkpoints.
- AnimaModularPipeline, to the native AnimaPipeline.
- serving without requiring a deploy config.