@@ -149,74 +149,3 @@ batch may still pay compile or CUDA-graph capture cost.
149 149
150 150 For a Qwen-Image continuous-batching replay example, see
151 151 [`performance_dashboard/qwen_image_serving_performance.md`](./performance_dashboard/qwen_image_serving_performance.md).
152-
153- ## 4. Anima Native Single-File Benchmarking
154-
155- Native Anima is benchmarked as a text-to-image model through the same serving
156- benchmark entrypoint. Unlike standard HuggingFace model IDs, Anima serves the
157- raw single-file transformer checkpoint and loads non-denoiser components from a
158- Diffusers-layout component directory.
159-
160- Download the official Anima checkpoint and components first. The commands below
161- use `/path/to/models` as a placeholder; replace it with any local directory that
162- has enough space for the checkpoint and component files.
163-
164- ```bash
165- mkdir -p /path/to/models/anima-official
166- mkdir -p /path/to/models/anima-components
167-
168- hf download circlestone-labs/Anima \
169-   split_files/diffusion_models/anima-base-v1.0.safetensors \
170-   --local-dir /path/to/models/anima-official
171-
172- hf download circlestone-labs/Anima-Base-v1.0-Diffusers \
173-   --local-dir /path/to/models/anima-components
174-
175- CHECKPOINT=/path/to/models/anima-official/split_files/diffusion_models/anima-base-v1.0.safetensors
176- COMPONENTS=/path/to/models/anima-components
177- ```
178-
179- Run these commands from the vLLM-Omni repository in the Python environment or
180- container where vLLM-Omni is installed.
181-
182- Start the server with the checkpoint as `--model` and pass the component
183- directory through `--diffusers-load-kwargs`:
184-
185- ```bash
186- vllm serve "$CHECKPOINT" \
187-   --omni \
188-   --port 8099 \
189-   --model-class-name AnimaPipeline \
190-   --diffusers-load-kwargs "{\"components_path\":\"$COMPONENTS\"}"
191- ```
192-
193- Then run the standard diffusion serving benchmark:
194-
195- ```bash
196- python3 benchmarks/diffusion/diffusion_benchmark_serving.py \
197-   --base-url http://localhost:8099 \
198-   --endpoint /v1/chat/completions \
199-   --model "$CHECKPOINT" \
200-   --task t2i \
201-   --dataset random \
202-   --num-prompts 10 \
203-   --max-concurrency 1 \
204-   --warmup-requests 1 \
205-   --warmup-concurrency 1 \
206-   --width 1024 \
207-   --height 1024 \
208-   --num-inference-steps 50
209- ```
210-
211- This matches the Diffusers baseline defaults for Anima: 1024x1024, 50 denoising
212- steps, `max_sequence_length=512`, one image per prompt, empty negative prompt,
213- and CFG scale 4.0 from the default guider. Do not pass `guidance_scale` through
214- the benchmark unless you are intentionally measuring a non-default CFG setting.
215-
216- Native Anima currently supports baseline single-GPU execution. Cache-DiT,
217- TeaCache, CPU offload, layer-wise offload, quantization, TP/SP, CFG parallel,
218- HSDP, and step execution are not supported by `AnimaPipeline` yet.
219-
220- Anima uses the default single diffusion stage for local single-file checkpoint
221- discovery when `--model-class-name AnimaPipeline` is provided; no deploy config
222- is required.
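
The `--diffusers-load-kwargs` value in the removed section is hand-escaped JSON, which can break if the component path contains quotes, backslashes, or spaces. A minimal sketch of one way to build the same string without manual escaping, assuming `python3` is on the `PATH` (the benchmark command already relies on it). The helper name `make_load_kwargs` is hypothetical and not part of vLLM-Omni; only the `components_path` key comes from the text above.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: prints {"components_path":"<path>"} as compact JSON.
# Python's json module handles the escaping of quotes and backslashes.
make_load_kwargs() {
  python3 -c 'import json, sys; print(json.dumps({"components_path": sys.argv[1]}, separators=(",", ":")))' "$1"
}

# Placeholder path from the section above; replace with a real directory.
COMPONENTS="${COMPONENTS:-/path/to/models/anima-components}"
LOAD_KWARGS="$(make_load_kwargs "$COMPONENTS")"

echo "$LOAD_KWARGS"
```

The resulting variable could then be passed as `--diffusers-load-kwargs "$LOAD_KWARGS"` in place of the escaped literal.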