revert: remove instanttensor loader #462
Signed-off-by: lightseek-bot <243258330+lightseek-bot@users.noreply.github.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3863362906
normal_weights.append((name, weight))
for name, weight in weights:
    if ".experts" in name:
        mxfp4_weights.append((name, weight))
Keep GPT-OSS MXFP4 expert loading streamed
Here weights is the loader generator, but this loop now appends every .experts tensor to a list before any of them is copied into model parameters. For MXFP4 checkpoints such as openai/gpt-oss-120b, the expert tensors dominate the checkpoint. With the default safetensors iterator, this keeps all loaded shards pinned in host memory until the list is handed to _load_mxfp4_experts_weights, so large-model loads can OOM instead of streaming each expert into its preallocated slot.
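A minimal sketch of the streamed shape this comment asks for, under stated assumptions: load_streamed, load_expert, and load_normal are hypothetical names for illustration and are not taken from the repository. The point is only that each tensor is handed to its consumer as soon as the iterator yields it, so no list keeps the whole checkpoint alive.

```python
from typing import Callable, Iterable, Tuple

# (name, tensor) pairs as produced by a weight iterator; the tensor type is
# left opaque because this sketch never inspects it.
Weight = Tuple[str, object]


def load_streamed(
    weights: Iterable[Weight],
    load_expert: Callable[[str, object], None],  # hypothetical per-tensor expert loader
    load_normal: Callable[[str, object], None],  # hypothetical loader for all other tensors
) -> None:
    """Dispatch each tensor as soon as the iterator yields it."""
    for name, weight in weights:
        if ".experts" in name:
            load_expert(name, weight)
        else:
            load_normal(name, weight)
        # Nothing is appended to a list here, so this function holds at most
        # one tensor reference at a time.
```

Whether the actual expert loader can consume tensors one at a time depends on _load_mxfp4_experts_weights, which this sketch does not model.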
    vision_weights.append((name, loaded_weight))
else:
    name = name.replace("language_model.", "")
    language_weights.append((name, loaded_weight))
Stream Kimi language weights into the LM
Appending every non-vision tensor to language_weights means all Kimi language weights are materialized before DeepseekV3ForCausalLM.load_weights is invoked. For large checkpoints such as nvidia/Kimi-K2.5-NVFP4 (still used by the CI configs changed here), the safetensors iterator then keeps the whole language checkpoint alive in host RAM instead of freeing shards as DeepSeek consumes them, which can OOM large Kimi loads. Pass a generator to language_model.load_weights and collect only the vision weights.
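One way the suggested fix could look, as a sketch rather than the repository's code: language_stream and the is_vision predicate are hypothetical, since the condition that selects vision tensors is not visible in the excerpt above. The generator yields language weights lazily and collects only the vision weights as a side effect.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Weight = Tuple[str, object]


def language_stream(
    weights: Iterable[Weight],
    vision_weights: List[Weight],
    is_vision: Callable[[str], bool],  # hypothetical predicate; the real check is not shown
) -> Iterator[Weight]:
    """Yield language-model weights lazily; collect only vision weights."""
    for name, loaded_weight in weights:
        if is_vision(name):
            # Vision tensors are retained for a later, separate load step.
            vision_weights.append((name, loaded_weight))
        else:
            # Strip the prefix, as in the reviewed code, and hand the tensor on
            # without storing it.
            yield name.replace("language_model.", ""), loaded_weight
```

A caller would pass language_stream(...) directly to language_model.load_weights. Note that vision_weights is complete only after the generator has been fully consumed, so the vision load has to run after the language load in this arrangement.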