revert: remove instanttensor loader #462

Merged
torchspec-bot merged 2 commits into main from bot/revert-instanttensor-loader
Jun 16, 2026
@lightseek-bot

Contributor

Summary

  • Revert d3b285b
  • Remove the InstantTensor load format, docs, CI args, and runtime tests added by that commit

Tests

  • .venv/bin/python -m compileall -q python/tokenspeed
  • PRE_COMMIT_HOME=/tmp/pre-commit pre-commit run --all-files

Signed-off-by: lightseek-bot <243258330+lightseek-bot@users.noreply.github.com>
@lightseek-bot lightseek-bot requested a review from a team as a code owner June 16, 2026 08:17

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3863362906

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

normal_weights.append((name, weight))
for name, weight in weights:
    if ".experts" in name:
        mxfp4_weights.append((name, weight))

P1: Keep GPT-OSS MXFP4 expert loading streamed

Here `weights` is the loader generator, but this loop now appends every `.experts` tensor to a list before any copy into model parameters occurs. For MXFP4 checkpoints such as openai/gpt-oss-120b, the expert tensors dominate the checkpoint. With the default safetensors iterator, this pins all loaded shards in host memory until the list is handed to `_load_mxfp4_experts_weights`, so large-model loads can OOM instead of streaming each expert into its preallocated slot.
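To illustrate the difference between collecting and streaming, here is a minimal sketch. It is not the repository's loader: `dispatch_streaming`, `fake_loader`, and the two callbacks are hypothetical names, and the callbacks stand in for whatever copies a tensor into its preallocated parameter slot.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Weight = Tuple[str, object]


def dispatch_streaming(
    weights: Iterable[Weight],
    load_expert: Callable[[str, object], None],  # hypothetical: copies one expert tensor
    load_normal: Callable[[str, object], None],  # hypothetical: copies one regular tensor
) -> None:
    """Hand each tensor to its loader as soon as the iterator yields it.

    No list is built, so a tensor can be released once its loader returns.
    """
    for name, weight in weights:
        if ".experts" in name:
            load_expert(name, weight)
        else:
            load_normal(name, weight)


def fake_loader(names: List[str], events: List[str]) -> Iterator[Weight]:
    """Stand-in for a checkpoint iterator; records when each tensor is produced."""
    for name in names:
        events.append(f"yield {name}")
        yield name, bytearray(4)
```

With this shape, production and consumption interleave one tensor at a time, whereas appending to `mxfp4_weights` first would exhaust the iterator before any copy happens.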


    vision_weights.append((name, loaded_weight))
else:
    name = name.replace("language_model.", "")
    language_weights.append((name, loaded_weight))

P1: Stream Kimi language weights into the LM

Appending every non-vision tensor means all Kimi language weights are materialized before `DeepseekV3ForCausalLM.load_weights` is invoked. For large checkpoints such as nvidia/Kimi-K2.5-NVFP4 (still used by the CI configs changed here), the safetensors iterator keeps the whole language checkpoint alive in host RAM rather than freeing shards as DeepSeek consumes them, which can OOM large Kimi loads. Pass a generator to `language_model.load_weights` and collect only the vision weights.
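One way to read that suggestion is a generator that sets vision tensors aside and yields language tensors lazily. The sketch below is an assumption about the shape of such a fix, not the project's code: `split_language_stream` and the `is_vision` predicate are hypothetical, and only the `"language_model."` prefix stripping comes from the snippet above.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Weight = Tuple[str, object]


def split_language_stream(
    weights: Iterable[Weight],
    is_vision: Callable[[str], bool],  # hypothetical predicate for vision tensors
    vision_weights: List[Weight],      # filled in as a side effect while iterating
) -> Iterator[Weight]:
    """Yield language weights one at a time; collect only the vision weights."""
    for name, loaded_weight in weights:
        if is_vision(name):
            vision_weights.append((name, loaded_weight))
        else:
            # Strip the wrapper prefix so names match the inner language model.
            yield name.replace("language_model.", ""), loaded_weight
```

The result could be passed directly to a consumer such as `language_model.load_weights(...)`. Note that `vision_weights` is only complete after the generator has been fully consumed, so the vision weights would have to be loaded afterwards.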


@torchspec-bot torchspec-bot merged commit ec60f40 into main Jun 16, 2026
69 of 73 checks passed
@torchspec-bot torchspec-bot deleted the bot/revert-instanttensor-loader branch June 16, 2026 20:58
2 participants