torch.ops.Optimus.fwd is missing on Python 3.11 / Torch 2.9.1+cu128 #62

Description

@jansenLiang

I am trying to run SFT training for Step-Audio on a local 4-GPU machine, but the Optimus attention op is not available in my environment. Because of that, the model falls back from torch.ops.Optimus.fwd to the regular attention path, and training eventually runs out of memory.

Environment

Python: 3.11.14
PyTorch: 2.9.1+cu128
CUDA (torch compiled): 12.8
GPU: NVIDIA GeForce RTX 3090
torch.ops has Optimus namespace: True
torch.ops.Optimus.fwd exists: False

Observed behavior

At startup, the training logs show:

WARNING:__main__:Optimus op is unavailable. Searched: /.../pretrained_models/Step-Audio-EditX/lib, /.../pretrained_models/Step-Audio-EditX
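
To narrow down why the search finds nothing usable, here is a rough sketch that lists shared libraries in the searched directories and tries to load each one with `torch.ops.load_library`. The `*.so` pattern and both helper names (`find_shared_libs`, `try_load`) are my assumptions; I do not know what file name the training script actually looks for, and the directories have to be passed in by the caller.

```python
from pathlib import Path


def find_shared_libs(search_dirs):
    """Return the .so files found directly inside each existing directory."""
    found = []
    for directory in search_dirs:
        path = Path(directory)
        if path.is_dir():
            # Assumption: the op ships as a plain *.so file in this directory.
            found.extend(sorted(path.glob("*.so")))
    return found


def try_load(paths):
    """Try torch.ops.load_library on each path; map path -> error text or None."""
    import torch  # imported lazily so the directory scan works without torch

    results = {}
    for path in paths:
        try:
            torch.ops.load_library(str(path))
            results[str(path)] = None  # loaded without error
        except (OSError, RuntimeError) as exc:
            results[str(path)] = str(exc)
    return results
```

If a library is found but fails to load, the error text may indicate the cause (for example an undefined symbol); if nothing is found at all, the library is simply not in the searched directories.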

During training, the model tries to call torch.ops.Optimus.fwd and fails with:

[rank3]:   File ".../modeling_step1.py", line 82, in flash_attn_func
[rank3]:     return torch.ops.Optimus.fwd(q, k, v, None, dropout_p, softmax_scale, causal, return_attn_probs, None, tp_group_rank, tp_group_size)[0]
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File ".../torch/_ops.py", line 1365, in __getattr__
[rank3]:     raise AttributeError(
[rank3]: AttributeError: '_OpNamespace' 'Optimus' object has no attribute 'fwd'