I am trying to run SFT training for Step-Audio on a local 4-GPU machine, but the Optimus attention op is not available in my environment. Because of that, the model falls back from torch.ops.Optimus.fwd to the regular attention path, and training eventually runs out of memory.
Environment
Python: 3.11.14
PyTorch: 2.9.1+cu128
CUDA (torch compiled): 12.8
GPU: NVIDIA GeForce RTX 3090
torch.ops has Optimus namespace: True
torch.ops.Optimus.fwd exists: False
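For reference, the two checks above can be reproduced with a small sketch like the one below. As far as I understand, `torch.ops` creates namespace objects lazily on attribute access, so the namespace check returning True does not by itself show that any Optimus library was loaded; only the lookup of the op itself is informative. The helper name `op_available` is my own and is not part of PyTorch or Step-Audio.

```python
def op_available(ops_root, namespace: str, op_name: str) -> bool:
    """Return True only if ops_root.<namespace>.<op_name> resolves.

    Hypothetical helper: ops_root is expected to be torch.ops, but any
    object with the same attribute-lookup behavior works.
    """
    try:
        getattr(getattr(ops_root, namespace), op_name)
    except (AttributeError, RuntimeError):
        # An unregistered op surfaces as AttributeError in the traceback
        # reported here; RuntimeError is caught defensively as well.
        return False
    return True


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        print("Optimus.fwd registered:", op_available(torch.ops, "Optimus", "fwd"))
```

Running this before launching training gives a quick yes/no answer on whether the op was registered in the current process.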
Observed behavior
At startup, the training logs show:
WARNING:main:Optimus op is unavailable. Searched: /.../pretrained_models/Step-Audio-EditX/lib, /.../pretrained_models/Step-Audio-EditX
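To narrow down why the search fails, a sketch like the following can list which files in the searched directories even look like loadable libraries. The function name `find_candidate_libs` and the `*.so` pattern are assumptions on my side; I do not know what file name the Step-Audio loader actually expects.

```python
from pathlib import Path


def find_candidate_libs(search_dirs, patterns=("*.so",)):
    """List files in search_dirs that match the given glob patterns.

    Hypothetical helper: the default "*.so" pattern is an assumption about
    how the Optimus op would be shipped, not something the logs confirm.
    """
    found = []
    for d in search_dirs:
        d = Path(d)
        if not d.is_dir():
            # Skip directories that do not exist instead of raising.
            continue
        for pattern in patterns:
            # Non-recursive search, sorted for stable output.
            found.extend(sorted(d.glob(pattern)))
    return found
```

Calling it with the two directories from the warning shows whether there is anything to load at all. If a candidate turns up, `torch.ops.load_library(str(path))` is the standard PyTorch call for registering ops from a shared library, although whether a given file registers `Optimus.fwd` is a separate question.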
During training, the model tries to call torch.ops.Optimus.fwd and fails with:
[rank3]:   File ".../modeling_step1.py", line 82, in flash_attn_func
[rank3]:     return torch.ops.Optimus.fwd(q, k, v, None, dropout_p, softmax_scale, causal, return_attn_probs, None, tp_group_rank, tp_group_size)[0]
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File ".../torch/_ops.py", line 1365, in __getattr__
[rank3]:     raise AttributeError(
[rank3]: AttributeError: '_OpNamespace' 'Optimus' object has no attribute 'fwd'