Commit f5c9dfb
feat(moe/gluon): warp-decode MoE for gfx950 small-M decode
Squashed warp-decode work (coop-LDS stage1 + per-M split-K stage2, interleave/
K-tail/scale fixes) for rebase onto the PR lightseekorg#374 MoE-API refactor. Full
per-commit history preserved in backup/gptoss-warp-decode-moe-*.
Signed-off-by: Sanket Pandit <sanket.pandit@amd.com>
1 parent 38b3a35 commit f5c9dfb
1 file changed
Lines changed: 524 additions & 208 deletions