[NSA] Fall back to fast_hadamard_transform when sgl_kernel lacks the symbol #23699

Merged
Fridge003 merged 1 commit into deepseek_v4 from baizhou-fix-hadamard-transform-fallback
Apr 25, 2026

@Fridge003
Collaborator

Summary

  • On non-HIP / non-SM103 paths, nsa_indexer.rotate_activation executes from sgl_kernel import hadamard_transform inside the function body. Older sgl_kernel builds (e.g. 0.3.21) don't export this symbol, so the import raises ImportError at forward time and crashes the scheduler. This PR wraps the import in try/except ImportError and falls back to fast_hadamard_transform (already used on HIP / SM103), so older or partial sgl_kernel builds keep working.

Why

  • Reproducible on the official lmsysorg/sglang:deepseek-v4-grace-blackwell image, but only on non-GB300 CUDA hardware, where _is_sm103=False. On GB300 the problem is masked because _is_sm103=True already routes to fast_hadamard_transform. Any non-Blackwell / non-GB300 CUDA platform hits this:
File ".../sglang/srt/layers/attention/nsa/nsa_indexer.py", line ..., in rotate_activation
    from sgl_kernel import hadamard_transform
ImportError: cannot import name 'hadamard_transform' from 'sgl_kernel'
[..] Received sigquit from a child process. It usually means the child failed.
  • fast_hadamard_transform is already an installed dependency on those code paths, so the fallback adds no new requirement.
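To see in advance whether a given environment would hit the ImportError above, one option is to probe the installed module for the symbol. The following is a minimal sketch; the helper name exports_symbol is hypothetical and not part of sglang or sgl_kernel. It relies on hasattr, which is close to, but not identical to, what a from-import checks (a from-import would also find an unimported submodule of that name).

```python
import importlib
import importlib.util


def exports_symbol(module_name: str, symbol: str) -> bool:
    """Return True if `module_name` is importable and exposes `symbol`.

    Hypothetical diagnostic helper, not part of sglang or sgl_kernel.
    """
    # find_spec returns None when a top-level module is not installed.
    if importlib.util.find_spec(module_name) is None:
        return False
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Installed but broken (e.g. a missing shared library).
        return False
    return hasattr(module, symbol)


if __name__ == "__main__":
    for name in ("sgl_kernel", "fast_hadamard_transform"):
        print(name, exports_symbol(name, "hadamard_transform"))
```

On an image with an sgl_kernel build that lacks the symbol, the first line would be expected to print False; the second depends on whether fast_hadamard_transform is installed.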

Diff

def rotate_activation(x: torch.Tensor) -> torch.Tensor:
    if _is_hip or _is_sm103:
        from fast_hadamard_transform import hadamard_transform
    else:
        try:
            from sgl_kernel import hadamard_transform
        except ImportError:
            from fast_hadamard_transform import hadamard_transform
    ...

Test plan

  • On a GB300 pod with lmsysorg/sglang:deepseek-v4-grace-blackwell (sgl-kernel 0.3.21, which has no hadamard_transform): the _is_sm103=True path is still chosen; gsm8k 20-shot sanity acc 0.949 (Flash low-latency), unchanged.
  • On the same pod when forcing the non-SM103 branch (older internal image fridge003/sglang:final-gb300, where _is_sm103=False because the source predates SM103 detection): without this patch, the scheduler crashed at the first forward; with the equivalent patch applied locally, gsm8k 20-shot acc 0.950, which confirms the fix.
  • CI on deepseek_v4 (let upstream run it).
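The selection order in the diff can also be exercised without a GPU or either kernel package by substituting stand-in modules in sys.modules. This is a sketch under stated assumptions: resolve_hadamard_transform, _fake_module, and backend_used are hypothetical names introduced for illustration, the platform flags are passed as arguments rather than read from the module-level _is_hip / _is_sm103, and the stand-in hadamard_transform just returns a tag string instead of transforming a tensor.

```python
import sys
import types
from unittest import mock


def resolve_hadamard_transform(is_hip: bool = False, is_sm103: bool = False):
    """Mirror the import order from the diff and return the chosen callable."""
    if is_hip or is_sm103:
        from fast_hadamard_transform import hadamard_transform
    else:
        try:
            from sgl_kernel import hadamard_transform
        except ImportError:
            # Older sgl_kernel builds do not export the symbol.
            from fast_hadamard_transform import hadamard_transform
    return hadamard_transform


def _fake_module(name: str, tag=None) -> types.ModuleType:
    """Build a stand-in module; tag=None means the symbol is absent."""
    mod = types.ModuleType(name)
    if tag is not None:
        mod.hadamard_transform = lambda x: tag  # stand-in, not a real kernel
    return mod


def backend_used(sgl_has_symbol: bool, is_hip: bool = False, is_sm103: bool = False) -> str:
    """Report which stand-in module supplied hadamard_transform."""
    fakes = {
        "sgl_kernel": _fake_module(
            "sgl_kernel", "sgl_kernel" if sgl_has_symbol else None
        ),
        "fast_hadamard_transform": _fake_module(
            "fast_hadamard_transform", "fast_hadamard_transform"
        ),
    }
    # patch.dict restores the original sys.modules entries on exit.
    with mock.patch.dict(sys.modules, fakes):
        fn = resolve_hadamard_transform(is_hip=is_hip, is_sm103=is_sm103)
    return fn(None)


if __name__ == "__main__":
    print(backend_used(sgl_has_symbol=False))
    print(backend_used(sgl_has_symbol=True))
    print(backend_used(sgl_has_symbol=True, is_sm103=True))
```

The first call models an old sgl_kernel build and should resolve to the fast_hadamard_transform stand-in; the second models a build that exports the symbol; the third models the SM103 path, which bypasses sgl_kernel entirely.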

🤖 Generated with Claude Code

…ks the symbol

Older sgl_kernel builds (e.g. 0.3.21) don't export hadamard_transform.
On non-HIP / non-SM103 hardware the import then raises ImportError at
forward time and crashes the scheduler. fast_hadamard_transform is
already a dependency on those paths, so use it as a fallback when
sgl_kernel is missing the symbol.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@Fridge003 Fridge003 merged commit 4bf81c9 into deepseek_v4 Apr 25, 2026
2 checks passed
@Fridge003 Fridge003 deleted the baizhou-fix-hadamard-transform-fallback branch April 25, 2026 08:45