[spec decoding] Re-enable EAGLE topk==1 argmax fastpath on ROCm #26633
Open
michaelzhang-ai wants to merge 2 commits into
Experiment: removes the `not _is_hip` gate added in #26397 so the topk==1 argmax fastpath also runs on ROCm/HIP. Rationale: R108 (DSv3.2-MTP gsm8k → 0.035) only ever reproduced inside the full AMD nightly suite. In isolated single-job dispatch the un-gated code passes consistently (0.965-0.970), same as the gated/softmax path (0.975) — so the gate is not load-bearing and the failure looks environment-driven rather than caused by the argmax fastpath. This branch re-enables ROCm to test that hypothesis under full-suite conditions. Removes the now-unused is_hip import + _is_hip binding from eagle_draft_extend_cuda_graph_runner.py; keeps _is_hip in eagle_worker_v2.py (still used independently at line 317).
Motivation
Removes the `not _is_hip` gate added in #26397, so the `topk == 1` argmax fastpath (skip full-vocab softmax + `fast_topk`) runs on ROCm/HIP as well as CUDA.

The gate was added defensively when R108 (DSv3.2-MTP gsm8k → 0.040 with ~96% invalid output) appeared on the 2026-05-25 rocm720 nightly. Subsequent investigation indicates the gate is not load-bearing and R108 was a single transient blip, not a deterministic regression from this optimization.
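For readers unfamiliar with the optimization, here is a minimal sketch of why a top-1 argmax fastpath can skip the softmax: softmax is monotonic, so the index of the largest logit is also the index of the largest probability (barring floating-point ties). The function names below are hypothetical and written in plain Python for illustration; they are not the sglang implementation, and the sketch does not model what the real code does with the top-1 probability.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top1_via_softmax(logits):
    # Slow path (hypothetical): full-vocab softmax, then pick the best entry.
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return idx, probs[idx]

def top1_via_argmax(logits):
    # Fast path (hypothetical): argmax directly on the raw logits.
    return max(range(len(logits)), key=logits.__getitem__)

logits = [1.5, -0.3, 4.2, 0.0, 3.9]
slow_idx, slow_prob = top1_via_softmax(logits)
fast_idx = top1_via_argmax(logits)
# Both paths select the same token index for this input.
```

The two paths agree on the selected index whenever the largest logit is unique, which is why the choice between them should not, by itself, change which draft token is picked.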
Evidence the gate isn't needed
R108 fired exactly once and self-recovered on gated main (`nightly-accuracy-8-gpu-mi35x-deepseek-v32-mtp-rocm720`):

The un-gated code passes in isolation: the same code path and aiter pin that scored 0.040 in the full suite scored 0.965–0.970 in isolated single-job dispatch:

b13d3d18c62ff4af990

The only variable that flips the result is full-suite vs isolated dispatch, not the gate, which points to an environmental cause (cumulative runner state across back-to-back 8-GPU jobs), not the argmax fastpath.
Modifications
Removes `and not _is_hip` at the 3 EAGLE draft sites (restoring #26235's original CUDA+ROCm behavior), and drops the now-unused `is_hip` import / `_is_hip` binding from `eagle_draft_extend_cuda_graph_runner.py`. `_is_hip` is retained in `eagle_worker_v2.py` (still used independently at line 317).

- `python/sglang/srt/speculative/eagle_draft_extend_cuda_graph_runner.py`
- `python/sglang/srt/speculative/eagle_worker_v2.py`

Net: +3 / −14.
Validation (this is a draft pending AMD nightly)
Dispatching the full `nightly-test-amd-rocm720` suite on this branch to confirm the un-gated path holds at ~0.96 under full-suite load (the only context in which R108 ever appeared). Will mark ready once the AMD MTP nightly is green on this branch. If it reproduces 0.040, I'll close this and keep the gate.

References
cc @Qiaolin-Yu
CI States
Latest PR Test (Base): ❌ Run #26612546618
Latest PR Test (Extra): ❌ Run #26612548549