verify_done: wait not synchronize #25465

Merged
hnyls2002 merged 3 commits into main from lsyin/r3-pr-b
May 19, 2026

@hnyls2002 hnyls2002 commented May 16, 2026


Use event.wait() (a stream-level wait) instead of .synchronize() (which blocks the CPU thread) in maybe_wait_verify_done. Prep ops enqueued on the schedule stream after the wait are ordered after the verify on the forward stream by that stream-level dependency, so the CPU is no longer blocked. Subsequent .cpu() / .item() calls synchronize with the stream on their own.
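For readers unfamiliar with the two calls, here is a minimal sketch of the difference using plain PyTorch CUDA streams and events. It is not the sglang implementation: the function name `prep_after_verify`, the stream variable names, and the toy tensor math are all hypothetical stand-ins chosen for illustration. The sketch returns `None` when PyTorch or a CUDA device is unavailable.

```python
try:
    import torch
except ImportError:  # lets the sketch be imported on machines without PyTorch
    torch = None


def prep_after_verify(use_stream_wait: bool = True):
    """Hypothetical illustration of event.wait() vs event.synchronize()."""
    if torch is None or not torch.cuda.is_available():
        return None  # nothing to demonstrate without a CUDA device

    forward_stream = torch.cuda.Stream()   # stands in for the forward stream
    schedule_stream = torch.cuda.Stream()  # stands in for the schedule stream
    verify_done = torch.cuda.Event()

    x = torch.ones(1024, device="cuda")
    # x was created on the current stream; order the forward stream after it.
    forward_stream.wait_stream(torch.cuda.current_stream())

    with torch.cuda.stream(forward_stream):
        verified = x * 2       # toy stand-in for the verify forward pass
        verify_done.record()   # recorded on the current (forward) stream

    if use_stream_wait:
        # Stream-level dependency: work submitted to schedule_stream after
        # this call waits for the event on the GPU; the CPU returns at once.
        verify_done.wait(schedule_stream)
    else:
        # Host-level wait: the CPU thread blocks until the event completes.
        verify_done.synchronize()

    with torch.cuda.stream(schedule_stream):
        prepared = verified + 1  # toy prep op, ordered after the verify
        # .item() copies to the host and waits for the producing work,
        # so the value read here is complete in both variants.
        return prepared.sum().item()
```

Both variants produce the same result; the difference is only where the waiting happens. With `use_stream_wait=True` the dependency is enqueued on the GPU and the CPU can keep issuing work until the first host-side read.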


CI States

Latest PR Test (Base): ⏳ Run #26097243050
Latest PR Test (Extra): ❌ Run #26097242813

@gemini-code-assist


Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@github-actions github-actions Bot added the blackwell SM100/SM120 label May 16, 2026
@hnyls2002 hnyls2002 changed the title from "Spec V2 overlap: remove verify_done sync; defer CPU; preallocate trtllm_mla custom_mask" to "refactor: minor scheduling cleanup" May 19, 2026
@hnyls2002 hnyls2002 requested a review from ByronHsu as a code owner May 19, 2026 01:37
@hnyls2002

hnyls2002 commented May 19, 2026


/rerun-test test_dsa_dsv32_tp_mtp.py

(2 tries)

@github-actions

github-actions Bot commented May 19, 2026


🚀 8-gpu-h200 (1 test): ❌ View workflow run

cd test/ && python3 registered/dsa_models_e2e/test_dsa_dsv32_tp_mtp.py

@hnyls2002 hnyls2002 changed the title from "refactor: minor scheduling cleanup" to "spec v2: maybe_wait_verify_done uses event.wait not synchronize" May 19, 2026
@github-actions

github-actions Bot commented May 19, 2026


🚀 8-gpu-h200 (1 test): ✅ View workflow run

cd test/ && python3 registered/dsa_models_e2e/test_dsa_dsv32_tp_mtp.py

@hnyls2002 hnyls2002 changed the title from "spec v2: maybe_wait_verify_done uses event.wait not synchronize" to "verify_done: wait not synchronize" May 19, 2026
@hnyls2002


/tag-and-rerun-ci extra

@hnyls2002 hnyls2002 merged commit 16bcc45 into main May 19, 2026
178 of 191 checks passed
@hnyls2002 hnyls2002 deleted the lsyin/r3-pr-b branch May 19, 2026 21:57
@hnyls2002


The test_lora_qwen3_8b_logprob_diff.py failure on this branch comes from a pre-existing cutlass install issue and is not caused by this PR. Verified by dispatching the same test on the latest main, which contains the cutlass install fix #25756: https://github.com/sgl-project/sglang/actions/runs/26127841829 (passed). This branch needs the latest main merged in to pick up the fix.
