
Fix flaky Qwen3-Next KL divergence tests by reverting mamba slot release#18910

Merged
Kangyan-Zhou merged 1 commit into main from fix/mtp-topk-kl-threshold on Feb 18, 2026
Conversation

@alisonshao (Collaborator) commented Feb 16, 2026

Summary

  • Reverts the mamba slot release logic on scheduling failure that was causing non-deterministic KL divergence in Qwen3-Next tests
  • When scheduling fails after init_next_round_input has already performed COW allocation, freeing the mamba pool slot causes states to be reconstructed from the radix cache on rescheduling, introducing numerical differences that manifest as outlier KL values
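The failure path described in the bullets above can be sketched as follows. This is an illustrative model only, not SGLang's actual scheduler API: the names `MambaPool`, `allocate`, `release`, and `reschedule` are hypothetical, and the "state" tuples stand in for real recurrent mamba states.

```python
# Hypothetical sketch of the scheduling-failure path; names are illustrative,
# not SGLang's real API. The point: releasing the slot on failure forces a
# reconstruction, which yields a different state object on retry.
class MambaPool:
    def __init__(self):
        self.slots = {}       # req_id -> opaque recurrent state
        self.next_state = 0

    def allocate(self, req_id):
        # Stands in for the COW allocation done in init_next_round_input.
        # Each allocation produces a distinct state object, mimicking
        # reconstruction from the radix cache's last tracking point.
        self.slots[req_id] = ("state", self.next_state)
        self.next_state += 1
        return self.slots[req_id]

    def release(self, req_id):
        self.slots.pop(req_id, None)


def reschedule(pool, req_id, release_on_failure):
    """Simulate a scheduling failure followed by a retry."""
    original = pool.allocate(req_id)
    if release_on_failure:
        # The reverted behavior: freeing the slot means the retry rebuilds
        # the state, so the retry does not see the original state.
        pool.release(req_id)
        rebuilt = pool.allocate(req_id)
        return original, rebuilt
    # The restored behavior: the slot is kept, so the retry sees the
    # exact same state and decoding stays deterministic.
    return original, pool.slots[req_id]


a, b = reschedule(MambaPool(), "req0", release_on_failure=True)
print(a == b)   # False: reconstructed state differs from the original
a, b = reschedule(MambaPool(), "req0", release_on_failure=False)
print(a == b)   # True: identical state preserved across the retry
```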

Root Cause Analysis

CI data confirms the correlation:

  • Before the slot release change: Qwen3-Next KL tests had ~90% pass rate (18/20 H100 runs, Feb 7-13)
  • After the slot release change: pass rate dropped to ~35% (7/20 runs, Feb 13-18)
  • Same commit fd5a45d5c on Feb 15 produced both PASS and FAIL, confirming the test became non-deterministic

The mamba state reconstruction from the radix cache (at the last tracking point) introduces slight numerical differences for some samples, causing outlier KL divergence values (0.12-0.27) that pull the arithmetic mean above the test threshold.
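To see how a handful of outliers pushes the mean over a tight threshold, here is a quick sketch with invented per-sample values: the 0.12 and 0.27 outliers echo the range observed above, while the 0.005 baseline and the 18/2 split are assumptions for illustration.

```python
# Illustrative arithmetic only: sample values are made up, not from CI logs.
baseline = [0.005] * 18            # typical per-sample KL divergence (assumed)
outliers = [0.12, 0.27]            # reconstruction-induced outliers (observed range)
samples = baseline + outliers

mean_kl = sum(samples) / len(samples)
print(round(mean_kl, 4))           # 0.024: above even a relaxed 0.02 threshold
print(mean_kl > 0.008)             # True: well above the original 0.008 threshold
```

Two outliers out of twenty samples are enough to triple the mean, which is why the test flips between PASS and FAIL depending on whether a scheduling failure (and hence a reconstruction) happens to occur in a given run.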

Test plan

  • Verify Qwen3-Next KL tests pass: test_qwen3_next_models_mtp.py, test_qwen3_next_models.py
  • Verify no mamba slot leak regressions under normal workloads

@gemini-code-assist (Contributor)

Summary of Changes

Hello @alisonshao, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves persistent test failures by increasing the KL divergence threshold for specific MTP topk tests. The adjustment is necessary because the speculative decoding configuration used in these tests exhibits higher variance, causing the tests to fail against a previously stricter threshold. The change aims to stabilize CI without compromising the integrity of the tests, reflecting a more realistic tolerance for the given decoding strategy.

Highlights

  • KL Divergence Threshold Adjustment: The KL divergence threshold in ACC_THRESHOLDS_MTP for TestQwen3NextMTPTopk has been increased from 0.008 to 0.02.
  • Flaky Test Resolution: This change addresses consistent failures in the test_input_output_logprobs_match_decode_cache_hit test, which was failing due to an average KL divergence of ~0.012-0.013 against the previous lower threshold.
  • Speculative Decoding Variance: The adjustment accounts for the inherently higher variance observed in the topk=4 / 5-step speculative decoding configuration compared to non-topk configurations.


Changelog
  • test/registered/4-gpu-models/test_qwen3_next_models_mtp.py
    • Updated the kl_div threshold within ACC_THRESHOLDS_MTP from 0.008 to 0.02.
    • Added a comment explaining that MTP topk has a higher KL divergence threshold due to speculative decoding variance.
Activity
  • The author identified consistent failures in test_input_output_logprobs_match_decode_cache_hit due to the previous KL divergence threshold.
  • Failure examples from GitHub Actions runs were provided to illustrate the issue.
  • A test plan was outlined to ensure CI passes with the updated threshold.

@alisonshao (Collaborator, Author)

/rerun-stage stage-c-test-4-gpu-h100

@github-actions

✅ Triggered stage-c-test-4-gpu-h100 to run independently (skipping dependencies).

@gemini-code-assist (bot) left a comment

Code Review

This pull request adjusts the KL divergence threshold for Qwen3 Next MTP top-k tests from 0.008 to 0.02 to address flakiness in CI. The change is justified by the higher variance inherent in multi-step speculative decoding with multiple candidates. My feedback suggests documenting the observed outliers in the code comments for better context and highlights the importance of monitoring these numerical shifts to ensure they do not mask regressions.

(diff context)

    }

    -# MTP has higher KL divergence threshold
    +# MTP topk has higher KL divergence threshold due to speculative decoding variance

@gemini-code-assist (severity: medium)

It is helpful to document the specific reason for the threshold increase, such as the observed outliers mentioned in the PR description, to assist future maintenance.

Suggested change:

    -# MTP topk has higher KL divergence threshold due to speculative decoding variance
    +# MTP topk has higher KL divergence threshold due to speculative decoding variance (outliers ~0.1 observed)

(diff context)

    # MTP topk has higher KL divergence threshold due to speculative decoding variance
    ACC_THRESHOLDS_MTP = {
    -    QWEN3_NEXT_MODEL: {"kl_div": 0.008, "gsm8k": 0.93},
    +    QWEN3_NEXT_MODEL: {"kl_div": 0.02, "gsm8k": 0.93},

@gemini-code-assist (severity: medium)

Increasing the threshold to 0.02 (8x the baseline) effectively addresses CI flakiness but reduces the sensitivity of the test to subtle numerical regressions. Given that outliers of ~0.1 KL were observed, it is worth verifying whether these occur on specific tokens (e.g., rare tokens or long sequences) to ensure the variance is purely numerical and not a logic issue in the tree-based logprob matching.

@alisonshao changed the title from "Fix flaky MTP topk KL divergence test threshold" to "Fix flaky Qwen3-Next KL divergence test thresholds" on Feb 17, 2026
@alisonshao changed the title from "Fix flaky Qwen3-Next KL divergence test thresholds" to "Revert #17613 Qwen3-Next PCG refactor (KL divergence regression test)" on Feb 17, 2026
@alisonshao (Collaborator, Author)

/rerun-stage stage-c-test-4-gpu-h100

@github-actions

✅ Triggered stage-c-test-4-gpu-h100 to run independently (skipping dependencies).

@alisonshao force-pushed the fix/mtp-topk-kl-threshold branch from 5750e72 to 9ad3a2b on February 17, 2026 at 23:31
@alisonshao changed the title from "Revert #17613 Qwen3-Next PCG refactor (KL divergence regression test)" to "Fix Qwen3-Next MTP KL divergence regression by moving forward timeout back to verify stage" on Feb 17, 2026
@alisonshao changed the title from "Fix Qwen3-Next MTP KL divergence regression by moving forward timeout back to verify stage" to "Fix Qwen3-Next MTP KL divergence regression" on Feb 17, 2026
@alisonshao (Collaborator, Author)

/rerun-stage stage-c-test-4-gpu-h100

@github-actions

✅ Triggered stage-c-test-4-gpu-h100 to run independently (skipping dependencies).

@alisonshao (Collaborator, Author)

/rerun-stage stage-c-test-4-gpu-h100

@github-actions

✅ Triggered stage-c-test-4-gpu-h100 to run independently (skipping dependencies).

@alisonshao force-pushed the fix/mtp-topk-kl-threshold branch from a441639 to 1d3a1ec on February 18, 2026 at 04:21
@alisonshao changed the title from "Fix Qwen3-Next MTP KL divergence regression" to "Fix flaky MTP topk KL divergence test" on Feb 18, 2026
@alisonshao (Collaborator, Author)

/rerun-stage stage-c-test-4-gpu-h100

@github-actions

✅ Triggered stage-c-test-4-gpu-h100 to run independently (skipping dependencies).

@alisonshao changed the title from "Fix flaky MTP topk KL divergence test" to "Fix flaky Qwen3-Next KL divergence tests" on Feb 18, 2026
The mamba slot release on scheduling failure introduced non-deterministic
KL divergence in Qwen3-Next tests. When scheduling fails after
init_next_round_input has already performed COW allocation, freeing the
mamba pool slot causes states to be reconstructed from the radix cache
on rescheduling. This reconstruction from the last tracking point
introduces numerical differences for some samples, manifesting as
outlier KL divergence values (0.12-0.27) that push the arithmetic mean
above the test threshold.

CI data confirms this: before the slot release change, Qwen3-Next KL
tests had ~90% pass rate (18/20 runs). After it, pass rate dropped to
~35% (7/20 runs). The same commit produced both PASS and FAIL on
the same day, confirming the test became non-deterministic.

This reverts the mamba slot release logic while preserving the rest of
the scheduling code.
@alisonshao force-pushed the fix/mtp-topk-kl-threshold branch from 52b883f to e0e0301 on February 18, 2026 at 05:51
@alisonshao changed the title from "Fix flaky Qwen3-Next KL divergence tests" to "Fix flaky Qwen3-Next KL divergence tests by reverting mamba slot release" on Feb 18, 2026
@alisonshao (Collaborator, Author)

/rerun-stage stage-c-test-4-gpu-h100

@github-actions

✅ Triggered stage-c-test-4-gpu-h100 to run independently (skipping dependencies).

@Kangyan-Zhou merged commit e2fccb2 into main on Feb 18, 2026
82 of 90 checks passed
@Kangyan-Zhou deleted the fix/mtp-topk-kl-threshold branch on February 18, 2026 at 23:55
@hnyls2002 (Collaborator)

> Qwen3-Next KL tests had ~90% pass rate (18/20 H100 runs, Feb 7-13)

I think that could be regarded as deterministic...

@kuafou (Contributor) commented Feb 19, 2026

Follow-up: the original issue discussed here is now addressed in #19024, which fixes the memory checker directly (without releasing slots or affecting scheduling behavior).

ec-jt added a commit to ec-jt/sglang that referenced this pull request Feb 24, 2026
magicYang1573 pushed a commit to magicYang1573/sglang that referenced this pull request Mar 9, 2026