
fix: allow draft models to override context length#18234

Closed
JackZeng0208 wants to merge 1 commit into sgl-project:main from JackZeng0208:fix-speculative-decoding-context-length

Conversation


JackZeng0208 (Contributor) commented Feb 4, 2026

Motivation

See: #18220
Purpose: allow draft models to override context length

The speculative decoding documentation CI was failing because draft models with smaller context lengths (e.g., EAGLE with 2048) cannot adopt the target model's context length (e.g., 8192) without setting environment variables.

Modifications

This fix adds is_draft_model to the bypass condition in _derive_context_length() in sglang/srt/configs/model_config.py, allowing draft models to override the context length automatically. This solves the underlying problem rather than just patching the docs. The change is safe because the code already handles draft models immediately afterward: https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/configs/model_config.py#L367
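A minimal sketch of the modified guard, based only on the PR description (the function signature and error message are simplified assumptions, not the actual sglang source):

```python
import os

def derive_context_length(requested_len: int, derived_len: int, is_draft_model: bool) -> int:
    """Simplified sketch of the bypass condition in _derive_context_length().

    Before this PR, a requested context length larger than the derived
    maximum was allowed only when the override env var was set. This PR
    additionally bypasses the guard for draft models.
    """
    allow_override = (
        os.environ.get("SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN") == "1"
        or is_draft_model  # condition added by this PR
    )
    if requested_len > derived_len and not allow_override:
        raise ValueError(
            f"Requested context length {requested_len} exceeds derived max {derived_len}"
        )
    return requested_len
```

With this sketch, a 2048-context draft model paired with an 8192-context target would no longer require `SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1` to pass the guard.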

Comparison with PR #18225 (PR #18226) and PR #18228

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.



mmangkad (Contributor) commented Feb 4, 2026

Hi @JackZeng0208, is_draft_model shouldn’t be made an unconditional bypass. PR #9388 intentionally made this opt‑in because forcing longer context on shorter drafts can trigger IMA. #10787 only makes the override safe when explicitly enabled (it bumps max_position_embeddings), it doesn’t justify always‑on behavior. This effectively undoes #9388’s safety guard for draft models.

Concrete example: target meta-llama/Meta-Llama-3-8B-Instruct (8192) + draft lmsys/sglang-EAGLE-LLaMA3-Instruct-8B (2048). If we always bypass, the draft gets forced to 8192 and its rope cache/attention metadata are still sized for 2048, which is exactly how you hit device‑side asserts (scatter/gather index out of bounds) as described in #9388.

For #18220 the right fix is docs‑scoped opt‑in (SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 during docs runs) so we keep the global guard intact.
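The docs-scoped opt-in suggested above could be applied in the notebook's setup cell (a sketch; the variable must be set before the server or engine is launched so the config code sees it):

```python
import os

# Opt in to the longer-context override only for this docs run,
# leaving the global safety guard from #9388 intact for everyone else.
os.environ["SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN"] = "1"
```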

JackZeng0208 force-pushed the fix-speculative-decoding-context-length branch from 34d0311 to 974e7bf on February 5, 2026 at 01:01
The speculative decoding notebook was failing in documentation CI because the SGLANG_IS_IN_CI environment variable was not set.

Fixes #18220

Co-authored-by: Yixiao Zeng <yixiaozeng0208@outlook.com>
JackZeng0208 force-pushed the fix-speculative-decoding-context-length branch from 974e7bf to 307e98c on February 5, 2026 at 01:11
JackZeng0208 closed this by deleting the head repository on Mar 20, 2026