
fix: allow draft models to override context length#18234

Closed
JackZeng0208 wants to merge 1 commit into sgl-project:main from JackZeng0208:fix-speculative-decoding-context-length

Conversation


JackZeng0208 (Contributor) commented Feb 4, 2026

Motivation

See: #18220
Purpose: allow draft models to override context length

The speculative decoding documentation CI was failing because draft models with smaller context lengths (e.g., EAGLE with 2048) cannot adopt the target model's context length (e.g., 8192) without setting environment variables.

Modifications

This fix adds is_draft_model to the bypass condition in _derive_context_length() in sglang/srt/configs/model_config.py, allowing draft models to override the context length automatically. This solves the underlying problem rather than just patching the docs. The change is safe because the code already handles draft models immediately afterward: https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/configs/model_config.py#L367
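A minimal sketch of the modified guard, based only on the PR description (the function signature and error message are simplified assumptions, not the actual sglang source):

```python
import os

def derive_context_length(requested_len: int, derived_len: int, is_draft_model: bool) -> int:
    """Simplified sketch of the bypass condition in _derive_context_length().

    Before this PR, a requested context length larger than the derived
    maximum was allowed only when the override env var was set. This PR
    additionally bypasses the guard for draft models.
    """
    allow_override = (
        os.environ.get("SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN") == "1"
        or is_draft_model  # condition added by this PR
    )
    if requested_len > derived_len and not allow_override:
        raise ValueError(
            f"Requested context length {requested_len} exceeds derived max {derived_len}"
        )
    return requested_len
```

With this sketch, a 2048-context draft model paired with an 8192-context target would no longer require `SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1` to pass the guard.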

Comparison with PR #18225 (PR #18226) and PR #18228

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.



mmangkad (Contributor) commented Feb 4, 2026

Hi @JackZeng0208, is_draft_model shouldn’t be made an unconditional bypass. PR #9388 intentionally made this opt‑in because forcing longer context on shorter drafts can trigger IMA. #10787 only makes the override safe when explicitly enabled (it bumps max_position_embeddings), it doesn’t justify always‑on behavior. This effectively undoes #9388’s safety guard for draft models.

Concrete example: target meta-llama/Meta-Llama-3-8B-Instruct (8192) + draft lmsys/sglang-EAGLE-LLaMA3-Instruct-8B (2048). If we always bypass, the draft gets forced to 8192 and its rope cache/attention metadata are still sized for 2048, which is exactly how you hit device‑side asserts (scatter/gather index out of bounds) as described in #9388.

For #18220 the right fix is docs‑scoped opt‑in (SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 during docs runs) so we keep the global guard intact.
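The docs-scoped opt-in suggested above could be applied in the notebook's setup cell (a sketch; the variable must be set before the server or engine is launched so the config code sees it):

```python
import os

# Opt in to the longer-context override only for this docs run,
# leaving the global safety guard from #9388 intact for everyone else.
os.environ["SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN"] = "1"
```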

JackZeng0208 force-pushed the fix-speculative-decoding-context-length branch from 34d0311 to 974e7bf on February 5, 2026 at 01:01
The speculative decoding notebook was failing in documentation CI because the SGLANG_IS_IN_CI environment variable was not set.

Fixes #18220

Co-authored-by: Yixiao Zeng <yixiaozeng0208@outlook.com>
JackZeng0208 force-pushed the fix-speculative-decoding-context-length branch from 974e7bf to 307e98c on February 5, 2026 at 01:11
JackZeng0208 closed this by deleting the head repository on Mar 20, 2026