Qwen3.5 disagg support by YAMY1234 · Pull Request #185 · ishandhanani/srt-slurm

YAMY1234 · 2026-02-23T16:51:45Z

Summary by CodeRabbit

New Features
- Added multiple Qwen3.5 deployment recipes (TP/DEP/TEP/AGG/1P1D variants) with prefill/decode presets, speculative decoding options, and benchmark profiles.
- Introduced an automated rebuild-and-verify script for the DeepEP inference component with system checks and post-build validation.
- Added extensive environment, memory, and scheduling presets to optimize disaggregated and aggregated inference workflows.

coderabbitai · 2026-02-25T08:48:31Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a DeepEP rebuild script and nine new Qwen3.5 deployment recipes (1P1D variants and aggregated TP4/TEP4/DEP4), specifying parallelism, resource layouts, per-path environment/scheduling knobs, DeepEP/DeepGemm/DEP/TEP options, and benchmark settings.

Changes

Cohort / File(s)	Summary
Build Automation `configs/rebuild-deepep.sh`	New bash script: validates DeepEP/NVSHMEM paths, creates missing `libnvshmem_host.so` symlink, patches `csrc/kernels/internode_ll.cu` to set `kNumMaxTopK=16`, builds with `TORCH_CUDA_ARCH_LIST`/`NVSHMEM_DIR`, runs `pip install -e .` and verifies `import deep_ep`.
1P1D Disaggregated Recipes `recipes/qwen3.5/1p1d-deepep-deepgemm.yaml`, `recipes/qwen3.5/1p1d-dep4.yaml`, `recipes/qwen3.5/1p1d-tep4.yaml`, `recipes/qwen3.5/1p1d-tp4.yaml`, `recipes/qwen3.5/1p1d-tp4-mtp.yaml`	Five new 1P1D configs defining prefill/decode roles, GPU/node/worker counts, extensive per-path environment variables and `sglang_config` entries (served-model, model-path, TP/DP/EP sizes, mamba hybrid scheduling, disaggregation modes, mem/cache/pool sizing, DeepEP/DeepGemm/DEP/TEP/MTP flags), and benchmark sections.
Aggregated Deployments `recipes/qwen3.5/agg-dep4.yaml`, `recipes/qwen3.5/agg-tep4.yaml`, `recipes/qwen3.5/agg-tp4.yaml`	Three aggregated deployment configs for GB200: `aggregated_environment` plus `sglang_config.aggregated` (served-model, model-path, attention backend, quantization/kv-cache dtype, TP/EP/DP sizes, runtime/memory/cuda-graph knobs) and sa-bench benchmark definitions.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

ishandhanani
kyleliang-nv
gracehonv

Poem

🐇 I patched a kernel, hopped the symlink tight,
Built DeepEP at dawn, then watched the tests take flight.
Recipes in rows, prefill and decode bloom,
GPUs hum softly in the server-room.
A crunchy carrot clap — deployments leap to light!

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Qwen3.5 disagg support' accurately describes the main change: adding disaggregated execution support for Qwen3.5 through new configuration files and build scripts.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 6

🧹 Nitpick comments (1)

recipes/qwen3.5/1p1d-deepep-deepgemm.yaml (1)

12-13: Parameterize host mounts for portability.

Line 12 and Line 13 hard-code a user-specific host path (.../yangminl/...). This recipe will be brittle outside that environment.

♻️ Suggested config shape

 extra_mount:
-  - "/lustre/fsw/coreai_comparch_trtllm/yangminl/sglang:/sgl-workspace/sglang"
-  - "/lustre/fsw/coreai_comparch_trtllm/yangminl/DeepEP:/sgl-workspace/DeepEP"
+  - "<SGLANG_HOST_PATH>:/sgl-workspace/sglang"
+  - "<DEEPEP_HOST_PATH>:/sgl-workspace/DeepEP"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@recipes/qwen3.5/1p1d-deepep-deepgemm.yaml` around lines 12 - 13, The recipe
hard-codes user-specific host mount paths in the mounts list (the two entries
currently pointing to yangminl directories); replace those literal host paths
with configurable placeholders or environment-variable references (e.g.
${SGLANG_DIR}, ${DEEPEP_DIR}) in recipes/qwen3.5/1p1d-deepep-deepgemm.yaml and
update any default-values or documentation that supplies those env vars so the
container runtime (or orchestration tool) can expand them at runtime; ensure the
keys/entries for the mounts remain the same so the container bind targets are
unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@configs/rebuild-deepep.sh`:
- Around line 16-21: NVSHMEM_DIR may be empty causing downstream path builds to
malfunction; add a fail-fast check after NVSHMEM_DIR is set: verify NVSHMEM_DIR
is non-empty and is a directory (use NVSHMEM_DIR and NVSHMEM_LIB identifiers)
and if missing print a clear error to stderr and exit non-zero so the script
stops immediately instead of proceeding to the lib fixup logic.

In `@recipes/qwen3.5/1p1d-deepep-deepgemm.yaml`:
- Line 65: The recipe currently enables remote code execution via the
trust-remote-code key; change trust-remote-code to false for shared/prod
deployments to eliminate RCE/supply-chain risk, or if you must keep it true, add
a pinned immutable commit via the revision key (full commit SHA, not a floating
ref) next to trust-remote-code and ensure the model repo is mirrored and
reviewed before production use; update occurrences of trust-remote-code (both
places where it is set) and add the revision entry alongside them.

In `@recipes/qwen3.5/1p1d-tp4-mtp.yaml`:
- Line 83: The served-model-name value is wrong: replace the incorrect
nvidia/DeepSeek-R1-0528-NVFP4-v2 in the served-model-name field with the correct
Qwen3.5 model identifier used elsewhere in this recipe (make served-model-name
match the Qwen3.5 model used by the rest of the config to preserve stage
alignment), e.g., use the same Qwen3.5 model string referenced by other entries
in this YAML so served-model-name is consistent with the recipe.

In `@recipes/qwen3.5/agg-dep4.yaml`:
- Line 68: The config contains an invalid key benchmark.random_range_ratio
(random_range_ratio) which fails schema validation; remove this key or rename it
to the correct schema field (check the current schema for the intended field
name) in the agg-dep4.yaml recipe and re-run config validation to ensure the
file passes; specifically locate the random_range_ratio entry and either delete
that line or replace it with the schema-approved key/value.

In `@recipes/qwen3.5/agg-tep4.yaml`:
- Line 65: The YAML contains a schema-invalid field "random_range_ratio" in
recipes/qwen3.5/agg-tep4.yaml — remove that line (or replace it with the
correct, schema-approved field if you intended a different name) so the recipe
validates; specifically, delete the "random_range_ratio: 1.0" entry in the
agg-tep4 recipe or change it to the sanctioned field name used elsewhere in this
project's recipes.

In `@recipes/qwen3.5/agg-tp4.yaml`:
- Line 61: Remove the unsupported YAML field benchmark.random_range_ratio (the
key shown as random_range_ratio) from the recipe so the file passes schema
validation; locate the entry where random_range_ratio: 1.0 is defined (in
agg-tp4.yaml under the recipe/benchmark section) and delete that line or move
its intended behavior into a supported field or metadata if needed, then re-run
CI to ensure the schema error is resolved.

---

Nitpick comments:
In `@recipes/qwen3.5/1p1d-deepep-deepgemm.yaml`:
- Around line 12-13: The recipe hard-codes user-specific host mount paths in the
mounts list (the two entries currently pointing to yangminl directories);
replace those literal host paths with configurable placeholders or
environment-variable references (e.g. ${SGLANG_DIR}, ${DEEPEP_DIR}) in
recipes/qwen3.5/1p1d-deepep-deepgemm.yaml and update any default-values or
documentation that supplies those env vars so the container runtime (or
orchestration tool) can expand them at runtime; ensure the keys/entries for the
mounts remain the same so the container bind targets are unchanged.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0658bc8 and a0feada.

📒 Files selected for processing (9)

configs/rebuild-deepep.sh
recipes/qwen3.5/1p1d-deepep-deepgemm.yaml
recipes/qwen3.5/1p1d-dep4.yaml
recipes/qwen3.5/1p1d-tep4.yaml
recipes/qwen3.5/1p1d-tp4-mtp.yaml
recipes/qwen3.5/1p1d-tp4.yaml
recipes/qwen3.5/agg-dep4.yaml
recipes/qwen3.5/agg-tep4.yaml
recipes/qwen3.5/agg-tp4.yaml

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (2)

recipes/qwen3.5/1p1d-tp4.yaml (1)

22-35: Consider YAML anchors to reduce env block drift.

prefill_environment and decode_environment are mostly duplicated. Using an anchor + merge keeps shared knobs centralized and reduces cleanup overhead later.

♻️ Example refactor

+  common_environment: &common_environment
+    TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "1800"
+    PYTHONUNBUFFERED: "1"
+    NCCL_MNNVL_ENABLE: "1"
+    NCCL_CUMEM_ENABLE: "1"
+    MC_FORCE_MNNVL: "1"
+    SGLANG_DG_CACHE_DIR: "/configs/deepgemm-cache"
+    FLASHINFER_WORKSPACE_BASE: "/configs/flashinfer-cache"
+    SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE: "100000"
+    SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT: "100000"
+    SGLANG_DISAGGREGATION_WAITING_TIMEOUT: "100000"
+    SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"
+    SGLANG_USE_MESSAGE_QUEUE_BROADCASTER: "0"
+
   prefill_environment:
-    TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "1800"
-    PYTHONUNBUFFERED: "1"
-    NCCL_MNNVL_ENABLE: "1"
-    NCCL_CUMEM_ENABLE: "1"
-    MC_FORCE_MNNVL: "1"
-    SGLANG_DG_CACHE_DIR: "/configs/deepgemm-cache"
-    FLASHINFER_WORKSPACE_BASE: "/configs/flashinfer-cache"
-    SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE: "100000"
-    SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT: "100000"
-    SGLANG_DISAGGREGATION_WAITING_TIMEOUT: "100000"
-    SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"
-    SGLANG_USE_MESSAGE_QUEUE_BROADCASTER: "0"
+    <<: *common_environment

   decode_environment:
-    TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "1800"
-    PYTHONUNBUFFERED: "1"
-    NCCL_MNNVL_ENABLE: "1"
-    NCCL_CUMEM_ENABLE: "1"
-    MC_FORCE_MNNVL: "1"
-    SGLANG_DG_CACHE_DIR: "/configs/deepgemm-cache"
-    FLASHINFER_WORKSPACE_BASE: "/configs/flashinfer-cache"
-    SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE: "100000"
-    SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT: "100000"
-    SGLANG_DISAGGREGATION_WAITING_TIMEOUT: "100000"
-    SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"
-    SGLANG_USE_MESSAGE_QUEUE_BROADCASTER: "0"
+    <<: *common_environment
     SGLANG_DECODE_BOOTSTRAP_TIMEOUT: "1000"
     SGLANG_HACK_SEQ_BOOTSTRAP_ROOM: "1"

Also applies to: 36-50

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@recipes/qwen3.5/1p1d-tp4.yaml` around lines 22 - 35, prefill_environment and
decode_environment duplicate many keys; create a single YAML anchor (e.g., an
env_base anchor) that contains the shared environment entries and then merge it
into both prefill_environment and decode_environment using the YAML merge key
(<<: *env_base), leaving only the unique keys in each block; update the recipe
to remove the duplicated entries and ensure any block-specific overrides are
listed after the merge so they take precedence.

recipes/qwen3.5/1p1d-tep4.yaml (1)

22-53: Reduce duplicated environment blocks with a YAML anchor.

backend.prefill_environment and backend.decode_environment repeat most fields. This is prone to config drift.

Suggested refactor

 backend:
+  common_environment: &common_environment
+    TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "1800"
+    PYTHONUNBUFFERED: "1"
+    NCCL_MNNVL_ENABLE: "1"
+    NCCL_CUMEM_ENABLE: "1"
+    MC_FORCE_MNNVL: "1"
+    SGLANG_DG_CACHE_DIR: "/configs/deepgemm-cache"
+    FLASHINFER_WORKSPACE_BASE: "/configs/flashinfer-cache"
+    SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE: "100000"
+    SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT: "100000"
+    SGLANG_DISAGGREGATION_WAITING_TIMEOUT: "100000"
+    SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"
+    SGLANG_USE_MESSAGE_QUEUE_BROADCASTER: "0"
+    SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK: "1"
 
   prefill_environment:
-    TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "1800"
-    PYTHONUNBUFFERED: "1"
-    ...
-    SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK: "1"
+    <<: *common_environment
 
   decode_environment:
-    TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "1800"
-    PYTHONUNBUFFERED: "1"
-    ...
-    SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK: "1"
+    <<: *common_environment
     SGLANG_DECODE_BOOTSTRAP_TIMEOUT: "1000"
     SGLANG_HACK_SEQ_BOOTSTRAP_ROOM: "1"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@recipes/qwen3.5/1p1d-tep4.yaml` around lines 22 - 53, prefill_environment and
decode_environment duplicate almost all keys; create a shared YAML anchor (e.g.,
&common_env) for the common keys and reference it in both environments with
merge key <<: *common_env, then add or override only the differing keys (like
SGLANG_DECODE_BOOTSTRAP_TIMEOUT and SGLANG_HACK_SEQ_BOOTSTRAP_ROOM) under
decode_environment; update the blocks named prefill_environment and
decode_environment to use this anchor and remove the duplicated entries.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@recipes/qwen3.5/1p1d-tep4.yaml`:
- Line 77: The recipe sets context-length: 2020 but the workload declares 1024
input + 1024 output tokens (2048 total), so update the context-length value from
2020 to 2048 wherever it appears (the context-length key in this YAML and the
other occurrence around the 1k+1k workload configuration) so requests aren't
truncated and benchmarks remain comparable.
- Line 35: Replace the incorrect environment variable
SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK with the official name
SGLANG_ENABLE_TP_MEMORY_INBALANCE_CHECK and set its value to "false" to disable
the TP memory imbalance check; update every occurrence (including the other
instance mentioned) so the config uses SGLANG_ENABLE_TP_MEMORY_INBALANCE_CHECK:
"false" instead of the ignored DISABLE variant.

In `@recipes/qwen3.5/1p1d-tp4.yaml`:
- Line 75: The recipe sets context-length: 2020 which is smaller than the
benchmark token budget computed as 1024 + 1024 = 2048, causing potential
truncation; update every occurrence of the context-length key (the two entries
currently set to 2020) to be at least 2048 (or increase to a safe margin >2048)
or alternatively reduce the summed token budget (the 1024 + 1024 values) so that
context-length >= budget; ensure you change both context-length occurrences and
verify the computed budget references (the 1024 + 1024 values) remain
consistent.
- Line 60: The decode section uses the CLI shorthand key tp-size which is not
valid in the YAML schema; update the decode configuration to use
tensor-parallel-size instead (matching the prefill section), i.e., locate the
decode block that contains tp-size: 4 and replace that key with
tensor-parallel-size: 4 so the schema and parsing use the correct field.

---

Nitpick comments:
In `@recipes/qwen3.5/1p1d-tep4.yaml`:
- Around line 22-53: prefill_environment and decode_environment duplicate almost
all keys; create a shared YAML anchor (e.g., &common_env) for the common keys
and reference it in both environments with merge key <<: *common_env, then add
or override only the differing keys (like SGLANG_DECODE_BOOTSTRAP_TIMEOUT and
SGLANG_HACK_SEQ_BOOTSTRAP_ROOM) under decode_environment; update the blocks
named prefill_environment and decode_environment to use this anchor and remove
the duplicated entries.

In `@recipes/qwen3.5/1p1d-tp4.yaml`:
- Around line 22-35: prefill_environment and decode_environment duplicate many
keys; create a single YAML anchor (e.g., an env_base anchor) that contains the
shared environment entries and then merge it into both prefill_environment and
decode_environment using the YAML merge key (<<: *env_base), leaving only the
unique keys in each block; update the recipe to remove the duplicated entries
and ensure any block-specific overrides are listed after the merge so they take
precedence.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a0feada and 10b7ed5.

📒 Files selected for processing (9)

configs/rebuild-deepep.sh
recipes/qwen3.5/1p1d-deepep-deepgemm.yaml
recipes/qwen3.5/1p1d-dep4.yaml
recipes/qwen3.5/1p1d-tep4.yaml
recipes/qwen3.5/1p1d-tp4-mtp.yaml
recipes/qwen3.5/1p1d-tp4.yaml
recipes/qwen3.5/agg-dep4.yaml
recipes/qwen3.5/agg-tep4.yaml
recipes/qwen3.5/agg-tp4.yaml

✅ Files skipped from review due to trivial changes (1)

recipes/qwen3.5/1p1d-deepep-deepgemm.yaml

🚧 Files skipped from review as they are similar to previous changes (6)

recipes/qwen3.5/1p1d-dep4.yaml
configs/rebuild-deepep.sh
recipes/qwen3.5/1p1d-tp4-mtp.yaml
recipes/qwen3.5/agg-dep4.yaml
recipes/qwen3.5/agg-tp4.yaml
recipes/qwen3.5/agg-tep4.yaml

coderabbitai

🧹 Nitpick comments (1)

recipes/qwen3.5/1p1d-deepep-deepgemm.yaml (1)

24-55: Reduce duplicated env blocks to avoid config drift.

prefill_environment and decode_environment duplicate many keys. Consider YAML anchors/merge keys to keep shared settings in one place.

♻️ Example refactor using YAML anchors

 backend:
+  common_environment: &common_environment
+    TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "1800"
+    PYTHONUNBUFFERED: "1"
+    NCCL_MNNVL_ENABLE: "1"
+    NCCL_CUMEM_ENABLE: "1"
+    MC_FORCE_MNNVL: "1"
+    SGLANG_DG_CACHE_DIR: "/configs/deepgemm-cache"
+    FLASHINFER_WORKSPACE_BASE: "/configs/flashinfer-cache"
+    SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE: "100000"
+    SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT: "100000"
+    SGLANG_DISAGGREGATION_WAITING_TIMEOUT: "100000"
+    SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"
+    SGLANG_USE_MESSAGE_QUEUE_BROADCASTER: "0"
+    SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK: "1"
 
   prefill_environment:
-    TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "1800"
-    PYTHONUNBUFFERED: "1"
-    ...
+    <<: *common_environment
 
   decode_environment:
-    TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "1800"
-    PYTHONUNBUFFERED: "1"
-    ...
+    <<: *common_environment
     MC_TE_METRIC: "true"
     SGLANG_DECODE_BOOTSTRAP_TIMEOUT: "1000"
     SGLANG_HACK_SEQ_BOOTSTRAP_ROOM: "1"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@recipes/qwen3.5/1p1d-deepep-deepgemm.yaml` around lines 24 - 55,
prefill_environment and decode_environment repeat many identical keys; refactor
by extracting the shared key/value pairs into a single anchor (e.g.,
&shared_env) and then merge it into both blocks using YAML merge keys (<<:
*shared_env), leaving only the differing keys (like MC_TE_METRIC or
SGLANG_DECODE_BOOTSTRAP_TIMEOUT) in decode_environment; update the recipe so
prefill_environment and decode_environment reference the shared anchor to remove
duplication and prevent config drift.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@recipes/qwen3.5/1p1d-deepep-deepgemm.yaml`:
- Around line 24-55: prefill_environment and decode_environment repeat many
identical keys; refactor by extracting the shared key/value pairs into a single
anchor (e.g., &shared_env) and then merge it into both blocks using YAML merge
keys (<<: *shared_env), leaving only the differing keys (like MC_TE_METRIC or
SGLANG_DECODE_BOOTSTRAP_TIMEOUT) in decode_environment; update the recipe so
prefill_environment and decode_environment reference the shared anchor to remove
duplication and prevent config drift.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 10b7ed5 and f9d6028.

📒 Files selected for processing (1)

recipes/qwen3.5/1p1d-deepep-deepgemm.yaml

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (2)

recipes/qwen3.5/1p1d-deepep-deepgemm.yaml (1)
37-39: Large timeout values are acceptable for accuracy testing but may mask issues.

The disaggregation timeouts (100000ms) and watchdog timeout (1000000ms) are generous. While appropriate for accuracy verification where you want to avoid spurious failures, consider reducing these for performance benchmarking recipes to surface actual timeout issues.

Also applies to: 53-56
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/qwen3.5/1p1d-deepep-deepgemm.yaml` around lines 37 - 39, Timeouts set
for disaggregation (SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE,
SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT, SGLANG_DISAGGREGATION_WAITING_TIMEOUT
and the watchdog timeout referenced later) are excessively large for performance
benchmarking; adjust these YAML values downward to reasonable benchmarking
defaults (for example, heartbeat failures ~5–50, bootstrap/waiting timeouts in
the 5000–30000 ms range and watchdog timeout proportionally lower than 1000000)
so real timeout/pathology surfaces during tests, and apply the same reductions
to the corresponding entries later in the file where the other timeout keys are
defined.
recipes/qwen3.5/1p1d-dep4.yaml (1)
30-32: Failure detection timeouts are effectively disabled.

Line 30–32, 45–47, 79, and 107 set very large timeout/failure thresholds. This can keep dead jobs alive for hours/days and delay recovery during cluster incidents.

Also applies to: 45-47, 79-79, 107-107
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/qwen3.5/1p1d-dep4.yaml` around lines 30 - 32, These env vars
(SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE,
SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT, SGLANG_DISAGGREGATION_WAITING_TIMEOUT)
use an excessive "100000" value that effectively disables failure detection;
change them to sensible defaults (e.g., seconds or ms consistent with the system
— such as 30–300s or a small failure count) or make them configurable via a
central timeout config, and replace all occurrences (the same three keys at the
other locations noted) so dead jobs are detected and recovered promptly.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@recipes/qwen3.5/1p1d-dep4.yaml`:
- Around line 13-18: The decode world is undersized: with
tensor-parallel-size=4, data-parallel-size=4 and expert-parallel-size=4 SGLang
requires TP×DP=16 GPUs for the disaggregated decode world but decode_nodes=1
with gpus_per_node=4 yields only 4 GPUs; change decode_nodes from 1 to 4 so
decode_nodes: 4 (keeping gpus_per_node: 4) to provide 16 decode GPUs, and leave
tensor-parallel-size, data-parallel-size, expert-parallel-size at 4 and
enable-dp-attention: true to satisfy DP-Attention constraints.

In `@recipes/qwen3.5/1p1d-tp4-mtp.yaml`:
- Around line 89-91: Remove the Mamba-specific keys from the decode section:
delete mamba-scheduler-strategy, mamba-track-interval, and mamba-ssm-dtype
entries (the same three keys present in the shown diff) since they are not
applicable to Qwen3.5; ensure the decode section no longer contains any mamba-*
settings and mirror the prefill-section fix so only Qwen-relevant settings
remain.
- Around line 64-66: Remove the three Mamba-specific configuration keys that
don't apply to Qwen3.5: mamba-scheduler-strategy, mamba-track-interval, and
mamba-ssm-dtype; locate them in the recipe file (they appear together in the
block with other scheduler/precision settings) and delete those entries so the
config only contains Qwen3.5-relevant options.

In `@recipes/qwen3.5/agg-dep4.yaml`:
- Line 26: Replace the incorrect environment variable
SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK with the documented SGLang variable
SGLANG_ENABLE_TP_MEMORY_INBALANCE_CHECK and set its value to "0" to disable the
memory imbalance check (i.e., change SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK:
"1" to SGLANG_ENABLE_TP_MEMORY_INBALANCE_CHECK: "0").

---

Nitpick comments:
In `@recipes/qwen3.5/1p1d-deepep-deepgemm.yaml`:
- Around line 37-39: Timeouts set for disaggregation
(SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE,
SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT, SGLANG_DISAGGREGATION_WAITING_TIMEOUT
and the watchdog timeout referenced later) are excessively large for performance
benchmarking; adjust these YAML values downward to reasonable benchmarking
defaults (for example, heartbeat failures ~5–50, bootstrap/waiting timeouts in
the 5000–30000 ms range and watchdog timeout proportionally lower than 1000000)
so real timeout/pathology surfaces during tests, and apply the same reductions
to the corresponding entries later in the file where the other timeout keys are
defined.

In `@recipes/qwen3.5/1p1d-dep4.yaml`:
- Around line 30-32: These env vars
(SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE,
SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT, SGLANG_DISAGGREGATION_WAITING_TIMEOUT)
use an excessive "100000" value that effectively disables failure detection;
change them to sensible defaults (e.g., seconds or ms consistent with the system
— such as 30–300s or a small failure count) or make them configurable via a
central timeout config, and replace all occurrences (the same three keys at the
other locations noted) so dead jobs are detected and recovered promptly.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f9d6028 and 28d6bb0.

📒 Files selected for processing (9)

configs/rebuild-deepep.sh
recipes/qwen3.5/1p1d-deepep-deepgemm.yaml
recipes/qwen3.5/1p1d-dep4.yaml
recipes/qwen3.5/1p1d-tep4.yaml
recipes/qwen3.5/1p1d-tp4-mtp.yaml
recipes/qwen3.5/1p1d-tp4.yaml
recipes/qwen3.5/agg-dep4.yaml
recipes/qwen3.5/agg-tep4.yaml
recipes/qwen3.5/agg-tp4.yaml

🚧 Files skipped from review as they are similar to previous changes (5)

recipes/qwen3.5/1p1d-tp4.yaml
configs/rebuild-deepep.sh
recipes/qwen3.5/agg-tp4.yaml
recipes/qwen3.5/agg-tep4.yaml
recipes/qwen3.5/1p1d-tep4.yaml

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

recipes/qwen3.5/1p1d-tep4.yaml (1)
35-35: ⚠️ Potential issue | 🟠 Major

Use the documented TP-memory-check env var key (current key may be ignored).

Line [35] and Line [52] use SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK. Current SGLang env var docs list SGLANG_ENABLE_TP_MEMORY_INBALANCE_CHECK (default true), so the current key can fail to disable the check as intended: https://docs.sglang.io/references/environment_variables.html
Proposed fix
-    SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK: "1"
+    SGLANG_ENABLE_TP_MEMORY_INBALANCE_CHECK: "false"
...
-    SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK: "1"
+    SGLANG_ENABLE_TP_MEMORY_INBALANCE_CHECK: "false"
#!/bin/bash
# Verify TP memory imbalance env var usage in this recipe
rg -n 'SGLANG_(DISABLE|ENABLE)_TP_MEMORY_INBALANCE_CHECK' recipes/qwen3.5/1p1d-tep4.yaml
In the latest SGLang environment variables documentation, what is the exact variable name for TP memory imbalance checks, and what value disables that check?
Also applies to: 52-52
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/qwen3.5/1p1d-tep4.yaml` at line 35, The recipe uses the wrong env var
key SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK which the docs don't recognize;
replace both occurrences of SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK with the
documented key SGLANG_ENABLE_TP_MEMORY_INBALANCE_CHECK and set its value to "0"
to disable the TP memory imbalance check (ensure you update both places where
SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK appears in the file).

🧹 Nitpick comments (2)

recipes/qwen3.5/1p1d-tp4.yaml (1)

22-50: Consider de-duplicating shared environment keys.

prefill_environment and decode_environment repeat most fields; centralizing common entries would reduce config drift across recipe variants.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@recipes/qwen3.5/1p1d-tp4.yaml` around lines 22 - 50, The prefill_environment
and decode_environment blocks repeat most keys; centralize the shared settings
by extracting them into a single common map (e.g., common_environment) or use
YAML anchors/aliases, then reference that from both prefill_environment and
decode_environment and only override the differing keys (like
SGLANG_DECODE_BOOTSTRAP_TIMEOUT and SGLANG_HACK_SEQ_BOOTSTRAP_ROOM) in
decode_environment; update any consumers to read merged env if needed and ensure
unique symbols referenced are prefill_environment and decode_environment so you
only move duplicated keys once.

recipes/qwen3.5/1p1d-deepep-deepgemm.yaml (1)

26-58: Consider YAML anchors to reduce env drift between prefill/decode blocks.

prefill_environment and decode_environment duplicate many identical keys. Anchors/merge keys would make future updates safer and easier.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@recipes/qwen3.5/1p1d-deepep-deepgemm.yaml` around lines 26 - 58, The
prefill_environment and decode_environment blocks duplicate many identical keys;
define a shared YAML anchor (e.g., &default_env) containing the common key/value
pairs, then use YAML merge (<<: *default_env) inside both prefill_environment
and decode_environment and only list the differing keys (like MC_TE_METRIC,
SGLANG_DECODE_BOOTSTRAP_TIMEOUT, SGLANG_HACK_SEQ_BOOTSTRAP_ROOM) in
decode_environment to override or add them; update the recipe to reference the
anchor so future changes apply in one place and reduce drift between
prefill_environment and decode_environment.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@recipes/qwen3.5/1p1d-deepep-deepgemm.yaml`:
- Line 145: The max_tokens setting currently violates the documented requirement
because max_tokens: 65536 exceeds context-length (2020); update the recipe so
that the numeric value of max_tokens is strictly less than the context-length
(or increase context-length accordingly) by modifying the max_tokens key (or the
context-length key) in the same YAML (referencing the max_tokens and
context-length fields) to ensure max_tokens < context-length before running
inference.

In `@recipes/qwen3.5/agg-tep4.yaml`:
- Line 51: Add a brief explanatory comment next to the context-length setting to
clarify why the non-power-of-two value “context-length: 2020” is used in the
Qwen3.5 recipes (e.g., that it’s a small/test-only context size chosen for
benchmark/test efficiency); update the YAML entry in agg-tep4.yaml by appending
a one-line comment after the context-length key to state the rationale so future
readers know it’s intentional.

---

Duplicate comments:
In `@recipes/qwen3.5/1p1d-tep4.yaml`:
- Line 35: The recipe uses the wrong env var key
SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK which the docs don't recognize; replace
both occurrences of SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK with the documented
key SGLANG_ENABLE_TP_MEMORY_INBALANCE_CHECK and set its value to "0" to disable
the TP memory imbalance check (ensure you update both places where
SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK appears in the file).

---

Nitpick comments:
In `@recipes/qwen3.5/1p1d-deepep-deepgemm.yaml`:
- Around line 26-58: The prefill_environment and decode_environment blocks
duplicate many identical keys; define a shared YAML anchor (e.g., &default_env)
containing the common key/value pairs, then use YAML merge (<<: *default_env)
inside both prefill_environment and decode_environment and only list the
differing keys (like MC_TE_METRIC, SGLANG_DECODE_BOOTSTRAP_TIMEOUT,
SGLANG_HACK_SEQ_BOOTSTRAP_ROOM) in decode_environment to override or add them;
update the recipe to reference the anchor so future changes apply in one place
and reduce drift between prefill_environment and decode_environment.

In `@recipes/qwen3.5/1p1d-tp4.yaml`:
- Around line 22-50: The prefill_environment and decode_environment blocks
repeat most keys; centralize the shared settings by extracting them into a
single common map (e.g., common_environment) or use YAML anchors/aliases, then
reference that from both prefill_environment and decode_environment and only
override the differing keys (like SGLANG_DECODE_BOOTSTRAP_TIMEOUT and
SGLANG_HACK_SEQ_BOOTSTRAP_ROOM) in decode_environment; update any consumers to
read merged env if needed and ensure unique symbols referenced are
prefill_environment and decode_environment so you only move duplicated keys
once.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 28d6bb0 and 84abf86.

📒 Files selected for processing (9)

configs/rebuild-deepep.sh
recipes/qwen3.5/1p1d-deepep-deepgemm.yaml
recipes/qwen3.5/1p1d-dep4.yaml
recipes/qwen3.5/1p1d-tep4.yaml
recipes/qwen3.5/1p1d-tp4-mtp.yaml
recipes/qwen3.5/1p1d-tp4.yaml
recipes/qwen3.5/agg-dep4.yaml
recipes/qwen3.5/agg-tep4.yaml
recipes/qwen3.5/agg-tp4.yaml

🚧 Files skipped from review as they are similar to previous changes (5)

recipes/qwen3.5/agg-dep4.yaml
recipes/qwen3.5/1p1d-tp4-mtp.yaml
recipes/qwen3.5/1p1d-dep4.yaml
recipes/qwen3.5/agg-tp4.yaml
configs/rebuild-deepep.sh

hlu1 · 2026-03-04T03:46:27Z

+
+      tensor-parallel-size: 4
+      data-parallel-size: 1
+      expert-parallel-size: 1


Please address

hlu1

Thanks! Good job!

[WIP] qwen3.5 disagg support

2cbbcc0

YAMY1234 marked this pull request as draft February 23, 2026 16:52

temp deep ep config

e98bf91

update agg configs

a0feada

YAMY1234 marked this pull request as ready for review March 3, 2026 18:53

YAMY1234 changed the title ~~[WIP] qwen3.5 disagg support~~ Qwen3.5 disagg support Mar 3, 2026

coderabbitai bot reviewed Mar 3, 2026

View reviewed changes

hlu1 reviewed Mar 3, 2026

View reviewed changes

YAMY1234 added 2 commits March 3, 2026 13:19

clean up and fix

10b7ed5

add some comment for deepep

f9d6028

coderabbitai bot reviewed Mar 3, 2026

View reviewed changes

Comment thread recipes/qwen3.5/1p1d-tep4-tep4.yaml

Comment thread recipes/qwen3.5/1p1d-tep4-tep4.yaml

Comment thread recipes/qwen3.5/1p1d-tp4-tp4.yaml

Comment thread recipes/qwen3.5/1p1d-tp4-tp4.yaml

coderabbitai bot reviewed Mar 3, 2026

View reviewed changes

YAMY1234 force-pushed the qwen3.5 branch from 28d6bb0 to e9cd6f7 Compare March 3, 2026 22:32

resolve code-rabbit comments

84abf86

YAMY1234 force-pushed the qwen3.5 branch from e9cd6f7 to 84abf86 Compare March 3, 2026 22:32

coderabbitai bot reviewed Mar 3, 2026

View reviewed changes

Comment thread recipes/qwen3.5/experimental/1p1d-tp4-dep4.yaml

Comment thread recipes/qwen3.5/1p1d-tp4-tp4-mtp.yaml

Comment thread recipes/qwen3.5/1p1d-tp4-tp4-mtp.yaml

Comment thread recipes/qwen3.5/agg-dep4.yaml

coderabbitai bot reviewed Mar 3, 2026

View reviewed changes

Comment thread recipes/qwen3.5/experimental/1p1d-tp4-deepep-deepgemm.yaml

Comment thread recipes/qwen3.5/agg-tep4.yaml

remove content len for deepep config

e6e5efc

YAMY1234 force-pushed the qwen3.5 branch from 595056b to e6e5efc Compare March 4, 2026 00:13

hlu1 reviewed Mar 4, 2026

View reviewed changes

YAMY1234 added 3 commits March 3, 2026 21:00

naming and fixing

96be4cc

add experimental folder

432220d

combine kNumMaxTopK

80eb08d

hlu1 approved these changes Mar 4, 2026

View reviewed changes

fix rebuild_deepep

1567392

YAMY1234 merged commit 507b553 into ishandhanani:main Mar 5, 2026
5 checks passed

coderabbitai bot mentioned this pull request Apr 3, 2026

Update Qwen3.5 recipes #240

Merged

Conversation

YAMY1234 commented Feb 23, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

hlu1 Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

hlu1 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

YAMY1234 commented Feb 23, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 25, 2026 •

edited

Loading