Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a DeepEP rebuild script and nine new Qwen3.5 deployment recipes (1P1D variants and aggregated TP4/TEP4/DEP4), specifying parallelism, resource layouts, per-path environment/scheduling knobs, DeepEP/DeepGemm/DEP/TEP options, and benchmark settings. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (1)
recipes/qwen3.5/1p1d-deepep-deepgemm.yaml (1)
12-13: Parameterize host mounts for portability.Line 12 and Line 13 hard-code a user-specific host path (
.../yangminl/...). This recipe will be brittle outside that environment.♻️ Suggested config shape
extra_mount: - - "/lustre/fsw/coreai_comparch_trtllm/yangminl/sglang:/sgl-workspace/sglang" - - "/lustre/fsw/coreai_comparch_trtllm/yangminl/DeepEP:/sgl-workspace/DeepEP" + - "<SGLANG_HOST_PATH>:/sgl-workspace/sglang" + - "<DEEPEP_HOST_PATH>:/sgl-workspace/DeepEP"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/qwen3.5/1p1d-deepep-deepgemm.yaml` around lines 12 - 13, The recipe hard-codes user-specific host mount paths in the mounts list (the two entries currently pointing to yangminl directories); replace those literal host paths with configurable placeholders or environment-variable references (e.g. ${SGLANG_DIR}, ${DEEPEP_DIR}) in recipes/qwen3.5/1p1d-deepep-deepgemm.yaml and update any default-values or documentation that supplies those env vars so the container runtime (or orchestration tool) can expand them at runtime; ensure the keys/entries for the mounts remain the same so the container bind targets are unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@configs/rebuild-deepep.sh`:
- Around line 16-21: NVSHMEM_DIR may be empty causing downstream path builds to
malfunction; add a fail-fast check after NVSHMEM_DIR is set: verify NVSHMEM_DIR
is non-empty and is a directory (use NVSHMEM_DIR and NVSHMEM_LIB identifiers)
and if missing print a clear error to stderr and exit non-zero so the script
stops immediately instead of proceeding to the lib fixup logic.
In `@recipes/qwen3.5/1p1d-deepep-deepgemm.yaml`:
- Line 65: The recipe currently enables remote code execution via the
trust-remote-code key; change trust-remote-code to false for shared/prod
deployments to eliminate RCE/supply-chain risk, or if you must keep it true, add
a pinned immutable commit via the revision key (full commit SHA, not a floating
ref) next to trust-remote-code and ensure the model repo is mirrored and
reviewed before production use; update occurrences of trust-remote-code (both
places where it is set) and add the revision entry alongside them.
In `@recipes/qwen3.5/1p1d-tp4-mtp.yaml`:
- Line 83: The served-model-name value is wrong: replace the incorrect
nvidia/DeepSeek-R1-0528-NVFP4-v2 in the served-model-name field with the correct
Qwen3.5 model identifier used elsewhere in this recipe (make served-model-name
match the Qwen3.5 model used by the rest of the config to preserve stage
alignment), e.g., use the same Qwen3.5 model string referenced by other entries
in this YAML so served-model-name is consistent with the recipe.
In `@recipes/qwen3.5/agg-dep4.yaml`:
- Line 68: The config contains an invalid key benchmark.random_range_ratio
(random_range_ratio) which fails schema validation; remove this key or rename it
to the correct schema field (check the current schema for the intended field
name) in the agg-dep4.yaml recipe and re-run config validation to ensure the
file passes; specifically locate the random_range_ratio entry and either delete
that line or replace it with the schema-approved key/value.
In `@recipes/qwen3.5/agg-tep4.yaml`:
- Line 65: The YAML contains a schema-invalid field "random_range_ratio" in
recipes/qwen3.5/agg-tep4.yaml — remove that line (or replace it with the
correct, schema-approved field if you intended a different name) so the recipe
validates; specifically, delete the "random_range_ratio: 1.0" entry in the
agg-tep4 recipe or change it to the sanctioned field name used elsewhere in this
project's recipes.
In `@recipes/qwen3.5/agg-tp4.yaml`:
- Line 61: Remove the unsupported YAML field benchmark.random_range_ratio (the
key shown as random_range_ratio) from the recipe so the file passes schema
validation; locate the entry where random_range_ratio: 1.0 is defined (in
agg-tp4.yaml under the recipe/benchmark section) and delete that line or move
its intended behavior into a supported field or metadata if needed, then re-run
CI to ensure the schema error is resolved.
---
Nitpick comments:
In `@recipes/qwen3.5/1p1d-deepep-deepgemm.yaml`:
- Around line 12-13: The recipe hard-codes user-specific host mount paths in the
mounts list (the two entries currently pointing to yangminl directories);
replace those literal host paths with configurable placeholders or
environment-variable references (e.g. ${SGLANG_DIR}, ${DEEPEP_DIR}) in
recipes/qwen3.5/1p1d-deepep-deepgemm.yaml and update any default-values or
documentation that supplies those env vars so the container runtime (or
orchestration tool) can expand them at runtime; ensure the keys/entries for the
mounts remain the same so the container bind targets are unchanged.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
configs/rebuild-deepep.shrecipes/qwen3.5/1p1d-deepep-deepgemm.yamlrecipes/qwen3.5/1p1d-dep4.yamlrecipes/qwen3.5/1p1d-tep4.yamlrecipes/qwen3.5/1p1d-tp4-mtp.yamlrecipes/qwen3.5/1p1d-tp4.yamlrecipes/qwen3.5/agg-dep4.yamlrecipes/qwen3.5/agg-tep4.yamlrecipes/qwen3.5/agg-tp4.yaml
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (2)
recipes/qwen3.5/1p1d-tp4.yaml (1)
22-35: Consider YAML anchors to reduce env block drift.
prefill_environmentanddecode_environmentare mostly duplicated. Using an anchor + merge keeps shared knobs centralized and reduces cleanup overhead later.♻️ Example refactor
+ common_environment: &common_environment + TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "1800" + PYTHONUNBUFFERED: "1" + NCCL_MNNVL_ENABLE: "1" + NCCL_CUMEM_ENABLE: "1" + MC_FORCE_MNNVL: "1" + SGLANG_DG_CACHE_DIR: "/configs/deepgemm-cache" + FLASHINFER_WORKSPACE_BASE: "/configs/flashinfer-cache" + SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE: "100000" + SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT: "100000" + SGLANG_DISAGGREGATION_WAITING_TIMEOUT: "100000" + SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True" + SGLANG_USE_MESSAGE_QUEUE_BROADCASTER: "0" + prefill_environment: - TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "1800" - PYTHONUNBUFFERED: "1" - NCCL_MNNVL_ENABLE: "1" - NCCL_CUMEM_ENABLE: "1" - MC_FORCE_MNNVL: "1" - SGLANG_DG_CACHE_DIR: "/configs/deepgemm-cache" - FLASHINFER_WORKSPACE_BASE: "/configs/flashinfer-cache" - SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE: "100000" - SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT: "100000" - SGLANG_DISAGGREGATION_WAITING_TIMEOUT: "100000" - SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True" - SGLANG_USE_MESSAGE_QUEUE_BROADCASTER: "0" + <<: *common_environment decode_environment: - TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "1800" - PYTHONUNBUFFERED: "1" - NCCL_MNNVL_ENABLE: "1" - NCCL_CUMEM_ENABLE: "1" - MC_FORCE_MNNVL: "1" - SGLANG_DG_CACHE_DIR: "/configs/deepgemm-cache" - FLASHINFER_WORKSPACE_BASE: "/configs/flashinfer-cache" - SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE: "100000" - SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT: "100000" - SGLANG_DISAGGREGATION_WAITING_TIMEOUT: "100000" - SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True" - SGLANG_USE_MESSAGE_QUEUE_BROADCASTER: "0" + <<: *common_environment SGLANG_DECODE_BOOTSTRAP_TIMEOUT: "1000" SGLANG_HACK_SEQ_BOOTSTRAP_ROOM: "1"Also applies to: 36-50
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/qwen3.5/1p1d-tp4.yaml` around lines 22 - 35, prefill_environment and decode_environment duplicate many keys; create a single YAML anchor (e.g., an env_base anchor) that contains the shared environment entries and then merge it into both prefill_environment and decode_environment using the YAML merge key (<<: *env_base), leaving only the unique keys in each block; update the recipe to remove the duplicated entries and ensure any block-specific overrides are listed after the merge so they take precedence.recipes/qwen3.5/1p1d-tep4.yaml (1)
22-53: Reduce duplicated environment blocks with a YAML anchor.
backend.prefill_environmentandbackend.decode_environmentrepeat most fields. This is prone to config drift.Suggested refactor
backend: + common_environment: &common_environment + TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "1800" + PYTHONUNBUFFERED: "1" + NCCL_MNNVL_ENABLE: "1" + NCCL_CUMEM_ENABLE: "1" + MC_FORCE_MNNVL: "1" + SGLANG_DG_CACHE_DIR: "/configs/deepgemm-cache" + FLASHINFER_WORKSPACE_BASE: "/configs/flashinfer-cache" + SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE: "100000" + SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT: "100000" + SGLANG_DISAGGREGATION_WAITING_TIMEOUT: "100000" + SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True" + SGLANG_USE_MESSAGE_QUEUE_BROADCASTER: "0" + SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK: "1" prefill_environment: - TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "1800" - PYTHONUNBUFFERED: "1" - ... - SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK: "1" + <<: *common_environment decode_environment: - TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "1800" - PYTHONUNBUFFERED: "1" - ... - SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK: "1" + <<: *common_environment SGLANG_DECODE_BOOTSTRAP_TIMEOUT: "1000" SGLANG_HACK_SEQ_BOOTSTRAP_ROOM: "1"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/qwen3.5/1p1d-tep4.yaml` around lines 22 - 53, prefill_environment and decode_environment duplicate almost all keys; create a shared YAML anchor (e.g., &common_env) for the common keys and reference it in both environments with merge key <<: *common_env, then add or override only the differing keys (like SGLANG_DECODE_BOOTSTRAP_TIMEOUT and SGLANG_HACK_SEQ_BOOTSTRAP_ROOM) under decode_environment; update the blocks named prefill_environment and decode_environment to use this anchor and remove the duplicated entries.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@recipes/qwen3.5/1p1d-tep4.yaml`:
- Line 77: The recipe sets context-length: 2020 but the workload declares 1024
input + 1024 output tokens (2048 total), so update the context-length value from
2020 to 2048 wherever it appears (the context-length key in this YAML and the
other occurrence around the 1k+1k workload configuration) so requests aren't
truncated and benchmarks remain comparable.
- Line 35: Replace the incorrect environment variable
SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK with the official name
SGLANG_ENABLE_TP_MEMORY_INBALANCE_CHECK and set its value to "false" to disable
the TP memory imbalance check; update every occurrence (including the other
instance mentioned) so the config uses SGLANG_ENABLE_TP_MEMORY_INBALANCE_CHECK:
"false" instead of the ignored DISABLE variant.
In `@recipes/qwen3.5/1p1d-tp4.yaml`:
- Line 75: The recipe sets context-length: 2020 which is smaller than the
benchmark token budget computed as 1024 + 1024 = 2048, causing potential
truncation; update every occurrence of the context-length key (the two entries
currently set to 2020) to be at least 2048 (or increase to a safe margin >2048)
or alternatively reduce the summed token budget (the 1024 + 1024 values) so that
context-length >= budget; ensure you change both context-length occurrences and
verify the computed budget references (the 1024 + 1024 values) remain
consistent.
- Line 60: The decode section uses the CLI shorthand key tp-size which is not
valid in the YAML schema; update the decode configuration to use
tensor-parallel-size instead (matching the prefill section), i.e., locate the
decode block that contains tp-size: 4 and replace that key with
tensor-parallel-size: 4 so the schema and parsing use the correct field.
---
Nitpick comments:
In `@recipes/qwen3.5/1p1d-tep4.yaml`:
- Around line 22-53: prefill_environment and decode_environment duplicate almost
all keys; create a shared YAML anchor (e.g., &common_env) for the common keys
and reference it in both environments with merge key <<: *common_env, then add
or override only the differing keys (like SGLANG_DECODE_BOOTSTRAP_TIMEOUT and
SGLANG_HACK_SEQ_BOOTSTRAP_ROOM) under decode_environment; update the blocks
named prefill_environment and decode_environment to use this anchor and remove
the duplicated entries.
In `@recipes/qwen3.5/1p1d-tp4.yaml`:
- Around line 22-35: prefill_environment and decode_environment duplicate many
keys; create a single YAML anchor (e.g., an env_base anchor) that contains the
shared environment entries and then merge it into both prefill_environment and
decode_environment using the YAML merge key (<<: *env_base), leaving only the
unique keys in each block; update the recipe to remove the duplicated entries
and ensure any block-specific overrides are listed after the merge so they take
precedence.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
configs/rebuild-deepep.shrecipes/qwen3.5/1p1d-deepep-deepgemm.yamlrecipes/qwen3.5/1p1d-dep4.yamlrecipes/qwen3.5/1p1d-tep4.yamlrecipes/qwen3.5/1p1d-tp4-mtp.yamlrecipes/qwen3.5/1p1d-tp4.yamlrecipes/qwen3.5/agg-dep4.yamlrecipes/qwen3.5/agg-tep4.yamlrecipes/qwen3.5/agg-tp4.yaml
✅ Files skipped from review due to trivial changes (1)
- recipes/qwen3.5/1p1d-deepep-deepgemm.yaml
🚧 Files skipped from review as they are similar to previous changes (6)
- recipes/qwen3.5/1p1d-dep4.yaml
- configs/rebuild-deepep.sh
- recipes/qwen3.5/1p1d-tp4-mtp.yaml
- recipes/qwen3.5/agg-dep4.yaml
- recipes/qwen3.5/agg-tp4.yaml
- recipes/qwen3.5/agg-tep4.yaml
There was a problem hiding this comment.
🧹 Nitpick comments (1)
recipes/qwen3.5/1p1d-deepep-deepgemm.yaml (1)
24-55: Reduce duplicated env blocks to avoid config drift.
prefill_environmentanddecode_environmentduplicate many keys. Consider YAML anchors/merge keys to keep shared settings in one place.♻️ Example refactor using YAML anchors
backend: + common_environment: &common_environment + TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "1800" + PYTHONUNBUFFERED: "1" + NCCL_MNNVL_ENABLE: "1" + NCCL_CUMEM_ENABLE: "1" + MC_FORCE_MNNVL: "1" + SGLANG_DG_CACHE_DIR: "/configs/deepgemm-cache" + FLASHINFER_WORKSPACE_BASE: "/configs/flashinfer-cache" + SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE: "100000" + SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT: "100000" + SGLANG_DISAGGREGATION_WAITING_TIMEOUT: "100000" + SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True" + SGLANG_USE_MESSAGE_QUEUE_BROADCASTER: "0" + SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK: "1" prefill_environment: - TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "1800" - PYTHONUNBUFFERED: "1" - ... + <<: *common_environment decode_environment: - TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "1800" - PYTHONUNBUFFERED: "1" - ... + <<: *common_environment MC_TE_METRIC: "true" SGLANG_DECODE_BOOTSTRAP_TIMEOUT: "1000" SGLANG_HACK_SEQ_BOOTSTRAP_ROOM: "1"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/qwen3.5/1p1d-deepep-deepgemm.yaml` around lines 24 - 55, prefill_environment and decode_environment repeat many identical keys; refactor by extracting the shared key/value pairs into a single anchor (e.g., &shared_env) and then merge it into both blocks using YAML merge keys (<<: *shared_env), leaving only the differing keys (like MC_TE_METRIC or SGLANG_DECODE_BOOTSTRAP_TIMEOUT) in decode_environment; update the recipe so prefill_environment and decode_environment reference the shared anchor to remove duplication and prevent config drift.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@recipes/qwen3.5/1p1d-deepep-deepgemm.yaml`:
- Around line 24-55: prefill_environment and decode_environment repeat many
identical keys; refactor by extracting the shared key/value pairs into a single
anchor (e.g., &shared_env) and then merge it into both blocks using YAML merge
keys (<<: *shared_env), leaving only the differing keys (like MC_TE_METRIC or
SGLANG_DECODE_BOOTSTRAP_TIMEOUT) in decode_environment; update the recipe so
prefill_environment and decode_environment reference the shared anchor to remove
duplication and prevent config drift.
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (2)
recipes/qwen3.5/1p1d-deepep-deepgemm.yaml (1)
37-39: Large timeout values are acceptable for accuracy testing but may mask issues.The disaggregation timeouts (100000ms) and watchdog timeout (1000000ms) are generous. While appropriate for accuracy verification where you want to avoid spurious failures, consider reducing these for performance benchmarking recipes to surface actual timeout issues.
Also applies to: 53-56
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/qwen3.5/1p1d-deepep-deepgemm.yaml` around lines 37 - 39, Timeouts set for disaggregation (SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE, SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT, SGLANG_DISAGGREGATION_WAITING_TIMEOUT and the watchdog timeout referenced later) are excessively large for performance benchmarking; adjust these YAML values downward to reasonable benchmarking defaults (for example, heartbeat failures ~5–50, bootstrap/waiting timeouts in the 5000–30000 ms range and watchdog timeout proportionally lower than 1000000) so real timeout/pathology surfaces during tests, and apply the same reductions to the corresponding entries later in the file where the other timeout keys are defined.recipes/qwen3.5/1p1d-dep4.yaml (1)
30-32: Failure detection timeouts are effectively disabled.Line 30–32, 45–47, 79, and 107 set very large timeout/failure thresholds. This can keep dead jobs alive for hours/days and delay recovery during cluster incidents.
Also applies to: 45-47, 79-79, 107-107
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/qwen3.5/1p1d-dep4.yaml` around lines 30 - 32, These env vars (SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE, SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT, SGLANG_DISAGGREGATION_WAITING_TIMEOUT) use an excessive "100000" value that effectively disables failure detection; change them to sensible defaults (e.g., seconds or ms consistent with the system — such as 30–300s or a small failure count) or make them configurable via a central timeout config, and replace all occurrences (the same three keys at the other locations noted) so dead jobs are detected and recovered promptly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@recipes/qwen3.5/1p1d-dep4.yaml`:
- Around line 13-18: The decode world is undersized: with
tensor-parallel-size=4, data-parallel-size=4 and expert-parallel-size=4 SGLang
requires TP×DP=16 GPUs for the disaggregated decode world but decode_nodes=1
with gpus_per_node=4 yields only 4 GPUs; change decode_nodes from 1 to 4 so
decode_nodes: 4 (keeping gpus_per_node: 4) to provide 16 decode GPUs, and leave
tensor-parallel-size, data-parallel-size, expert-parallel-size at 4 and
enable-dp-attention: true to satisfy DP-Attention constraints.
In `@recipes/qwen3.5/1p1d-tp4-mtp.yaml`:
- Around line 89-91: Remove the Mamba-specific keys from the decode section:
delete mamba-scheduler-strategy, mamba-track-interval, and mamba-ssm-dtype
entries (the same three keys present in the shown diff) since they are not
applicable to Qwen3.5; ensure the decode section no longer contains any mamba-*
settings and mirror the prefill-section fix so only Qwen-relevant settings
remain.
- Around line 64-66: Remove the three Mamba-specific configuration keys that
don't apply to Qwen3.5: mamba-scheduler-strategy, mamba-track-interval, and
mamba-ssm-dtype; locate them in the recipe file (they appear together in the
block with other scheduler/precision settings) and delete those entries so the
config only contains Qwen3.5-relevant options.
In `@recipes/qwen3.5/agg-dep4.yaml`:
- Line 26: Replace the incorrect environment variable
SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK with the documented SGLang variable
SGLANG_ENABLE_TP_MEMORY_INBALANCE_CHECK and set its value to "0" to disable the
memory imbalance check (i.e., change SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK:
"1" to SGLANG_ENABLE_TP_MEMORY_INBALANCE_CHECK: "0").
---
Nitpick comments:
In `@recipes/qwen3.5/1p1d-deepep-deepgemm.yaml`:
- Around line 37-39: Timeouts set for disaggregation
(SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE,
SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT, SGLANG_DISAGGREGATION_WAITING_TIMEOUT
and the watchdog timeout referenced later) are excessively large for performance
benchmarking; adjust these YAML values downward to reasonable benchmarking
defaults (for example, heartbeat failures ~5–50, bootstrap/waiting timeouts in
the 5000–30000 ms range and watchdog timeout proportionally lower than 1000000)
so real timeout/pathology surfaces during tests, and apply the same reductions
to the corresponding entries later in the file where the other timeout keys are
defined.
In `@recipes/qwen3.5/1p1d-dep4.yaml`:
- Around line 30-32: These env vars
(SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE,
SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT, SGLANG_DISAGGREGATION_WAITING_TIMEOUT)
use an excessive "100000" value that effectively disables failure detection;
change them to sensible defaults (e.g., seconds or ms consistent with the system
— such as 30–300s or a small failure count) or make them configurable via a
central timeout config, and replace all occurrences (the same three keys at the
other locations noted) so dead jobs are detected and recovered promptly.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
configs/rebuild-deepep.shrecipes/qwen3.5/1p1d-deepep-deepgemm.yamlrecipes/qwen3.5/1p1d-dep4.yamlrecipes/qwen3.5/1p1d-tep4.yamlrecipes/qwen3.5/1p1d-tp4-mtp.yamlrecipes/qwen3.5/1p1d-tp4.yamlrecipes/qwen3.5/agg-dep4.yamlrecipes/qwen3.5/agg-tep4.yamlrecipes/qwen3.5/agg-tp4.yaml
🚧 Files skipped from review as they are similar to previous changes (5)
- recipes/qwen3.5/1p1d-tp4.yaml
- configs/rebuild-deepep.sh
- recipes/qwen3.5/agg-tp4.yaml
- recipes/qwen3.5/agg-tep4.yaml
- recipes/qwen3.5/1p1d-tep4.yaml
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
recipes/qwen3.5/1p1d-tep4.yaml (1)
35-35:⚠️ Potential issue | 🟠 MajorUse the documented TP-memory-check env var key (current key may be ignored).
Line [35] and Line [52] use
SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK. Current SGLang env var docs listSGLANG_ENABLE_TP_MEMORY_INBALANCE_CHECK(defaulttrue), so the current key can fail to disable the check as intended: https://docs.sglang.io/references/environment_variables.htmlProposed fix
- SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK: "1" + SGLANG_ENABLE_TP_MEMORY_INBALANCE_CHECK: "false" ... - SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK: "1" + SGLANG_ENABLE_TP_MEMORY_INBALANCE_CHECK: "false"#!/bin/bash # Verify TP memory imbalance env var usage in this recipe rg -n 'SGLANG_(DISABLE|ENABLE)_TP_MEMORY_INBALANCE_CHECK' recipes/qwen3.5/1p1d-tep4.yamlIn the latest SGLang environment variables documentation, what is the exact variable name for TP memory imbalance checks, and what value disables that check?Also applies to: 52-52
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/qwen3.5/1p1d-tep4.yaml` at line 35, The recipe uses the wrong env var key SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK which the docs don't recognize; replace both occurrences of SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK with the documented key SGLANG_ENABLE_TP_MEMORY_INBALANCE_CHECK and set its value to "0" to disable the TP memory imbalance check (ensure you update both places where SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK appears in the file).
🧹 Nitpick comments (2)
recipes/qwen3.5/1p1d-tp4.yaml (1)
22-50: Consider de-duplicating shared environment keys.
prefill_environmentanddecode_environmentrepeat most fields; centralizing common entries would reduce config drift across recipe variants.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/qwen3.5/1p1d-tp4.yaml` around lines 22 - 50, The prefill_environment and decode_environment blocks repeat most keys; centralize the shared settings by extracting them into a single common map (e.g., common_environment) or use YAML anchors/aliases, then reference that from both prefill_environment and decode_environment and only override the differing keys (like SGLANG_DECODE_BOOTSTRAP_TIMEOUT and SGLANG_HACK_SEQ_BOOTSTRAP_ROOM) in decode_environment; update any consumers to read merged env if needed and ensure unique symbols referenced are prefill_environment and decode_environment so you only move duplicated keys once.recipes/qwen3.5/1p1d-deepep-deepgemm.yaml (1)
26-58: Consider YAML anchors to reduce env drift between prefill/decode blocks.
prefill_environmentanddecode_environmentduplicate many identical keys. Anchors/merge keys would make future updates safer and easier.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/qwen3.5/1p1d-deepep-deepgemm.yaml` around lines 26 - 58, The prefill_environment and decode_environment blocks duplicate many identical keys; define a shared YAML anchor (e.g., &default_env) containing the common key/value pairs, then use YAML merge (<<: *default_env) inside both prefill_environment and decode_environment and only list the differing keys (like MC_TE_METRIC, SGLANG_DECODE_BOOTSTRAP_TIMEOUT, SGLANG_HACK_SEQ_BOOTSTRAP_ROOM) in decode_environment to override or add them; update the recipe to reference the anchor so future changes apply in one place and reduce drift between prefill_environment and decode_environment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@recipes/qwen3.5/1p1d-deepep-deepgemm.yaml`:
- Line 145: The max_tokens setting currently violates the documented requirement
because max_tokens: 65536 exceeds context-length (2020); update the recipe so
that the numeric value of max_tokens is strictly less than the context-length
(or increase context-length accordingly) by modifying the max_tokens key (or the
context-length key) in the same YAML (referencing the max_tokens and
context-length fields) to ensure max_tokens < context-length before running
inference.
In `@recipes/qwen3.5/agg-tep4.yaml`:
- Line 51: Add a brief explanatory comment next to the context-length setting to
clarify why the non-power-of-two value “context-length: 2020” is used in the
Qwen3.5 recipes (e.g., that it’s a small/test-only context size chosen for
benchmark/test efficiency); update the YAML entry in agg-tep4.yaml by appending
a one-line comment after the context-length key to state the rationale so future
readers know it’s intentional.
---
Duplicate comments:
In `@recipes/qwen3.5/1p1d-tep4.yaml`:
- Line 35: The recipe uses the wrong env var key
SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK which the docs don't recognize; replace
both occurrences of SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK with the documented
key SGLANG_ENABLE_TP_MEMORY_INBALANCE_CHECK and set its value to "0" to disable
the TP memory imbalance check (ensure you update both places where
SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK appears in the file).
---
Nitpick comments:
In `@recipes/qwen3.5/1p1d-deepep-deepgemm.yaml`:
- Around line 26-58: The prefill_environment and decode_environment blocks
duplicate many identical keys; define a shared YAML anchor (e.g., &default_env)
containing the common key/value pairs, then use YAML merge (<<: *default_env)
inside both prefill_environment and decode_environment and only list the
differing keys (like MC_TE_METRIC, SGLANG_DECODE_BOOTSTRAP_TIMEOUT,
SGLANG_HACK_SEQ_BOOTSTRAP_ROOM) in decode_environment to override or add them;
update the recipe to reference the anchor so future changes apply in one place
and reduce drift between prefill_environment and decode_environment.
In `@recipes/qwen3.5/1p1d-tp4.yaml`:
- Around line 22-50: The prefill_environment and decode_environment blocks
repeat most keys; centralize the shared settings by extracting them into a
single common map (e.g., common_environment) or use YAML anchors/aliases, then
reference that from both prefill_environment and decode_environment and only
override the differing keys (like SGLANG_DECODE_BOOTSTRAP_TIMEOUT and
SGLANG_HACK_SEQ_BOOTSTRAP_ROOM) in decode_environment; update any consumers to
read merged env if needed and ensure unique symbols referenced are
prefill_environment and decode_environment so you only move duplicated keys
once.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
configs/rebuild-deepep.shrecipes/qwen3.5/1p1d-deepep-deepgemm.yamlrecipes/qwen3.5/1p1d-dep4.yamlrecipes/qwen3.5/1p1d-tep4.yamlrecipes/qwen3.5/1p1d-tp4-mtp.yamlrecipes/qwen3.5/1p1d-tp4.yamlrecipes/qwen3.5/agg-dep4.yamlrecipes/qwen3.5/agg-tep4.yamlrecipes/qwen3.5/agg-tp4.yaml
🚧 Files skipped from review as they are similar to previous changes (5)
- recipes/qwen3.5/agg-dep4.yaml
- recipes/qwen3.5/1p1d-tp4-mtp.yaml
- recipes/qwen3.5/1p1d-dep4.yaml
- recipes/qwen3.5/agg-tp4.yaml
- configs/rebuild-deepep.sh
|
|
||
| tensor-parallel-size: 4 | ||
| data-parallel-size: 1 | ||
| expert-parallel-size: 1 |
Summary by CodeRabbit