**Caution**: Review failed. The pull request is closed.

**Walkthrough**: The changes extend log file parsing across the analysis module to support both `.err` and `.out` log files.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
**Pre-merge checks and finishing touches**

❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)

📜 **Recent review details**

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (1)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
recipies/gb300-fp4/cutedsl-max-tpt.yaml (1)
**60-60**: Inconsistent environment variable key with sibling recipe file.

Line 60 in this file uses the old key `SGLANG_CUTEDSL_MOE_NVFP4_DISPATCH: "1"`, while the related recipe file `recipies/gb300-fp4/cutlass-max-tpt.yaml` was updated to use the new key `SGLANG_MOE_NVFP4_DISPATCH: "true"` on lines 26 and 63. Both recipes are for the same compute type (gb300-fp4) and should have consistent environment variable names.

Update line 60 to match the renamed key and boolean format:

```diff
-  SGLANG_CUTEDSL_MOE_NVFP4_DISPATCH: "1"
+  SGLANG_MOE_NVFP4_DISPATCH: "true"
```

Also verify that the system/application code reading these environment variables has been updated to use the new key name across all affected recipe files.
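As a hedged illustration of the consistency check this comment asks for, the sketch below compares the dispatch keys of the two recipes. The key names and values come from the comment text above, not from parsing the actual YAML files:

```python
# Hypothetical sketch: compare the MOE dispatch keys declared by the two
# sibling recipes. An empty intersection means the files have diverged.
cutedsl_env = {"SGLANG_CUTEDSL_MOE_NVFP4_DISPATCH": "1"}  # current cutedsl recipe
cutlass_env = {"SGLANG_MOE_NVFP4_DISPATCH": "true"}       # updated cutlass recipe

shared_keys = sorted(set(cutedsl_env) & set(cutlass_env))
print(shared_keys)  # [] -> no shared dispatch key, the recipes disagree
```

A real check would load both YAML files and compare their environment blocks, but the same set-intersection idea applies.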
recipies/gb300-fp4/cutlass-max-tpt.yaml (1)
**1-5**: Update header comment to reflect renamed environment variable.

The comment on line 5 still references the old environment variable name `SGLANG_CUTEDSL_MOE_NVFP4_DISPATCH`, but the actual configuration now uses `SGLANG_MOE_NVFP4_DISPATCH`. Keep the header comments in sync with the actual configuration.

Update line 5 in the header comment:

```diff
-# - setting SGLANG_CUTEDSL_MOE_NVFP4_DISPATCH: 'true'
+# - setting SGLANG_MOE_NVFP4_DISPATCH: 'true'
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `recipies/gb300-fp4/cutedsl-max-tpt.yaml` (1 hunks)
- `recipies/gb300-fp4/cutlass-max-tpt.yaml` (3 hunks)
🔇 Additional comments (1)
recipies/gb300-fp4/cutedsl-max-tpt.yaml (1)
**7-7**: Verify container image format is valid and resolvable.

The container image reference was simplified from a fully-qualified Docker URL (`docker://ishandhanani/sglang:rt-cu13-14483`) to a bare tag (`rt-cu13-14483`). Without an explicit registry prefix or organization namespace, image resolution may fail depending on how the parsing/registry logic interprets this format.

Confirm:

- Does the image resolution logic support bare tags without registry prefixes?
- Is `rt-cu13-14483` available in the expected image registry (e.g., Docker Hub)?
- Should this be `docker://rt-cu13-14483` or `ishandhanani/sglang:rt-cu13-14483` instead?

Consider restoring the full image reference unless you've verified the bare format works with your system.
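One way the bare-tag format could be made to resolve is a small normalization step before lookup. The helper below is purely hypothetical (the repo's actual resolution logic is not shown in this review) and assumes `ishandhanani/sglang` as the default repository:

```python
def normalize_image(ref: str, default_repo: str = "ishandhanani/sglang") -> str:
    """Expand a bare tag like 'rt-cu13-14483' into a full docker:// reference."""
    if ref.startswith("docker://"):
        return ref  # already fully qualified
    if "/" not in ref and ":" not in ref:
        # Bare tag: attach the assumed default repository.
        return f"docker://{default_repo}:{ref}"
    return ref  # repo:tag form, leave untouched

print(normalize_image("rt-cu13-14483"))
# docker://ishandhanani/sglang:rt-cu13-14483
```

Whether such a default-repo convention exists in this system is exactly what the comment above asks the author to verify.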
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
recipies/gb300-fp4/cutedsl-max-tpt.yaml (1)
**1-1**: Update comment to reflect new decode node count.

The file description mentions "8 Decode Nodes," but this configuration now scales to 12 decode nodes (line 13). Update the comment to accurately reflect the current topology.

Apply this diff to fix the comment:

```diff
-# 4P1D, with 8 Decode Nodes. Better per-gpu throughput with worse latency.
+# 4P1D, with 12 Decode Nodes. Better per-gpu throughput with worse latency.
```

Also applies to: 13-13
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `recipies/gb300-fp4/cutedsl-max-tpt.yaml` (3 hunks)
- `recipies/gb300-fp4/max-tpt-no-sbo.yaml` (5 hunks)
🔇 Additional comments (8)
recipies/gb300-fp4/max-tpt-no-sbo.yaml (4)
**3-3**: Clean GPU hardware migration from GB300 to GB200.

Name and gpu_type are now consistently aligned. Configuration reflects the target hardware change.
Also applies to: 11-11
**7-7**: Verify container version compatibility across configuration files.

This file uses container version `0.5.5.post2`, while the related configuration file uses `rt-cu13-14483`. Confirm these are compatible or intended to be different versions for different deployment scenarios.
**37-37**: MOE NVFP4 dispatch environment variables added consistently.

Both prefill and decode environments now include `SGLANG_MOE_NVFP4_DISPATCH: "1"`, aligning with the migration away from CUTEDSL-specific dispatch handling.

Also applies to: 59-59
**125-125**: Verify backend changes are compatible with container version.

The decode configuration switches to `flashinfer_cutedsl` for MOE kernel execution and uses `deepep` with `low_latency` mode for expert-to-expert communication. Confirm these backends are supported by container version `0.5.5.post2`.

Also applies to: 152-153
recipies/gb300-fp4/cutedsl-max-tpt.yaml (4)
**7-7**: Verify container image reference format consistency.

This file uses container reference `rt-cu13-14483` (tag format), while the related configuration uses `0.5.5.post2` (version format). Confirm both resolve to the same or appropriate images, or clarify if intentional differences exist for different deployment paths.
**12-14**: Verify resource scaling does not exceed cluster capacity.

The configuration scales aggressively: prefill nodes 1→4, decode nodes 1→12, prefill workers 1→4. Confirm these allocations are compatible with cluster resources and that the 12x decode node scaling aligns with the 12x parallelism increase (ep-size: 4→48, below).
**22-22**: MOE NVFP4 dispatch environment variables added consistently.

Both prefill and decode environments now include `SGLANG_MOE_NVFP4_DISPATCH: "1"`, aligning with File 1 and the unified MOE dispatch strategy.

Also applies to: 61-61
**172-174**: Verify parallelism scaling aligns with resource topology.

The decode configuration scales parallelism 12x (tp/dp/ep: 4→48) to match decode node scaling. Confirm this allocation is mathematically compatible with the hardware topology (12 decode nodes × 4 GPUs/node = 48 units) and that memory/communication overhead is acceptable at this scale.
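The arithmetic in this comment can be captured as a small sanity check. The numbers below are the ones stated above (12 decode nodes, 4 GPUs per node, parallel sizes of 48), not values read from the cluster:

```python
# Sanity check: each parallel size should span exactly the decode GPU pool.
decode_nodes = 12
gpus_per_node = 4
total_gpus = decode_nodes * gpus_per_node  # 12 x 4 = 48

parallel_sizes = {"tp-size": 48, "dp-size": 48, "ep-size": 48}
for name, size in parallel_sizes.items():
    assert size == total_gpus, f"{name}={size} does not match {total_gpus} GPUs"

print(total_gpus)  # 48
```

A check like this could live in CI to catch recipe edits where node counts and parallel sizes drift apart.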
```yaml
precision: "fp4"

resources:
  gpu_type: "gb300"
```
🧩 Analysis chain

🏁 Script executed:

```shell
cat -n recipies/gb300-fp4/cutedsl-max-tpt.yaml | head -20
```

Repository: ishandhanani/srt-slurm
Length of output: 565

🏁 Script executed:

```shell
ls -la recipies/gb300-fp4/ | grep -E "\.yaml$"
```

Repository: ishandhanani/srt-slurm
Length of output: 263

🏁 Script executed:

```shell
cat -n recipies/gb300-fp4/cutedsl-max-tpt-no-sbo.yaml | head -20
```

Repository: ishandhanani/srt-slurm
Length of output: 144

🏁 Script executed:

```shell
cat -n recipies/gb300-fp4/max-tpt-no-sbo.yaml | head -20
```

Repository: ishandhanani/srt-slurm
Length of output: 562

🏁 Script executed:

```shell
cat -n recipies/gb300-fp4/1p2d.yaml | head -20
```

Repository: ishandhanani/srt-slurm
Length of output: 534
**Fix GPU type to match configuration name.**

The configuration is named `gb200-fp4-max-tpt-2` but `gpu_type` is set to `gb300`. The related configuration file `max-tpt-no-sbo.yaml` has the same name with `gpu_type: "gb200"`. Update to align:

```diff
 resources:
-  gpu_type: "gb300"
+  gpu_type: "gb200"
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```yaml
  gpu_type: "gb200"
```
🤖 Prompt for AI Agents
```
In recipies/gb300-fp4/cutedsl-max-tpt.yaml around line 11, the gpu_type is
incorrectly set to "gb300" while the configuration name is gb200-fp4-max-tpt-2;
change gpu_type to "gb200" to match the configuration name and make it
consistent with max-tpt-no-sbo.yaml which uses gpu_type: "gb200".
```
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
recipies/gb300-fp4/cutedsl-max-tpt.yaml (1)
**1-1**: Comment/configuration mismatch: "8 Decode Nodes" vs. actual value 12.

Line 1 states "8 Decode Nodes" but line 13 sets `decode_nodes: 12`. Update the comment to reflect the actual configuration:

```diff
-# 4P1D, with 8 Decode Nodes. Better per-gpu throughput with worse latency.
+# 4P1D, with 12 Decode Nodes. Better per-gpu throughput with worse latency.
```

Also applies to: 13-13
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `recipies/gb300-fp4/cutedsl-max-tpt.yaml` (3 hunks)
- `recipies/gb300-fp4/max-tpt-no-sbo.yaml` (5 hunks)
🔇 Additional comments (6)
recipies/gb300-fp4/cutedsl-max-tpt.yaml (3)
**7-7**: Verify container reference format.

The container reference was simplified from a full Docker URI to `"rt-cu13-14483"`. Confirm this format is valid and resolves correctly in your deployment environment. Ensure this shortened reference maintains the same semantics as the original `docker://ishandhanani/sglang:rt-cu13-14483`, or update to the appropriate full reference if needed.
**12-14**: Verify parallelism settings scale correctly with resource allocation.

Resource scaling increases decode nodes from 1 to 12 (48 total GPUs: 12 nodes × 4 GPUs/node), while parallelism settings are updated to `tp-size: 48`, `dp-size: 48`, `ep-size: 48` in both prefill (tp/dp/ep from 4) and decode sections. Confirm these parallelism values are correct and compatible with the new resource allocation. A tensor-parallelism of 48 would span all decode GPUs; verify this aligns with your distributed inference topology and MOE expert placement strategy.

Also applies to: 172-174
**22-22**: MOE dispatch environment variable standardization looks good.

The environment variable naming has been standardized to `SGLANG_MOE_NVFP4_DISPATCH: "1"` in both prefill (line 22) and decode (line 61) sections. This is a clear, consistent change for MOE dispatch configuration.

Also applies to: 61-61
recipies/gb300-fp4/max-tpt-no-sbo.yaml (3)
**7-7**: Verify container reference is complete and resolvable.

The container reference is set to `"0.5.5.post2"`, which appears to be a version string without a registry or image name. Compare this to file 1, which uses a clearer image reference format. Ensure this incomplete reference resolves correctly in your deployment environment or update to a full, valid image reference.
**37-37**: MOE dispatch environment standardization looks good.

The `SGLANG_MOE_NVFP4_DISPATCH: "1"` variable has been added to both prefill and decode environments, consistent with the changes in file 1, supporting proper MOE dispatch configuration.

Also applies to: 59-59
**125-125**: Backend configuration updates are clear.

The `moe-runner-backend` is set to `"flashinfer_cutedsl"` and DeepEP configuration has been added with `moe-a2a-backend: "deepep"` and `deepep-mode: "low_latency"`. Verify these backend choices are compatible with your model and inference stack.

Also applies to: 152-153
Actionable comments posted: 3
🧹 Nitpick comments (1)
recipies/gb300-fp4/long-ctx.yaml (1)
**39-39**: Remove trailing whitespace from environment variable values.

Lines 39 and 56 have trailing whitespace after `"true"`, which can cause YAML parsing or comparison issues. Ensure consistency:

```diff
 prefill_environment:
   ...
-  SGLANG_ENABLE_FLASHINFER_GEMM: "true" 
+  SGLANG_ENABLE_FLASHINFER_GEMM: "true"
 decode_environment:
   ...
-  SGLANG_ENABLE_FLASHINFER_GEMM: "true" 
+  SGLANG_ENABLE_FLASHINFER_GEMM: "true"
```

Also applies to: 56-56
Actionable comments posted: 1
🧹 Nitpick comments (2)
`analysis/srtlog/log_parser.py` (1)

**65-77**: Variable naming inconsistency: `total_err_files` should reflect both file types.

The variable `total_err_files` now counts both `.err` and `.out` files but retains the old name. Consider renaming it to `total_log_files` for consistency with the updated messaging and behavior:

```diff
-        total_err_files = 0
+        total_log_files = 0
         parsed_successfully = 0
         for file in os.listdir(run_path):
             if (file.endswith(".err") or file.endswith(".out")) and ("prefill" in file or "decode" in file):
-                total_err_files += 1
+                total_log_files += 1
                 filepath = os.path.join(run_path, file)
                 node = self.parse_single_log(filepath)
                 if node:
                     nodes.append(node)
                     parsed_successfully += 1

-        logger.info(f"Parsed {parsed_successfully}/{total_err_files} prefill/decode log files from {run_path}")
+        logger.info(f"Parsed {parsed_successfully}/{total_log_files} prefill/decode log files from {run_path}")

-        if total_err_files == 0:
+        if total_log_files == 0:
             logger.warning(f"No prefill/decode log files found in {run_path}")
```

`analysis/srtlog/config_reader.py` (1)

**288-298**: Consider renaming function for clarity.

The function `parse_command_line_from_err` now parses both `.err` and `.out` files. While the docstring is updated, consider renaming it to `parse_command_line_from_logs` for consistency with the broadened scope. This is optional since it would require updating all call sites.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- `analysis/dashboard/components.py` (1 hunks)
- `analysis/dashboard/node_metrics_tab.py` (1 hunks)
- `analysis/srtlog/config_reader.py` (5 hunks)
- `analysis/srtlog/log_parser.py` (11 hunks)
- `recipies/gb300-fp4/long-ctx-low-latency.yaml` (3 hunks)
🔇 Additional comments (7)
analysis/srtlog/log_parser.py (2)
**470-474**: Verify PP-to-TP mapping is intentional.

The pipeline parallelism value (`PP`) is mapped to the `TP` slot, which seems semantically inconsistent since pipeline parallelism and tensor parallelism are different concepts. Ensure this mapping is intentional and won't cause confusion in downstream metric analysis or visualizations.
**592-606**: LGTM!

The filename extraction regex correctly handles both `.err` and `.out` extensions with proper greedy matching for node names containing underscores.

`recipies/gb300-fp4/long-ctx-low-latency.yaml` (2)
**9-15**: Verify prefill resource allocation.

With `prefill_nodes=4`, `prefill_workers=4`, and `pipeline-parallel-size=4`, ensure that each prefill worker has access to 4 GPUs for the pipeline stages. The configuration assumes 16 total prefill processes across 4 nodes.
**131-136**: LGTM!

The benchmark configuration aligns well with the context-length settings (isl=128000 + osl=8000 = 136000 context-length).
analysis/dashboard/node_metrics_tab.py (1)
**81-84**: LGTM!

The updated messages correctly inform users that both `.err` and `.out` log files are now supported, consistent with the parser changes.

`analysis/dashboard/components.py` (1)
**94-104**: LGTM!

The `_cache_version` parameter provides a clean mechanism for cache invalidation after parser changes. Note that Streamlit excludes underscore-prefixed parameters from cache key computation, so verify that bumping `_cache_version` actually invalidates the cache as intended.

`analysis/srtlog/config_reader.py` (1)
**343-364**: LGTM!

The file filtering and parsing logic correctly handles both `.err` and `.out` files with appropriate greedy matching for node names containing underscores.
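The version-keyed caching idea behind `_cache_version` can be illustrated with plain `functools.lru_cache`. This is a minimal sketch of the general pattern, not Streamlit's implementation (and in Streamlit itself, underscore-prefixed parameters are excluded from hashing, which changes whether they contribute to the key):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def parse_run(run_path: str, cache_version: int) -> str:
    # cache_version is part of the cache key: bumping it after a parser
    # change forces a fresh parse instead of reusing a stale entry.
    return f"parsed {run_path} (v{cache_version})"

print(parse_run("/runs/001", 1))  # parsed /runs/001 (v1)
# Bumping the version creates a distinct cache entry:
print(parse_run("/runs/001", 2))  # parsed /runs/001 (v2)
```

The same technique works with any memoization layer: encode the parser version into the key, and stale entries simply stop being hit.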
```diff
 # parallelism
 moe-dense-tp-size: 1
 tensor-parallel-size: 4
-expert-parallel-size: 1
 data-parallel-size: 4
 enable-dp-attention: true
+expert-parallel-size: 4
 pipeline-parallel-size: 1
```
**Decode parallelism configuration exceeds available GPUs.**

The decode parallelism settings require 64 GPU slots (TP=4 × DP=4 × EP=4 × PP=1 = 64), but only 4 GPUs are available (decode_nodes=1 × decode_workers=1 × gpus_per_node=4 = 4). This will cause a runtime failure.

Either reduce the parallelism to fit 4 GPUs:

```diff
 # parallelism
 moe-dense-tp-size: 1
-tensor-parallel-size: 4
-data-parallel-size: 4
+tensor-parallel-size: 2
+data-parallel-size: 2
 enable-dp-attention: true
-expert-parallel-size: 4
+expert-parallel-size: 1
 pipeline-parallel-size: 1
```

Or increase decode resources to match the parallelism requirements.
📝 Committable suggestion
```yaml
# parallelism
moe-dense-tp-size: 1
tensor-parallel-size: 2
data-parallel-size: 2
enable-dp-attention: true
expert-parallel-size: 1
pipeline-parallel-size: 1
```
🤖 Prompt for AI Agents
```
In recipies/gb300-fp4/long-ctx-low-latency.yaml around lines 123-129, the
product TP×DP×EP×PP (4×4×4×1=64) exceeds the available GPUs
(decode_nodes×decode_workers×gpus_per_node = 1×1×4 = 4), causing runtime
failure; fix by either reducing parallelism so TP×DP×EP×PP ≤ 4 (e.g., set
tensor-parallel-size: 1, data-parallel-size: 1, expert-parallel-size: 4,
pipeline-parallel-size: 1 or other combination that multiplies to ≤4) or
increase decode resources to match 64 GPUs (e.g., adjust decode_nodes and/or
decode_workers so decode_nodes×decode_workers×gpus_per_node = 64, such as
decode_nodes: 4 and decode_workers: 4 with gpus_per_node: 4); ensure the final
configuration satisfies total GPU slots ≥ TP×DP×EP×PP.
```
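The reviewer's arithmetic can be expressed as a one-line budget check. The figures below are the ones quoted in this comment, and whether TP×DP×EP×PP is the right product for this inference stack is itself an assumption worth verifying:

```python
# GPU budget check using the values quoted in the review comment above.
tp, dp, ep, pp = 4, 4, 4, 1
decode_nodes, decode_workers, gpus_per_node = 1, 1, 4

slots = tp * dp * ep * pp                              # required parallel slots
gpus = decode_nodes * decode_workers * gpus_per_node   # available GPUs
print(slots, gpus, slots <= gpus)  # 64 4 False
```

Running such a check when loading a recipe would surface this class of misconfiguration before a job is ever submitted to the cluster.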
Actionable comments posted: 1
🧹 Nitpick comments (2)
recipies/gb300-fp4/long-ctx-max-tpt.yaml (2)
**120-120**: Resolve uncertainty in chunked-prefill-size configuration.

The comment `# why disable chunked prefill if we have dp attn?` signals unresolved design intent. The value is set to `-1` (disable), but the question suggests this choice may warrant reconsideration.

Please clarify: is disabling chunked prefill intentional, or should this be revisited? I can help refine this setting or update the comment if needed.
**18-35**: Consider consolidating duplicated environment variables.

Both `prefill_environment` and `decode_environment` define identical entries: `FLASHINFER_WORKSPACE_BASE`, `SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE`, `SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT`, `SGLANG_DISAGGREGATION_WAITING_TIMEOUT`, and others.

If these settings should always match, consider defining a shared environment block (e.g., via a YAML anchor) to reduce duplication and maintenance burden:

```yaml
defaults: &disagg_env
  SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE: "100000"
  SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT: "100000"
  SGLANG_DISAGGREGATION_WAITING_TIMEOUT: "100000"
  # ... other shared vars

prefill_environment:
  <<: *disagg_env
  # prefill-specific overrides here
```

Also applies to: 38-58
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- `recipies/gb300-fp4/long-ctx-max-tpt.yaml` (2 hunks)
🔇 Additional comments (1)
recipies/gb300-fp4/long-ctx-max-tpt.yaml (1)
**92-92**: Verify expert-parallel-size asymmetry between prefill and decode.

The prefill config sets `expert-parallel-size: 1` (line 92), while decode sets `expert-parallel-size: 32` (line 131). In disaggregated MOE setups, this disparity typically reflects different parallelization strategies, but the 1 vs. 32 gap is substantial.

Confirm this asymmetry is intentional. If prefill should use expert parallelism like decode, this may be a configuration oversight.
Also applies to: 131-131
```diff
-concurrencies: "1x128x512x2048x4096x8192x12000x15000"
 isl: 128000
 osl: 8000
+concurrencies: "16x32x64x128x256x512x644x768"
```
**Verify benchmark concurrency value 644.**

The concurrency sequence `"16x32x64x128x256x512x644x768"` contains an anomalous value: 644 breaks the expected progression between 512 and 768. This appears to be a typo.

Clarify the intended value. If following a doubling/step pattern, consider `16x32x64x128x256x512x576x640x704x768` or similar:

```diff
-  concurrencies: "16x32x64x128x256x512x644x768"
+  concurrencies: "16x32x64x128x256x512x576x640x704x768" # or your intended values
```

🤖 Prompt for AI Agents
```
In recipies/gb300-fp4/long-ctx-max-tpt.yaml around line 143, the concurrencies
sequence contains an anomalous value "644" that breaks the expected numeric
progression between 512 and 768; replace or correct that value to match the
intended progression (e.g., use a consistent stepping/doubling pattern such as
adding intermediate values like 576, 640, 704 or adjust to the intended single
value) so the concurrencies string reads a valid sequence (for example
"16x32x64x128x256x512x576x640x704x768" or your chosen correct sequence).
```
**Summary by CodeRabbit**

- New Features
- Updates