Support directory detection in dump comparator by fzyzcjy · Pull Request #19680 · sgl-project/sglang

fzyzcjy · 2026-03-02T10:21:16Z


The set excludes DP (non-shardable), so the old name was misleading.

Previously, load failures were silently filtered out. Now each failure emits a GeneralWarning via warning_sink so it surfaces to the user through the record's warnings list.

Tests cover: all success (no warnings), partial failure (one warning emitted, good files kept), and all corrupted (empty result, one warning per file).
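The load-failure behaviour described above can be sketched as follows. This is a minimal illustration, not the comparator's code: `WarningCollector`, `load_all_values`, and the `loader` callback are hypothetical names standing in for `warning_sink` and `_load_all_values`.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable


@dataclass
class WarningCollector:
    """Hypothetical stand-in for the comparator's warning_sink."""

    messages: list = field(default_factory=list)

    def add(self, message: str) -> None:
        self.messages.append(message)


def load_all_values(
    paths: Iterable[str],
    loader: Callable[[str], Any],
    sink: WarningCollector,
) -> list:
    """Load every path; report failures to the sink instead of dropping them silently."""
    values = []
    for path in paths:
        try:
            values.append(loader(path))
        except Exception as exc:  # any load error becomes one warning
            sink.add(f"Failed to load {path}: {exc}")
    return values
```

With this shape, a partial failure keeps the good files and records exactly one warning per bad file, which matches the three test scenarios listed above.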

- Add _compute_exit_code() that returns 1 when any comparison fails
- run() now returns int exit code, main() calls sys.exit()
- _consume_comparison_records() returns SummaryRecord
- Add --forbid-skip flag to also fail on skipped comparisons

…rn type
- _run_and_parse now returns tuple[list[AnyRecord], int]
- Add forbid_skip=False to _make_args defaults
- Update all call sites to unpack the tuple
- Add TestExitCode class with 7 unit tests + 2 integration tests

TestExitCodeSubprocess invokes comparator via subprocess.run and verifies exit code for passed, failed, skipped, and --forbid-skip.
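The exit-code rule from these commits can be sketched as below. The function name and its keyword parameters are assumptions for illustration; the real `_compute_exit_code()` works from the comparator's own summary record, whose fields are not shown here.

```python
def compute_exit_code_sketch(
    *, num_failed: int, num_skipped: int, forbid_skip: bool = False
) -> int:
    """Return 1 when the run should be treated as failed, else 0."""
    if num_failed > 0:
        return 1
    # With --forbid-skip, a skipped comparison also counts as a failure.
    if forbid_skip and num_skipped > 0:
        return 1
    return 0
```

A `main()` in this style would end with `sys.exit(compute_exit_code_sketch(...))`, so a subprocess-based test can assert on `returncode`.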

Add ParallelModifier dataclass supporting axis:qualifier colon syntax. DimSpec now uses parallel_modifiers tuple to support multiple parallel axes on a single dimension (e.g. t(cp:zigzag,sp)).

Iterate over parallel_modifiers tuple instead of single parallel field. Unshard in reverse order (innermost shard first). _resolve_unshard_params now takes modifier + dim_name instead of full DimSpec.

Iterate over parallel_modifiers to find zigzag ordering instead of checking single spec.ordering field.

Update h(tp,partial) -> h(tp:partial), s(cp,zigzag) -> s(cp:zigzag), t(cp,zigzag) -> t(cp:zigzag), etc. across all source and test files. Also fix unsharder planner to reverse modifiers within each dim spec (not globally) for correct innermost-first unshard order.

- ParallelModifier and DimSpec now extend _FrozenBase (frozen=True, extra='forbid') instead of frozen dataclass
- parallel_modifiers uses list instead of tuple
- Parser uses axis:qual+qual syntax (+ separates qualifiers within one modifier, comma separates modifiers on a dim)
- Rename sharded_modifiers → reversed_sharded_modifiers in planner
- Collapse modifier collection to one-line comprehension

Replace `if x not in dict` + `dict[x]` pattern with `dict.get(x)` + `if ... is None` for both axis and qualifier lookups.
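A minimal sketch of the `axis:qual+qual` token parsing with the `.get()` plus `is None` lookup style described above. The lookup tables, the `Modifier` class, and both function names are hypothetical; the real axis and qualifier sets are not listed in this PR page.

```python
from dataclasses import dataclass

# Hypothetical lookup tables; the real axis and qualifier sets may differ.
_AXIS_LOOKUP = {"tp": "tp", "cp": "cp", "sp": "sp", "ep": "ep"}
_QUALIFIER_LOOKUP = {"zigzag": "ordering", "partial": "reduction"}


@dataclass(frozen=True)
class Modifier:
    axis: str
    qualifiers: tuple = ()


def parse_modifier_token(token: str) -> Modifier:
    """Parse one modifier such as 'cp:zigzag' or 'sp'."""
    axis_text, _, qualifier_text = token.strip().partition(":")
    axis = _AXIS_LOOKUP.get(axis_text)
    if axis is None:
        raise ValueError(f"Unknown parallel axis: {axis_text!r}")

    qualifiers = []
    # '+' separates qualifiers within one modifier; empty pieces are ignored.
    for name in filter(None, qualifier_text.split("+")):
        kind = _QUALIFIER_LOOKUP.get(name)
        if kind is None:
            raise ValueError(f"Unknown qualifier: {name!r}")
        qualifiers.append(name)
    return Modifier(axis=axis, qualifiers=tuple(qualifiers))


def parse_modifiers(text: str) -> list:
    """Parse a comma-separated modifier list such as 'cp:zigzag,sp'."""
    return [parse_modifier_token(token) for token in text.split(",")]
```

Using `.get()` once per lookup avoids the double dictionary access of the `in` check followed by indexing.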

Add unsharder planner tests verifying SP unshards first (inner) then CP (outer) for same-dim multi-axis. Add E2E round-trip test with CP=2 zigzag + SP=2 on token dim covering full unshard + reorder pipeline.
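Why the inner axis must be unsharded first can be seen in a toy example with plain lists. This sketch assumes contiguous (non-zigzag) sharding and treats the first modifier (CP) as the outer split and the second (SP) as the inner split, as the commit message states; the helper names are illustrative only.

```python
def shard(seq: list, size: int) -> list:
    """Split seq into `size` contiguous, equally sized chunks."""
    assert len(seq) % size == 0
    step = len(seq) // size
    return [seq[i * step:(i + 1) * step] for i in range(size)]


def concat(chunks: list) -> list:
    """Concatenate a list of chunks back into one flat list."""
    return [item for chunk in chunks for item in chunk]


tokens = list(range(8))
cp_size, sp_size = 2, 2

# Shard by CP (outer) first, then shard each CP chunk by SP (inner).
shards = {
    (cp_rank, sp_rank): piece
    for cp_rank, cp_chunk in enumerate(shard(tokens, cp_size))
    for sp_rank, piece in enumerate(shard(cp_chunk, sp_size))
}

# Correct: undo SP (inner) within each CP rank, then undo CP (outer).
per_cp = [
    concat([shards[(c, s)] for s in range(sp_size)]) for c in range(cp_size)
]
restored = concat(per_cp)

# Wrong order: undoing CP first interleaves the SP pieces.
per_sp = [
    concat([shards[(c, s)] for c in range(cp_size)]) for s in range(sp_size)
]
scrambled = concat(per_sp)
```

Here `restored` equals the original token list, while `scrambled` does not, which is the property the planner tests check.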

Add test_cp_zigzag_sp_same_dim_unshard covering full pipeline: dump creation → metadata loading → aligner plan → SP unshard + CP unshard + zigzag reorder → tensor comparison. Also add helper _create_cp_zigzag_sp_sharded_dumps.

The 't' dim zigzag reorder requires thd_global_seq_lens (derived from cu_seqlens_q). Reworked helper to do per-sequence zigzag split and include cu_seqlens_q dumps.

THD zigzag reorder requires thd_global_seq_lens derived from cu_seqlens_q, which only gets loaded via the smart token aligner path. Add input_ids dumps alongside cu_seqlens_q and switch to token_aligner="smart".

The t dim requires THD aux handling (cu_seqlens_q, input_ids) which adds complexity. Use b s(cp:zigzag,sp) h to test the same multi-axis same-dim unshard + reorder flow without THD aux dependencies.
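For readers unfamiliar with zigzag ordering, the sketch below shows one common layout and its inverse on a single sequence. It is an assumption that the comparator uses this layout: here the sequence is cut into 2 * cp_size chunks and rank r holds chunks r and 2 * cp_size - 1 - r. The function names are hypothetical.

```python
def zigzag_shard(seq: list, cp_size: int) -> list:
    """Return per-rank shards under the assumed zigzag layout."""
    num_chunks = 2 * cp_size
    assert len(seq) % num_chunks == 0
    step = len(seq) // num_chunks
    chunks = [seq[i * step:(i + 1) * step] for i in range(num_chunks)]
    return [chunks[r] + chunks[num_chunks - 1 - r] for r in range(cp_size)]


def undo_zigzag(concatenated: list, cp_size: int) -> list:
    """Restore natural order from the rank-wise concatenation of zigzag shards."""
    num_chunks = 2 * cp_size
    assert len(concatenated) % num_chunks == 0
    step = len(concatenated) // num_chunks
    chunks = [concatenated[i * step:(i + 1) * step] for i in range(num_chunks)]

    natural = [None] * num_chunks
    for rank in range(cp_size):
        natural[rank] = chunks[2 * rank]  # front half of the rank's shard
        natural[num_chunks - 1 - rank] = chunks[2 * rank + 1]  # back half
    return [item for chunk in natural for item in chunk]
```

For a THD layout the same reorder would have to be applied per sequence, which is why the earlier commits needed per-sequence lengths derived from cu_seqlens_q.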

Collect moe_dp_rank/size and attn_cp_rank/size in _SGLangPlugin.collect_parallel_info() using existing SGLang APIs that were previously not being dumped.

In dp_attn mode, dp_size > 1 but MLP tensors have data on all ranks. The existing dp filter would incorrectly drop valid ranks. This adds a `// dp:=<group_name>` syntax in dims strings that redirects dp filtering to use `<group>_rank`/`<group>_size` fields from metadata instead of the default `dp_rank`/`dp_size`.
- dims.py: parse_dims() strips `//` section; new extract_dp_group_alias()
- dp_utils.py: filter/extract accept dp_group_alias parameter
- bundle_comparator.py: swap dims override before dp filter, pass alias
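The `// dp:=<group_name>` redirection can be sketched as below. The regular expression, the function bodies, and the flat metadata dict are assumptions made for illustration; only the syntax and the `<group>_rank`/`<group>_size` naming come from the commit message.

```python
import re
from typing import Optional

# Assumed pattern: group names consist of word characters only.
_DP_ALIAS_RE = re.compile(r"dp:=(\w+)")


def split_dims_comment(dims: str) -> tuple:
    """Split 'b s h // dp:=moe_dp' into ('b s h', 'dp:=moe_dp')."""
    head, _, comment = dims.partition("//")
    return head.strip(), comment.strip()


def extract_dp_group_alias(dims: str) -> Optional[str]:
    """Return the aliased DP group name, or None when no alias is declared."""
    _, comment = split_dims_comment(dims)
    match = _DP_ALIAS_RE.search(comment)
    return match.group(1) if match else None


def dp_rank_and_size(meta: dict, alias: Optional[str]) -> tuple:
    """Read <group>_rank/<group>_size, defaulting to dp_rank/dp_size."""
    prefix = alias if alias is not None else "dp"
    return meta[f"{prefix}_rank"], meta[f"{prefix}_size"]
```

With an alias such as `moe_dp`, filtering then looks at `moe_dp_rank`/`moe_dp_size` rather than the default fields.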

- test_dims.py: extract_dp_group_alias, parse_dims with //, resolve_dim_names with //
- test_dp_utils.py: _extract_dp_info and filter with dp_group_alias param
- test_entrypoint.py: E2E tests for dp alias noop, override-dims alias, real alias filtering

Assert that sglang_parallel_info in dump metadata contains the newly added moe_dp_rank/size and attn_cp_rank/size fields alongside the existing tp_rank/size.

Check every group (tp, pp, moe_ep, moe_tp, moe_dp, attn_tp, attn_dp, local_attn_dp, attn_cp, enable_dp_attention) rather than only the newly added ones.

…xxx directly Only keep local vars for derived values (dir_pair, viz_output_dir, etc.) that need Path() conversion or conditional logic.

- >=2 subdirs with .pt → ValueError with clear message
- 0 subdirs with .pt (and no .pt at root) → ValueError
- Add test for single non-empty subdir among empty siblings
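The rules above suggest logic along the following lines. This is a sketch under stated assumptions, not the actual `auto_descend_dir`: it assumes that a root already containing .pt files is returned unchanged and that only immediate subdirectories are inspected, and it omits the logging that the real function does through log_sink.

```python
from pathlib import Path


def auto_descend_dir_sketch(root: Path) -> Path:
    """Pick the directory that actually holds the .pt dump files."""
    # Assumption: .pt files directly under root mean no descent is needed.
    if any(root.glob("*.pt")):
        return root

    candidates = sorted(
        child
        for child in root.iterdir()
        if child.is_dir() and any(child.glob("*.pt"))
    )
    if len(candidates) == 1:
        return candidates[0]
    if not candidates:
        raise ValueError(
            f"No .pt files found in {root} or its immediate subdirectories"
        )
    names = ", ".join(child.name for child in candidates)
    raise ValueError(
        f"Ambiguous dump directory {root}: several subdirectories contain .pt files ({names})"
    )
```

Empty sibling directories are ignored, so a single non-empty subdirectory among empty ones is still selected.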

auto_descend_dir emits log via log_sink → report_sink, so the output format must be set before paths are resolved. Call configure() early with format only, then again with the resolved report_path.

The test was checking stderr for a [comparator] prefix that never existed. Now checks for auto-descend info in LogRecord output.

axis_swapper uses warning_sink to report dim name mismatches, but the module and type were missing.

The new dims_spec parser supports modifier syntax like h(tp).

dims_spec parser uses square brackets for modifiers.

h(tp:partial) -> h[tp:partial], s(cp) -> s[cp]

…se_dims().dims in reduce tests

… fix parse_dims().dims in reduce tests" This reverts commit d5480c8.

…tor" This reverts commit 28afed6.

…r test" This reverts commit 3a8205d.

This reverts commit 44da934.

This reverts commit 220c379.

gemini-code-assist · 2026-03-02T10:23:16Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors and enhances the dump comparator's ability to handle complex tensor layouts and distributed training configurations. Key changes include a new dimension specification system that supports fused dimensions and explicit replicated axes, improved axis alignment, and a more robust logging mechanism. These updates aim to make the comparator more flexible, accurate, and user-friendly for debugging diverse model architectures and parallelization strategies.

Highlights

Dimension Specification Refactoring: Introduced a new dims_spec module to provide a more robust and flexible way to define tensor dimensions, including support for fused dimensions (e.g., (num_heads*head_dim)) and explicit declaration of replicated parallel axes. This replaces the older dims.py module.
Enhanced Axis Alignment Logic: The axis_aligner now leverages the new dims_spec to handle complex tensor layouts, including fused dimensions, ensuring correct alignment and rearrangement for comparison. The axis_swapper module was removed, with its functionality integrated into the more general axis_aligner.
Improved Data Parallel (DP) Group Handling: The dp_utils module was updated to support aliased DP groups, allowing for more granular control and filtering of data parallel ranks during comparison.
Centralized Logging and Error Reporting: Replaced the warning_sink with a new log_sink system, which distinguishes between ErrorLog and InfoLog messages. This provides a clearer and more structured way to report issues and informational messages during the comparison process.
Directory Auto-Detection and CLI Presets: The comparator now includes automatic directory detection to simplify specifying input paths. Additionally, CLI argument presets have been introduced to streamline common comparison configurations.
Expanded Dumper Parallel Information: The dumper now collects more comprehensive parallel information, including MoE data parallel ranks/sizes (moe_dp_rank, moe_dp_size) and attention context parallel ranks/sizes (attn_cp_rank, attn_cp_size), enhancing debuggability for complex distributed setups.
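As an illustration of the fused-dimension idea mentioned in the highlights, the sketch below splits a token such as (num_heads*head_dim) into its sub-dimension names and computes the fused size. Both function names and the exact validation rules are hypothetical; the real dim_parser may accept a richer grammar.

```python
def parse_dim_token(token: str) -> list:
    """Return the sub-dimension names of a plain or fused dim token."""
    token = token.strip()
    if token.startswith("(") and token.endswith(")"):
        parts = [part.strip() for part in token[1:-1].split("*")]
        if len(parts) < 2 or not all(parts):
            raise ValueError(f"Malformed fused dim: {token!r}")
        return parts
    return [token]


def fused_size(sub_sizes: dict, token: str) -> int:
    """Multiply the sizes of all sub-dimensions named by the token."""
    size = 1
    for name in parse_dim_token(token):
        size *= sub_sizes[name]
    return size
```

For example, with num_heads = 8 and head_dim = 64, the fused dimension has size 512, so one side can store a flat hidden axis while the other keeps the two axes separate.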


Changelog

python/sglang/srt/debug_utils/comparator/__init__.py
- Updated import from ComparisonRecord to TensorComparisonRecord.
python/sglang/srt/debug_utils/comparator/aligner/axis_aligner.py
- Replaced dims module import with dims_spec module.
- Removed warning_sink import and replaced its usage with log_sink and ErrorLog.
- Refactored axis alignment logic to support DimSpec objects, including handling of fused dimensions and canonical ordering.
- Added _semantic_names_match, _expand_and_skip_squeeze, _build_canonical_order, and _build_side_pattern functions for improved dimension matching and pattern generation.
- Added validation for side argument in execute_axis_aligner_plan.
python/sglang/srt/debug_utils/comparator/aligner/axis_swapper.py
- Removed file, as its functionality was integrated into axis_aligner.py.
python/sglang/srt/debug_utils/comparator/aligner/entrypoint/planner.py
- Updated imports from dims to dims_spec.
- Modified compute_per_step_sub_plans to accept explicit_replicated_axes from the new DimsSpec object.
python/sglang/srt/debug_utils/comparator/aligner/reorderer/executor.py
- Updated imports from dims to dims_spec.
python/sglang/srt/debug_utils/comparator/aligner/reorderer/planner.py
- Updated imports from dims to dims_spec.
python/sglang/srt/debug_utils/comparator/aligner/token_aligner/concat_steps/executor.py
- Updated imports from dims to dims_spec.
python/sglang/srt/debug_utils/comparator/aligner/token_aligner/entrypoint.py
- Removed argparse import.
- Replaced GeneralWarning with InfoLog and warning_sink with log_sink.
- Modified compute_maybe_token_aligner_result and _build_smart_result to accept dir_pair and token_aligner_mode directly, removing reliance on args namespace for paths.
python/sglang/srt/debug_utils/comparator/aligner/token_aligner/smart/aux_loader.py
- Updated imports from dims to dims_spec.
- Replaced GeneralWarning with ErrorLog and InfoLog, and warning_sink with log_sink.
python/sglang/srt/debug_utils/comparator/aligner/token_aligner/smart/aux_plugins.py
- Updated imports from dims to dims_spec.
- Replaced GeneralWarning with InfoLog and warning_sink with log_sink.
- Updated infer_cp_sharded_dims to use bracket notation [] instead of parentheses () for parallel modifiers.
python/sglang/srt/debug_utils/comparator/aligner/token_aligner/smart/executor.py
- Updated imports from dims to dims_spec.
python/sglang/srt/debug_utils/comparator/aligner/token_aligner/smart/types.py
- Updated imports from dims to dims_spec.
python/sglang/srt/debug_utils/comparator/aligner/unsharder/executor.py
- Updated imports from dims to dims_spec.
- Refactored _verify_replicated_group into _check_replicated_pair for clearer logic and added handling for shape mismatches.
python/sglang/srt/debug_utils/comparator/aligner/unsharder/parallel_info.py
- Updated imports from dims to dims_spec.
python/sglang/srt/debug_utils/comparator/aligner/unsharder/planner.py
- Updated imports from dims to dims_spec.
- Modified compute_unsharder_plan to accept explicit_replicated_axes and added _validate_explicit_replicated for robust validation of replicated axes.
python/sglang/srt/debug_utils/comparator/aligner/unsharder/types.py
- Updated imports from dims to dims_spec.
python/sglang/srt/debug_utils/comparator/bundle_comparator.py
- Updated imports from dims to dims_spec and from old comparison record types to new TensorComparisonRecord, SkipComparisonRecord, NonTensorComparisonRecord.
- Replaced GeneralWarning with ErrorLog and log_sink for error reporting.
- Modified compare_bundle_pair and _compare_bundle_pair_inner to accept dir_pair instead of separate baseline_path and target_path.
- Updated DP filtering to use dp_group_alias extracted from dims_spec.
- Added _extract_dp_alias_from_items helper function.
python/sglang/srt/debug_utils/comparator/dims.py
- Removed file, replaced by the dims_spec package.
python/sglang/srt/debug_utils/comparator/dims_spec/__init__.py
- Added new initialization file for the dims_spec package, exporting all its components.
python/sglang/srt/debug_utils/comparator/dims_spec/comment_parser.py
- Added new file to parse comment suffixes in dimension strings for DP aliases and replicated axis declarations.
python/sglang/srt/debug_utils/comparator/dims_spec/dim_parser.py
- Added new file to parse individual dimension tokens, including support for fused dimensions like (a*b).
python/sglang/srt/debug_utils/comparator/dims_spec/dims_parser.py
- Added new file to parse full dimension strings, integrating dim_parser and comment_parser, and handling semantic name validation.
python/sglang/srt/debug_utils/comparator/dims_spec/modifier_parser.py
- Added new file to parse parallel modifiers within dimension specifications, supporting axis:qualifier syntax.
python/sglang/srt/debug_utils/comparator/dims_spec/tensor_naming.py
- Added new file containing utilities for tensor naming, including find_dim_index, resolve_dim_by_name, apply_dim_names, and strip_dim_names.
python/sglang/srt/debug_utils/comparator/dims_spec/types.py
- Added new file defining core types for dimension specifications, including DimSpec, DimsSpec, ParallelAxis, Ordering, Reduction, and ParallelModifier.
python/sglang/srt/debug_utils/comparator/dp_utils.py
- Modified filter_to_non_empty_dp_rank and _extract_dp_info to accept an optional dp_group_alias for filtering based on specific data parallel groups.
python/sglang/srt/debug_utils/comparator/entrypoint.py
- Removed re import.
- Updated ConfigRecord creation to use vars(args) directly.
- Modified _read_df and _maybe_load_tokenizer to accept dir_pair for consistent path handling.
- Removed _compute_skip_keys and updated match_bundles to use _DEFAULT_SKIP_KEYS and args.grouping_skip_keys.
- Added auto_descend_dir and compute_exit_code imports and usage.
- Updated _consume_comparison_records to return failed_names for exit code computation.
- Modified _resolve_report_path to use target_path directly.
- Introduced parse_args to handle CLI argument presets via expand_preset.
python/sglang/srt/debug_utils/comparator/log_sink.py
- Added new file implementing a centralized LogSink for collecting and reporting ErrorLog and InfoLog messages.
python/sglang/srt/debug_utils/comparator/output_types.py
- Refactored warning types into BaseLog, ErrorLog, InfoLog, and AnyLog.
- Introduced _split_logs helper function to separate errors and infos.
- Added RecordLocation for associating comparison records with specific steps.
- Renamed SkipRecord to SkipComparisonRecord, ComparisonRecord to TensorComparisonRecord, and NonTensorRecord to NonTensorComparisonRecord.
- Updated all comparison record types to include errors and infos lists instead of a single warnings list.
- Modified ReplicatedCheckResult to make diff optional.
python/sglang/srt/debug_utils/comparator/per_token_visualizer.py
- Updated ComparisonRecord to TensorComparisonRecord in function signatures.
python/sglang/srt/debug_utils/comparator/preset.py
- Added new file defining CLI argument presets (raw, sglang_dev, sglang_megatron) and logic for expanding them.
python/sglang/srt/debug_utils/comparator/tensor_comparator/formatter.py
- Modified format_replicated_checks to handle cases where diff might be None in ReplicatedCheckResult.
python/sglang/srt/debug_utils/comparator/utils.py
- Added auto_descend_dir function for automatically navigating into subdirectories containing .pt files.
- Added compute_exit_code function for determining the CLI exit status based on comparison results and allowed patterns.
- Added _is_all_match_pattern helper for regex matching.
python/sglang/srt/debug_utils/comparator/warning_sink.py
- Removed file, replaced by log_sink.py.
python/sglang/srt/debug_utils/dumper.py
- Added moe_dp_rank, moe_dp_size, attn_cp_rank, and attn_cp_size to the collected parallel information.
test/registered/debug_utils/comparator/aligner/entrypoint/test_executor.py
- Updated dimension string syntax from () to [] for parallel modifiers.
test/registered/debug_utils/comparator/aligner/entrypoint/test_planner.py
- Updated dimension string syntax from () to [] for parallel modifiers.
- Updated compute_per_step_sub_plans calls to pass dims from DimsSpec objects.
test/registered/debug_utils/comparator/aligner/reorderer/test_executor.py
- Updated imports from dims to dims_spec.
test/registered/debug_utils/comparator/aligner/reorderer/test_planner.py
- Updated imports from dims to dims_spec.
- Updated dimension string syntax from () to [] for parallel modifiers.
- Updated parse_dims calls to access dims attribute of the returned DimsSpec object.
test/registered/debug_utils/comparator/aligner/test_axis_aligner.py
- Replaced warning_sink with log_sink.
- Updated dimension string syntax from () to [] for parallel modifiers.
- Added new test cases for TestComputeAxisAlignerPlanFused to cover fused dimensions.
- Added new test cases for TestExecuteAxisAlignerPlanFlatten to cover flattening operations.
- Added new end-to-end tests for fused alignment scenarios.
test/registered/debug_utils/comparator/aligner/test_axis_swapper.py
- Removed file, as axis_swapper.py was removed.
test/registered/debug_utils/comparator/aligner/token_aligner/test_aux_loader.py
- Replaced WarningSink with LogSink and GeneralWarning with ErrorLog/InfoLog.
- Updated patch paths for log_sink.
- Updated dimension string syntax from () to [] for parallel modifiers.
test/registered/debug_utils/comparator/aligner/token_aligner/test_aux_plugins.py
- Updated imports from dims to dims_spec.
- Updated dimension string syntax from () to [] for parallel modifiers.
test/registered/debug_utils/comparator/aligner/token_aligner/test_executor.py
- Updated imports from dims to dims_spec.
test/registered/debug_utils/comparator/aligner/token_aligner/test_planner.py
- Updated imports from dims to dims_spec.
test/registered/debug_utils/comparator/aligner/unsharder/test_executor.py
- Updated imports from dims to dims_spec.
- Updated _name_tensors to use sanitized_name from DimSpec.
- Updated dimension string syntax from () to [] for parallel modifiers.
- Updated parse_dims calls to access dims attribute of the returned DimsSpec object.
- Added explicit_replicated_axes to compute_unsharder_plan calls where applicable.
- Added new test cases for TestThdCpConcat and TestFusedDimExecutor.
test/registered/debug_utils/comparator/aligner/unsharder/test_parallel_info.py
- Updated imports from dims to dims_spec.
test/registered/debug_utils/comparator/aligner/unsharder/test_planner.py
- Updated imports from dims to dims_spec.
- Updated dimension string syntax from () to [] for parallel modifiers.
- Updated parse_dims calls to access dims attribute of the returned DimsSpec object.
- Added explicit_replicated_axes to compute_unsharder_plan calls where applicable.
- Added new test class TestExplicitReplicatedAxes and TestComputeUnsharderPlanFusedDims to cover new functionality.
test/registered/debug_utils/comparator/dims_spec/test_dim_parser.py
- Added new test file for dim_parser.py, covering parsing of plain, modified, and fused dimensions.
test/registered/debug_utils/comparator/dims_spec/test_dims_parser.py
- Added new test file for dims_parser.py, covering parsing of full dimension strings, singleton dim utilities, and hash-comment declarations.
test/registered/debug_utils/comparator/dims_spec/test_tensor_naming.py
- Added new test file for tensor_naming.py, covering find_dim_index, resolve_dim_by_name, apply_dim_names, and strip_dim_names.
test/registered/debug_utils/comparator/dims_spec/test_types.py
- Added new test file for types.py, covering basic dimension constants.
test/registered/debug_utils/comparator/tensor_comparator/test_formatter.py
- Removed ✅ prefix from max_abs_diff and mean_abs_diff in formatted output, as only rel_diff indicates overall pass/fail.
test/registered/debug_utils/comparator/tensor_comparator/test_types.py
- Updated imports from old comparison record types to new TensorComparisonRecord, SkipComparisonRecord, NonTensorComparisonRecord, LogRecord, ErrorLog, InfoLog.
- Updated test cases to reflect new logging and comparison record types.
test/registered/debug_utils/comparator/test_bundle_comparator.py
- Replaced WarningSink with LogSink and GeneralWarning with ErrorLog.
- Updated patch paths for log_sink.
test/registered/debug_utils/comparator/test_dims.py
- Removed file, replaced by test_dim_parser.py, test_dims_parser.py, test_tensor_naming.py, and test_types.py in dims_spec.
test/registered/debug_utils/comparator/test_dp_utils.py
- Added new test classes TestExtractDpInfoWithAlias and TestFilterToNonEmptyDpRankWithAlias to verify functionality with dp_group_alias.
test/registered/debug_utils/comparator/test_log_sink.py
- Renamed from test_warning_sink.py.
- Updated to test LogSink with ErrorLog and InfoLog.
test/registered/debug_utils/comparator/test_manually_verify.py
- Updated ComparisonRecord to TensorComparisonRecord in imports and usage.
test/registered/debug_utils/comparator/test_meta_overrider.py
- Updated dimension string syntax from () to [] for parallel modifiers.
test/registered/debug_utils/comparator/test_model_validation.py
- Updated imports from old comparison record types to new TensorComparisonRecord, SkipComparisonRecord, NonTensorComparisonRecord.
- Replaced GeneralWarning with ErrorLog.
- Updated _make_comparison_record to accept errors instead of warnings.
test/registered/debug_utils/comparator/test_output_types.py
- Added new test file for output_types.py, covering _split_logs and LogRecord text formatting.
test/registered/debug_utils/comparator/test_per_token_visualizer.py
- Updated ComparisonRecord to TensorComparisonRecord in imports and usage.
test/registered/debug_utils/comparator/test_preset.py
- Added new test file for preset.py, covering preset expansion logic.
test/registered/debug_utils/comparator/test_tensor_comparator.py
- Updated ComparisonRecord to TensorComparisonRecord in imports and usage.
test/registered/debug_utils/comparator/test_utils.py
- Added auto_descend_dir and compute_exit_code imports.
- Added new test class TestComputeExitCode to thoroughly test exit code logic with various pass/fail/skip scenarios and patterns.
- Added new test class TestAutoDescendDir to verify directory auto-detection.
test/registered/debug_utils/source_patcher/test_code_patcher.py
- Updated CI registration to include nightly=True.
test/registered/debug_utils/source_patcher/test_dumper_integration.py
- Updated CI registration to include nightly=True.
test/registered/debug_utils/source_patcher/test_source_editor.py
- Updated CI registration to include nightly=True.
test/registered/debug_utils/test_dumper.py
- Added assertions to test_step_and_non_intrusive_hooks to verify the presence of new parallel info keys (moe_dp_rank, moe_dp_size, attn_cp_rank, attn_cp_size) in the dumped metadata.
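The split between errors and infos described in the changelog for output_types.py can be sketched as below. The class definitions here are simplified stand-ins using dataclasses, and `split_logs` is a hypothetical version of the `_split_logs` helper; the real types may carry more fields or use a different base class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseLog:
    message: str


@dataclass(frozen=True)
class ErrorLog(BaseLog):
    """A problem that should be surfaced as an error on the record."""


@dataclass(frozen=True)
class InfoLog(BaseLog):
    """An informational note, such as an auto-descend notice."""


def split_logs(logs: list) -> tuple:
    """Separate a mixed log list into (errors, infos), preserving order."""
    errors = [log for log in logs if isinstance(log, ErrorLog)]
    infos = [log for log in logs if isinstance(log, InfoLog)]
    return errors, infos
```

A record can then expose `errors` and `infos` as two lists instead of a single `warnings` list.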

Activity

No specific human activity (comments, reviews) was provided in the pull request description.


¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a significant and high-quality refactoring of the dump comparator utility. The changes enhance its functionality, robustness, and user experience. Key improvements include a new, more powerful dims_spec parser that supports fused dimensions and explicit replication declarations, automatic discovery of dump directories, a more flexible command-line interface with presets, and improved logging and error handling. The code quality is excellent, and the changes are well-tested. I have not found any issues or bugs in this pull request.

# Conflicts: # python/sglang/srt/debug_utils/comparator/aligner/axis_swapper.py

fzyzcjy added 30 commits February 28, 2026 11:19

dp-attn v2: rename all_axes to shardable_axes

b33549c

The set excludes DP (non-shardable), so the old name was misleading.

more

37c6352

dp-attn v2: always return parallel_state in colon modifier parsing

6f40c6a

revert dp attn

fd17dee

emit warning when tensor file load fails in _load_all_values

22192a9

Previously, load failures were silently filtered out. Now each failure emits a GeneralWarning via warning_sink so it surfaces to the user through the record's warnings list.

add tests for _load_all_values load failure warnings

ef822d5

Tests cover: all success (no warnings), partial failure (one warning emitted, good files kept), and all corrupted (empty result, one warning per file).

more

d43caad

comparator: add subprocess e2e tests for exit code behavior

8752c48

TestExitCodeSubprocess invokes comparator via subprocess.run and verifies exit code for passed, failed, skipped, and --forbid-skip.

refactor: replace DimSpec single-axis fields with ParallelModifier tuple

545d567

Add ParallelModifier dataclass supporting axis:qualifier colon syntax. DimSpec now uses parallel_modifiers tuple to support multiple parallel axes on a single dimension (e.g. t(cp:zigzag,sp)).

refactor: update unsharder planner for multi-modifier DimSpec

c9a81b1

Iterate over parallel_modifiers tuple instead of single parallel field. Unshard in reverse order (innermost shard first). _resolve_unshard_params now takes modifier + dim_name instead of full DimSpec.

refactor: update reorderer planner for multi-modifier DimSpec

7bb059a

Iterate over parallel_modifiers to find zigzag ordering instead of checking single spec.ordering field.

style: auto-fix formatting from ruff and black

85ad1c9

cleanup: remove unused _ORDERING_LOOKUP and _REDUCTION_LOOKUP

09b7499

style: auto-fix black formatting in unsharder planner

0ec2847

refactor: use .get() lookup in _parse_modifier_token

b6f6cdc

Replace `if x not in dict` + `dict[x]` pattern with `dict.get(x)` + `if ... is None` for both axis and qualifier lookups.

test: add multi-axis same-dim tests for t(cp:zigzag,sp)

315ffb7

Add unsharder planner tests verifying SP unshards first (inner) then CP (outer) for same-dim multi-axis. Add E2E round-trip test with CP=2 zigzag + SP=2 on token dim covering full unshard + reorder pipeline.

fix: add cu_seqlens_q + per-seq zigzag to cp_zigzag_sp test helper

ca124fe

The 't' dim zigzag reorder requires thd_global_seq_lens (derived from cu_seqlens_q). Reworked helper to do per-sequence zigzag split and include cu_seqlens_q dumps.

fix: remove stale token_dim kwarg from test call

3e7374d

fix: use smart token aligner for cp_zigzag_sp entrypoint test

70fd0c0

THD zigzag reorder requires thd_global_seq_lens derived from cu_seqlens_q, which only gets loaded via the smart token aligner path. Add input_ids dumps alongside cu_seqlens_q and switch to token_aligner="smart".

fix: use s(cp:zigzag,sp) seq dim instead of t dim for entrypoint test

ec9181c

The t dim requires THD aux handling (cu_seqlens_q, input_ids) which adds complexity. Use b s(cp:zigzag,sp) h to test the same multi-axis same-dim unshard + reorder flow without THD aux dependencies.

add moe_dp and attn_cp to SGLang parallel info collection

bff7fa5

Collect moe_dp_rank/size and attn_cp_rank/size in _SGLangPlugin.collect_parallel_info() using existing SGLang APIs that were previously not being dumped.

test: verify moe_dp and attn_cp fields in E2E parallel_info dump

5d569aa

Assert that sglang_parallel_info in dump metadata contains the newly added moe_dp_rank/size and attn_cp_rank/size fields alongside the existing tp_rank/size.

test: verify all sglang_parallel_info fields in E2E dump

b68dcca

Check every group (tp, pp, moe_ep, moe_tp, moe_dp, attn_tp, attn_dp, local_attn_dp, attn_cp, enable_dp_attention) rather than only the newly added ones.

fzyzcjy added 23 commits March 1, 2026 11:35

refactor: remove intermediate variable extraction in run(), use args.…

dfd8ee1

…xxx directly Only keep local vars for derived values (dir_pair, viz_output_dir, etc.) that need Path() conversion or conditional logic.

feat: auto_descend_dir errors on ambiguous/missing data, add tests

0efb391

- >=2 subdirs with .pt → ValueError with clear message
- 0 subdirs with .pt (and no .pt at root) → ValueError
- Add test for single non-empty subdir among empty siblings

fix: configure report_sink output_format before auto_descend_dir

4ed6412

auto_descend_dir emits log via log_sink → report_sink, so the output format must be set before paths are resolved. Call configure() early with format only, then again with the resolved report_path.

fix: update auto-descend test to check LogRecord in stdout JSON

1bade5c

The test was checking stderr for a [comparator] prefix that never existed. Now checks for auto-descend info in LogRecord output.

feat: add GeneralWarning type and warning_sink module

220c379

axis_swapper uses warning_sink to report dim name mismatches, but the module and type were missing.

fix: use dims_spec instead of legacy dims in axis_swapper

44da934

The new dims_spec parser supports modifier syntax like h(tp).

fix: use bracket syntax h[tp] instead of h(tp) in axis_swapper test

3a8205d

dims_spec parser uses square brackets for modifiers.

fix: use bracket syntax for modifiers in unsharder test_executor

28afed6

h(tp:partial) -> h[tp:partial], s(cp) -> s[cp]

fix: add pass/fail markers for all diff metrics in formatter, fix par…

d5480c8

…se_dims().dims in reduce tests

Revert "fix: add pass/fail markers for all diff metrics in formatter,…

c3aba53

… fix parse_dims().dims in reduce tests" This reverts commit d5480c8.

Revert "fix: use bracket syntax for modifiers in unsharder test_execu…

b36b7b9

…tor" This reverts commit 28afed6.

Revert "fix: use bracket syntax h[tp] instead of h(tp) in axis_swappe…

ebc7797

…r test" This reverts commit 3a8205d.

Revert "fix: use dims_spec instead of legacy dims in axis_swapper"

44f9030

This reverts commit 44da934.

Revert "feat: add GeneralWarning type and warning_sink module"

9cc7c3c

This reverts commit 220c379.

fix wrongly introduced file in merging

2fe4d8b

fix: use bracket syntax and .dims in ReduceSum unsharder tests

d6ec404

style: reformat preset.py (black)

a01d2d9

Merge branch 'ac8420/29' into ac8420/30

9c4bba6

style: reformat with black

1c86491

Merge branch 'ac8420/30' into ac8420/31

4f99384

Merge branch 'ac8420/31' into ac8420/32

95b963b

Merge branch 'ac8420/32' into ac8420/33

8fafde2

style: reformat test_entrypoint.py (black)

d5046ad

gemini-code-assist bot reviewed Mar 2, 2026


Merge remote-tracking branch 'upstream/main' into ac8420/33

cbec215

# Conflicts: # python/sglang/srt/debug_utils/comparator/aligner/axis_swapper.py

fzyzcjy merged commit abdc0ee into sgl-project:main Mar 2, 2026
57 of 66 checks passed

Kangyan-Zhou pushed a commit to Kangyan-Zhou/sglang that referenced this pull request Mar 4, 2026

Support directory detection in dump comparator (sgl-project#19680)

341edf5

magicYang1573 pushed a commit to magicYang1573/sglang that referenced this pull request Mar 9, 2026

Support directory detection in dump comparator (sgl-project#19680)

0f17c7b

Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026

Support directory detection in dump comparator (sgl-project#19680)

f3eff24

Support directory detection in dump comparator #19680
fzyzcjy merged 573 commits into sgl-project:main from fzyzcjy:ac8420/33
