Support multiple verbosity in dump comparator #19684

Merged
fzyzcjy merged 618 commits into sgl-project:main from fzyzcjy:ac8420/37
Mar 2, 2026


@fzyzcjy fzyzcjy commented Mar 2, 2026

Motivation

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

- test_dp_alias_absent_group_noop: single rank with dp_size=1, verifies
  // syntax doesn't break comparison
- test_dp_alias_via_override_dims: uses moe_dp_rank/moe_dp_size fields
  so override-dims with // dp:=moe_dp triggers real alias-based filtering

parse_dims() now returns DimsSpec (dims + dp_group_alias) instead of
list[DimSpec]. This removes the separate extract_dp_group_alias()
public API and keeps dp group alias extraction as an internal detail
of the parsing step.

`#` is more natural as an annotation/pragma marker and avoids ambiguity
with URL fragments or division operators.
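
A minimal sketch of how a dims string with a trailing `#` annotation might be split into dimension tokens and a DP group alias. The `DimsSpec` name and the `dp:=moe_dp` form come from the commit messages above; the field layout, the `parse_dims_sketch` helper, and the example strings are assumptions for illustration, not the PR's actual parser.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DimsSpecSketch:
    """Illustrative stand-in for DimsSpec: dimension tokens plus an optional DP alias."""

    dims: tuple[str, ...]
    dp_group_alias: Optional[str] = None


def parse_dims_sketch(text: str) -> DimsSpecSketch:
    # Everything after the first '#' is treated as an annotation, not a dimension.
    dims_part, _, comment = text.partition("#")
    alias: Optional[str] = None
    for token in comment.split():
        # Assumed annotation form: "dp:=<group name>".
        if token.startswith("dp:="):
            alias = token[len("dp:="):]
    return DimsSpecSketch(dims=tuple(dims_part.split()), dp_group_alias=alias)
```

Returning one object keeps alias extraction inside the parsing step, which is the design point the commit message makes.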
… control

Instead of all-or-nothing --forbid-skip, --allow-skip-pattern accepts a
regex: tensor names matching the pattern are allowed to skip (e.g. core
auto-dump fields like positions/seq_lens that lack dims metadata at TP>1).
Default '.*' allows all skips; '^$' forbids all (equivalent to old --forbid-skip).
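
The pattern semantics described above can be sketched as follows. The helper name `disallowed_skips` is hypothetical, and the use of a full match (rather than a substring search) is an assumption; the commit message only states the behavior of the `'.*'` and `'^$'` extremes.

```python
import re


def disallowed_skips(skipped_names: list[str], allow_skip_pattern: str = ".*") -> list[str]:
    """Return the skipped tensor names that the pattern does NOT permit to skip."""
    pattern = re.compile(allow_skip_pattern)
    # Assumption: a name is allowed only if the whole name matches the pattern.
    return [name for name in skipped_names if pattern.fullmatch(name) is None]
```

With this reading, `'.*'` yields an empty list for any input, while `'^$'` returns every skipped name, matching the two defaults described in the commit.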
Replace scattered print_record() calls with a centralized report_sink
singleton that tees output to both stdout and an auto-generated JSONL
report file. This eliminates output_format parameter threading through
the call chain.

- Add ReportSink class with configure/add/close lifecycle
- Add --report-path and --no-report CLI arguments
- Default report path: <target-path>/comparator_report.jsonl
- Remove output_format from emit_display_records, _consume_comparison_records, WarningSink
- Add TestReportOutput test class and conftest autouse fixture for isolation
- Report path now printed via report_sink.add(ReportPathRecord(...))
  so it respects json/text output format
- --no-report removed; pass --report-path '' to disable instead
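
A rough sketch of the configure/add/close lifecycle with tee-style output. Only the class role, the lifecycle names, and the "empty path disables the report" rule come from the commits above; `ReportSinkSketch`, the dict-based record, and the `stdout` parameter are assumptions made so the example is self-contained.

```python
import json
import sys
from typing import Optional, TextIO


class ReportSinkSketch:
    """Tees each record to stdout and, optionally, to a JSONL report file."""

    def __init__(self) -> None:
        self._file: Optional[TextIO] = None
        self._stdout: TextIO = sys.stdout

    def configure(self, report_path: str, stdout: Optional[TextIO] = None) -> None:
        self.close()
        self._stdout = stdout if stdout is not None else sys.stdout
        # An empty path disables the report file, mirroring --report-path ''.
        self._file = open(report_path, "w", encoding="utf-8") if report_path else None

    def add(self, record: dict) -> None:
        line = json.dumps(record, ensure_ascii=False)
        print(line, file=self._stdout)
        if self._file is not None:
            self._file.write(line + "\n")
            self._file.flush()

    def close(self) -> None:
        if self._file is not None:
            self._file.close()
            self._file = None
```

Because callers only ever call `add`, the output format no longer needs to be passed down the call chain.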
(cherry picked from commit 48515db)
…_seq_lens_only

`&` and `-` are same-precedence left-associative so behavior was correct,
but explicit parentheses make the intent unambiguous.
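
The commit does not show the expression, so the check below assumes the operands are Python sets. One caveat worth verifying: in Python's grammar, binary `-` binds tighter than `&`, so `a & b - c` groups as `a & (b - c)`. For sets this is equal to `(a & b) - c`, which is consistent with the behavior having been correct, but the same is not true for integers, which is a further argument for the explicit parentheses.

```python
a, b, c = {1, 2, 3}, {2, 3, 4}, {3}

# Python parses `a & b - c` as `a & (b - c)` because `-` binds tighter than `&`.
implicit = a & b - c
grouped_right = a & (b - c)
grouped_left = (a & b) - c

# For sets, both groupings give the same result.
assert implicit == grouped_right == grouped_left

# For integers, the grouping changes the value.
assert 6 & 7 - 1 == 6 & (7 - 1)
assert (6 & 7) - 1 != 6 & (7 - 1)
```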
Cover: no plugin, empty cp_sharded_names, sglang seq_lens extraction,
megatron cu_seqlens_q diff extraction, multi-step, and missing tensor.
Verifies that concat mode correctly loads thd_seq_lens and performs
zigzag→natural reorder for Megatron-format CP=2 THD tensors.
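
A minimal sketch of a zigzag→natural reorder for packed (THD) sequences. It assumes the load-balanced context-parallel layout in which each sequence is cut into 2×CP chunks and rank r holds chunks r and 2×CP−1−r. That layout, the helper name `zigzag_to_natural`, and the plain-list token representation are assumptions for illustration; the PR's own implementation is not reproduced here.

```python
def zigzag_to_natural(rank_shards: list[list[int]], seq_lens: list[int]) -> list[int]:
    """Rebuild natural token order from per-rank zigzag shards of packed sequences."""
    cp_size = len(rank_shards)
    num_chunks = 2 * cp_size
    offsets = [0] * cp_size  # read position inside each rank's shard
    natural: list[int] = []
    for seq_len in seq_lens:
        if seq_len % num_chunks != 0:
            raise ValueError(f"seq_len {seq_len} is not divisible by {num_chunks}")
        chunk = seq_len // num_chunks
        chunks: list[list[int]] = [[] for _ in range(num_chunks)]
        for rank, shard in enumerate(rank_shards):
            start = offsets[rank]
            # Rank r holds chunk r followed by chunk (2*CP - 1 - r) of this sequence.
            chunks[rank] = shard[start : start + chunk]
            chunks[num_chunks - 1 - rank] = shard[start + chunk : start + 2 * chunk]
            offsets[rank] = start + 2 * chunk
        for piece in chunks:
            natural.extend(piece)
    return natural
```

The per-sequence lengths are needed because chunk boundaries depend on each sequence, which is why the concat mode has to load the sequence lengths before it can reorder.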
…nly to concat_steps package

- Rename token_aligner/concat.py → token_aligner/concat_steps/ package
- Rename execute_token_aligner_concat → execute_token_aligner_concat_steps
- Move load_thd_seq_lens_only from smart/aux_loader.py to concat_steps/thd_seq_lens_loader.py
  so concat-specific code no longer lives inside the smart subpackage
- Update CLI choices, defaults, and all string literals from "concat" to "concat_steps"
- Update all imports and test files accordingly
The concat_steps/__init__.py eagerly importing thd_seq_lens_loader created
a circular import chain through smart/aux_loader.py → entrypoint/executor.py.
Import from the submodule path instead.
fzyzcjy added 23 commits March 1, 2026 13:34
- Add TracedAlignerPlan wrapper types with ShapeSnapshot tracking
- Extract ReportSink to report_sink.py, add verbosity field
- Add BundleFileInfo/BundleSideInfo/ShapeSnapshot types to output_types
- Change executor return values to NamedTuples (StepPlansResult, SubPlansResult)
- Refactor TensorComparisonRecord: aligner_plan -> traced_plan, add raw_bundle_info
- Make extract_parallel_info and PARALLEL_INFO_KEYS public in display.py
- Add --verbosity CLI parameter to entrypoint
- Add testing_helpers.py with shared test factories
- Add to_rich() / _format_rich_body() stubs to _OutputRecord
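
The idea of returning a NamedTuple that carries shape snapshots alongside the result can be sketched as below. The names ending in `Sketch` and their fields are hypothetical, and the example tracks only shapes so that it runs without a tensor library; the PR's `ShapeSnapshot` and `SubPlansResult` types may look different.

```python
from typing import Callable, NamedTuple


class ShapeSnapshotSketch(NamedTuple):
    step: str
    input_shape: tuple[int, ...]
    output_shape: tuple[int, ...]


class SubPlansResultSketch(NamedTuple):
    shape: tuple[int, ...]
    snapshots: list[ShapeSnapshotSketch]


ShapeFn = Callable[[tuple[int, ...]], tuple[int, ...]]


def execute_sub_plans_sketch(
    shape: tuple[int, ...], sub_plans: list[tuple[str, ShapeFn]]
) -> SubPlansResultSketch:
    """Apply each sub-plan in order, recording the shape before and after it."""
    snapshots: list[ShapeSnapshotSketch] = []
    for name, fn in sub_plans:
        new_shape = fn(shape)
        snapshots.append(ShapeSnapshotSketch(name, shape, new_shape))
        shape = new_shape
    return SubPlansResultSketch(shape, snapshots)
```

A named return type lets callers read `result.snapshots` directly instead of unpacking a positional tuple.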

Extract output_formatter.py for record-level rendering delegation,
add Rich markup formatters in tensor_comparator/formatter.py,
and wire up display.py with _render_polars_as_rich_table.
All Rich functions hardcoded to normal mode (no verbosity params).

Add Rich body tests for ConfigRecord, SkipRecord, NonTensorRecord,
SummaryRecord, and log attachment (to_rich). Add Rich table tests
for RankInfoRecord and InputIdsRecord. Add comprehensive snapshot
tests for format_comparison_rich, _format_bundle_section,
_format_plan_section_rich, _format_stats_rich, and
_format_abs_diff_percentiles_rich.

Thread verbosity parameter through the full rendering pipeline:
report_sink → output_types.to_rich() → output_formatter → formatter.
Add _format_comparison_minimal for single-line output, verbose
branches for _format_bundle_section/_format_stats_rich, and
show_detail logic (verbose OR failed) for samples/checks/percentiles.
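
The three verbosity levels and the "verbose OR failed" rule can be sketched like this. The function name, its parameters, and the output layout are assumptions; only the level names (taken from the bot summary later on this page), the single-line minimal mode, and the `show_detail` condition are taken from the source.

```python
from typing import Literal

Verbosity = Literal["minimal", "normal", "verbose"]


def format_comparison_sketch(
    name: str,
    passed: bool,
    rel_diff: float,
    samples: str,
    verbosity: Verbosity = "normal",
) -> str:
    marker = "✅" if passed else "❌"
    headline = f"{marker} {name} rel_diff={rel_diff:.4f}"
    if verbosity == "minimal":
        # Minimal mode always collapses to a single line.
        return headline
    # Details are shown when explicitly requested or when the comparison failed.
    show_detail = verbosity == "verbose" or not passed
    lines = [headline]
    if show_detail:
        lines.append(f"  samples: {samples}")
    return "\n".join(lines)
```

Tying detail to failure keeps passing tensors compact in normal mode while still surfacing diagnostics where they are needed.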

Move output_formatter imports to top-level in output_types.py.

Move TestFormatComparisonRichMinimal before TestFormatComparisonRichNormal,
update Normal docstring to include verbosity qualifier.
@gemini-code-assist
Contributor

Summary of Changes

This pull request significantly upgrades the dump comparator's capabilities by introducing a robust verbosity system, a redesigned dimension specification parser, and detailed execution tracing for alignment plans. These changes aim to provide users with more insightful and configurable debugging information, making it easier to identify and resolve discrepancies in tensor outputs across different parallelization strategies. The refactor also includes a centralized logging mechanism and support for command-line presets, streamlining the comparison workflow.

Highlights

  • Enhanced Verbosity and Output Formatting: Introduced a comprehensive verbosity system (minimal, normal, verbose) for comparator output, enabling users to control the level of detail in comparison reports. This includes rich console output with color-coding and structured information, and a new output_formatter.py module to manage rendering logic.
  • Refactored Dimension Specification Parsing: Overhauled the dimension specification system by replacing the dims.py module with a new dims_spec package. This new system supports more advanced features like fused dimensions (e.g., (a*b)), explicit declaration of replicated parallel axes (e.g., # tp:replicated), and improved parsing for parallel modifiers.
  • Detailed Alignment Plan Tracing: Implemented a tracing mechanism within the alignment process, capturing ShapeSnapshots at each sub-plan execution step. This allows for a detailed visualization of how tensor shapes change throughout the unshard, reorder, and alignment operations, aiding in debugging complex parallelization scenarios.
  • Centralized Logging System: Replaced the warning_sink with a new log_sink module, which centralizes the collection and reporting of ErrorLogs and InfoLogs. This provides a more structured approach to capturing non-critical information and errors during comparison, improving report clarity.
  • Improved Parallelism Handling: Enhanced the dp_utils module to support custom data parallel group aliases (e.g., dp:=moe_dp), allowing for more flexible filtering of data parallel ranks. The dumper now collects more comprehensive parallel information, including MoE data parallel and attention data parallel ranks/sizes.
  • Flexible Comparator Configuration with Presets: Added support for command-line presets, allowing users to quickly apply predefined sets of comparator arguments (e.g., sglang_megatron). This simplifies common comparison setups and improves usability.
  • Refined Exit Code Logic: Updated the comparator's exit code logic to allow for patterns to explicitly permit certain skipped or failed comparisons without causing the command to exit with an error, providing more granular control over CI/CD integration.
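
The exit-code rule in the last highlight might look roughly like the sketch below. `compute_exit_code` and `allow_failed_pattern` are named in the changelog, but the signature, the full-match semantics, and the 0/1 return values used here are assumptions for illustration only.

```python
import re
from typing import Optional


def _not_allowed(names: list[str], pattern: Optional[str]) -> list[str]:
    # With no pattern, nothing is excused.
    if pattern is None:
        return list(names)
    regex = re.compile(pattern)
    return [name for name in names if regex.fullmatch(name) is None]


def compute_exit_code_sketch(
    failed_names: list[str],
    skipped_names: list[str],
    allow_failed_pattern: Optional[str] = None,
    allow_skip_pattern: str = ".*",
) -> int:
    """Return 0 when every failure and skip is excused by its pattern, else 1."""
    offending = _not_allowed(failed_names, allow_failed_pattern)
    offending += _not_allowed(skipped_names, allow_skip_pattern)
    return 1 if offending else 0
```

In a CI job, this lets a known-noisy tensor be excused by name while any other failure still fails the step.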


Changelog
  • python/sglang/srt/debug_utils/comparator/__init__.py
    • Imported TracedAlignerPlan for new tracing functionality.
    • Replaced ComparisonRecord with TensorComparisonRecord for consistency with new output types.
  • python/sglang/srt/debug_utils/comparator/aligner/axis_aligner.py
    • Replaced dims import with dims_spec for new dimension parsing.
    • Replaced warning_sink with log_sink for centralized logging.
    • Refactored axis alignment logic with new functions _semantic_names_match, _expand_and_skip_squeeze, _build_canonical_order, and _build_side_pattern to support fused dimensions and improved pattern generation.
    • Added validation for the side argument in execute_axis_aligner_plan.
  • python/sglang/srt/debug_utils/comparator/aligner/axis_swapper.py
    • Removed file, indicating its functionality was integrated or deprecated.
  • python/sglang/srt/debug_utils/comparator/aligner/entrypoint/executor.py
    • Imported new traced_types and ShapeSnapshot for execution tracing.
    • Modified AlignerResult to include an optional traced_plan.
    • Refactored _execute_step_plans and execute_sub_plans to return StepPlansResult and SubPlansResult named tuples, incorporating execution traces.
  • python/sglang/srt/debug_utils/comparator/aligner/entrypoint/planner.py
    • Replaced dims import with dims_spec for new dimension parsing.
    • Modified compute_unsharder_plan to accept explicit_replicated_axes for enhanced parallel axis handling.
  • python/sglang/srt/debug_utils/comparator/aligner/entrypoint/traced_types.py
    • Added new file defining TracedSubPlan, TracedStepPlan, TracedSidePlan, and TracedAlignerPlan to embed execution traces into plan nodes.
  • python/sglang/srt/debug_utils/comparator/aligner/reorderer/executor.py
    • Replaced dims import with dims_spec.
  • python/sglang/srt/debug_utils/comparator/aligner/reorderer/planner.py
    • Replaced dims import with dims_spec.
  • python/sglang/srt/debug_utils/comparator/aligner/token_aligner/concat_steps/executor.py
    • Replaced dims import with dims_spec.
  • python/sglang/srt/debug_utils/comparator/aligner/token_aligner/entrypoint.py
    • Removed argparse import.
    • Replaced warning_sink and GeneralWarning with log_sink and InfoLog.
    • Modified compute_maybe_token_aligner_result to accept dir_pair and token_aligner_mode as keyword arguments.
    • Refactored _build_smart_result to use dir_pair and aux_pair.
  • python/sglang/srt/debug_utils/comparator/aligner/token_aligner/smart/aux_loader.py
    • Replaced dims import with dims_spec.
    • Replaced warning_sink and GeneralWarning with log_sink, ErrorLog, and InfoLog.
    • Updated _load_and_align_aux_tensor to use sub_result.tensor and sub_result.snapshots.
  • python/sglang/srt/debug_utils/comparator/aligner/token_aligner/smart/aux_plugins.py
    • Replaced dims import with dims_spec.
    • Replaced warning_sink and GeneralWarning with log_sink and InfoLog.
    • Updated infer_cp_sharded_dims to use bracket syntax t[cp:zigzag] instead of parentheses t(cp:zigzag).
  • python/sglang/srt/debug_utils/comparator/aligner/token_aligner/smart/executor.py
    • Replaced dims import with dims_spec.
  • python/sglang/srt/debug_utils/comparator/aligner/token_aligner/smart/types.py
    • Replaced dims import with dims_spec.
  • python/sglang/srt/debug_utils/comparator/aligner/unsharder/executor.py
    • Replaced dims import with dims_spec.
    • Refactored _verify_replicated_group into _check_replicated_pair for clearer replicated check logic.
  • python/sglang/srt/debug_utils/comparator/aligner/unsharder/parallel_info.py
    • Replaced dims import with dims_spec.
  • python/sglang/srt/debug_utils/comparator/aligner/unsharder/planner.py
    • Replaced dims import with dims_spec.
    • Modified compute_unsharder_plan to accept explicit_replicated_axes and added _validate_explicit_replicated for robust validation.
  • python/sglang/srt/debug_utils/comparator/aligner/unsharder/types.py
    • Replaced dims import with dims_spec.
  • python/sglang/srt/debug_utils/comparator/bundle_comparator.py
    • Replaced dims import with dims_spec.
    • Replaced warning_sink and GeneralWarning with log_sink and ErrorLog.
    • Renamed ComparisonRecord to TensorComparisonRecord, SkipRecord to SkipComparisonRecord, NonTensorRecord to NonTensorComparisonRecord.
    • Added BundleFileInfo, BundleSideInfo, and _split_logs imports.
    • Introduced _collect_bundle_side_info to gather detailed bundle information.
    • Modified compare_bundle_pair to accept dir_pair instead of separate baseline/target paths.
    • Updated return types of comparison functions to use new record types.
    • Added raw_bundle_info and traced_plan to TensorComparisonRecord.
    • Adjusted DP filter logic to use _extract_dp_alias_from_items.
  • python/sglang/srt/debug_utils/comparator/dims.py
    • Removed file, its functionality was moved to the dims_spec package.
  • python/sglang/srt/debug_utils/comparator/dims_spec/__init__.py
    • Added new file to expose the new dimension specification parsing modules.
  • python/sglang/srt/debug_utils/comparator/dims_spec/comment_parser.py
    • Added new file for parsing comments in dimension strings to extract dp_group_alias and replicated_axes.
  • python/sglang/srt/debug_utils/comparator/dims_spec/dim_parser.py
    • Added new file for parsing individual dimension tokens, including support for fused dimensions and bracketed modifiers.
  • python/sglang/srt/debug_utils/comparator/dims_spec/dims_parser.py
    • Added new file for parsing full dimension strings, handling comments, fused dimensions, and duplicate name checks.
  • python/sglang/srt/debug_utils/comparator/dims_spec/modifier_parser.py
    • Added new file for parsing parallel modifiers within dimension specifications.
  • python/sglang/srt/debug_utils/comparator/dims_spec/tensor_naming.py
    • Added new file for utilities related to named tensors and dimension resolution.
  • python/sglang/srt/debug_utils/comparator/dims_spec/types.py
    • Added new file defining types for dimension specifications, parallel axes, modifiers, and fused dimensions.
  • python/sglang/srt/debug_utils/comparator/display.py
    • Imported rich.table for rich console output.
    • Renamed _PARALLEL_INFO_KEYS to PARALLEL_INFO_KEYS.
    • Added _render_polars_as_rich_table for rich table rendering.
    • Renamed _extract_parallel_info to extract_parallel_info.
  • python/sglang/srt/debug_utils/comparator/dp_utils.py
    • Modified filter_to_non_empty_dp_rank and _extract_dp_info to accept dp_group_alias for custom data parallel group filtering.
  • python/sglang/srt/debug_utils/comparator/entrypoint.py
    • Removed re import.
    • Replaced ComparisonRecord, NonTensorRecord, SkipRecord with new TensorComparisonRecord, NonTensorComparisonRecord, SkipComparisonRecord.
    • Added RecordLocation import.
    • Changed the report_sink import to come from the new report_sink module.
    • Imported PRESETS, expand_preset, auto_descend_dir, and compute_exit_code.
    • Refactored argument parsing to use parse_args and expand_preset.
    • Updated _read_df and _maybe_load_tokenizer to use dir_pair.
    • Removed _compute_skip_keys function.
    • Updated _compare_bundle_pairs to use dir_pair and new record types.
    • Modified _consume_comparison_records to return failed_names and handle new record types.
    • Added verbosity argument to report_sink.configure and allow_failed_pattern argument.
  • python/sglang/srt/debug_utils/comparator/log_sink.py
    • Added new file defining LogSink and log_sink for centralized error and info logging.
  • python/sglang/srt/debug_utils/comparator/output_formatter.py
    • Added new file containing formatting functions for various output records, including rich console support and verbosity levels.
  • python/sglang/srt/debug_utils/comparator/output_types.py
    • Replaced GeneralWarning with BaseLog, ErrorLog, and InfoLog.
    • Renamed ComparisonRecord to TensorComparisonRecord, SkipRecord to SkipComparisonRecord, NonTensorRecord to NonTensorComparisonRecord.
    • Added BundleFileInfo, BundleSideInfo, ShapeSnapshot, and RecordLocation.
    • Removed WarningRecord.
    • Integrated formatting functions from output_formatter.py for rich and text output.
    • Added _split_logs helper function.
  • python/sglang/srt/debug_utils/comparator/per_token_visualizer.py
    • Updated ComparisonRecord to TensorComparisonRecord.
  • python/sglang/srt/debug_utils/comparator/preset.py
    • Added new file defining PRESETS and expand_preset for simplified command-line configurations.
  • python/sglang/srt/debug_utils/comparator/report_sink.py
    • Added new file defining ReportSink and report_sink for managing output to stdout and report files, with verbosity control.
  • python/sglang/srt/debug_utils/comparator/tensor_comparator/formatter.py
    • Refactored formatting logic to support rich console output and different verbosity levels.
    • Added numerous helper functions for colored output, stats formatting, and plan visualization.
  • python/sglang/srt/debug_utils/comparator/utils.py
    • Added auto_descend_dir for automatically navigating dump directories.
    • Added compute_exit_code to handle exit status based on comparison results and allowed patterns.
  • python/sglang/srt/debug_utils/comparator/warning_sink.py
    • Removed file, replaced by log_sink.
  • python/sglang/srt/debug_utils/dumper.py
    • Added moe_dp_rank, moe_dp_size, attn_cp_rank, and attn_cp_size to the collected parallel information.
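
As a rough illustration of the preset mechanism listed for preset.py, the sketch below expands a preset name into ordinary arguments before normal parsing. `PRESETS`, `expand_preset`, and the `sglang_megatron` preset name appear in this summary; the `--preset` flag spelling, the preset contents, and the function signature are all assumptions and are not taken from the PR.

```python
# Hypothetical preset table: the real contents of sglang_megatron are not shown in this PR page.
PRESETS_SKETCH: dict[str, list[str]] = {
    "sglang_megatron": ["--allow-skip-pattern", "positions|seq_lens"],
}


def expand_preset_sketch(argv: list[str], presets: dict[str, list[str]]) -> list[str]:
    """Replace each assumed `--preset NAME` pair with the arguments it stands for."""
    expanded: list[str] = []
    i = 0
    while i < len(argv):
        if argv[i] == "--preset":
            if i + 1 >= len(argv):
                raise ValueError("--preset requires a name")
            name = argv[i + 1]
            if name not in presets:
                raise ValueError(f"unknown preset: {name}")
            expanded.extend(presets[name])
            i += 2
        else:
            expanded.append(argv[i])
            i += 1
    return expanded
```

Expanding before argument parsing means arguments written after the preset can still override it, assuming the parser keeps the last value for a repeated option.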

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant and well-executed refactoring of the dump comparator tool. Key improvements include support for multiple verbosity levels, a more powerful and expressive dimension specification syntax (including fused dimensions and comments), and a much cleaner code organization with better separation of concerns. The move from a simple warning_sink to a more structured log_sink with error/info levels is also a great enhancement. Overall, these changes make the tool more powerful, user-friendly, and maintainable. I have one minor suggestion regarding a regression in the text output format.

  "[p95] 1.5000 vs 1.5000 (diff: 0.0000)\n"
  "[p99] 1.8000 vs 1.8000 (diff: 0.0000)\n"
- "✅ rel_diff=0.0001\t✅ max_abs_diff=0.0005\t✅ mean_abs_diff=0.0002\n"
+ "✅ rel_diff=0.0001\tmax_abs_diff=0.0005\tmean_abs_diff=0.0002\n"

medium

This test is being updated to reflect the removal of pass/fail markers (✅/❌) for max_abs_diff and mean_abs_diff in the text-only output format. This seems to be a regression, as these markers provided useful at-a-glance information. While the new rich output is the main focus, the simpler text format shouldn't lose informativeness. Please consider restoring the markers in the implementation and updating this test accordingly.

Suggested change
- "✅ rel_diff=0.0001\tmax_abs_diff=0.0005\tmean_abs_diff=0.0002\n"
+ "✅ rel_diff=0.0001\t✅ max_abs_diff=0.0005\t✅ mean_abs_diff=0.0002\n"

@fzyzcjy fzyzcjy merged commit e5ef845 into sgl-project:main Mar 2, 2026
54 of 63 checks passed
Kangyan-Zhou pushed a commit to Kangyan-Zhou/sglang that referenced this pull request Mar 4, 2026
magicYang1573 pushed a commit to magicYang1573/sglang that referenced this pull request Mar 9, 2026
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026