Support presets and arbitrary skipping keys in dump comparator#19676
Support presets and arbitrary skipping keys in dump comparator#19676fzyzcjy merged 461 commits intosgl-project:mainfrom
Conversation
Follow existing pattern: viz output is silent on success. The user already knows the output path since they specified it via CLI.
Tensor already has named dims applied by _apply_dim_names_from_meta before _resolve_seq_dim is called, so the metadata fallback path was redundant.
- calc_per_token_rel_diff tests → test_utils.py (matches utils.py) - _compute_diff seq_dim tests → test_comparator.py (matches comparator.py) - delete standalone test_per_token_diff.py - split TestManuallyVerify into TestBundleDetailsManualVerify and TestPerTokenHeatmapManualVerify - remove duplicate test_visualize_per_token_with_dims from test_entrypoint.py
New params type for element-wise summation of partial tensors (reduction=partial). No dim_name needed since output shape == input shape.
Replace NotImplementedError with ReduceSumParams() when a dim spec has reduction=partial. The grouping logic is unchanged — sum is commutative so rank order within a group doesn't matter.
Strip named dims, stack, sum along dim=0, then restore names. Same pattern as _thd_concat for name handling.
Replace test_reduction_not_implemented_raises with tests verifying ReduceSumParams is returned for partial reduction dims: basic TP=2, TP=4, mixed CP+TP, and scrambled rank order.
Tests: basic TP=2 reduce, TP=4 reduce, multi-axis concat+reduce, scrambled rank order, and named dimension preservation.
Three scenarios: both-sides TP partial, single-rank vs TP partial, and mixed CP concat + TP partial reduction.
Verify that pipeline parallelism works correctly with the existing bundle matching logic: different world ranks with same layer_id match, non-layer tensors match across PP ranks, different PP sizes form correct bundles, and unmatched layer_ids don't create false matches.
After match_bundles, emit a GeneralWarning listing tensors that exist in target but have no matching baseline. Useful for diagnosing cases where some PP ranks didn't dump correctly.
When loading auxiliary tensors (input_ids, positions, etc.), check embedded meta for pp_rank. If items span multiple PP ranks, keep only pp_rank=0 and emit a warning. This guards against the unlikely case where non-first PP stages accidentally dump aux tensors.
This reverts commit 8678d6f.
…int" This reverts commit 85a76e4.
Replace 5 redundant PP tests with one that covers the actual net-new behavior: nullable layer_id column participates in grouping alongside rows without layer_id.
This reverts commit c0119d3.
Cover: unclosed quotes, empty quotes, equals inside multi-token quotes, consecutive quoted values, user real-world e2e scenario (shell not expanding quotes), and bare token without equals.
Flatten the main loop with continue-first pattern and introduce _QuoteParseResult NamedTuple to make the two-state machine (normal vs collecting) explicit and easier to follow.
… helper" This reverts commit e82487a.
This reverts commit 14ea40f.
…ns with spaces" This reverts commit b38f787.
…ns with spaces" This reverts commit d7dd57c. # Conflicts: # test/registered/debug_utils/test_dumper.py
…ns with spaces" This reverts commit d7dd57c. # Conflicts: # test/registered/debug_utils/test_dumper.py
extra_imports (e.g. dumper import) must execute before user preamble code so that preamble can reference the imported names.
- Add dp_rank field to PositionalSeqId (default=0 for backward compat) - Pass dp_rank through aux_plugins compute_step_aux to distinguish sequences from different DP ranks - Make aux_loader DP-aware: detect dp_rank from metadata, group rows by dp_rank, independently unshard each group, then concatenate step_auxs across DP groups
# Conflicts: # python/sglang/srt/debug_utils/comparator/aligner/token_aligner/executor.py
- Add missing `warning_sink` import in unsharder test_executor.py - Remove unused `parse_dim_names` import in test_dims.py - Add CI registry to source_patcher test files - Fix axis_swapper.py to use parse_dims().dims instead of parse_dims()
# Conflicts: # test/registered/debug_utils/comparator/aligner/unsharder/test_executor.py
# Conflicts: # python/sglang/srt/debug_utils/comparator/bundle_comparator.py # python/sglang/srt/debug_utils/comparator/dims.py # python/sglang/srt/debug_utils/comparator/entrypoint.py # test/registered/debug_utils/comparator/aligner/reorderer/test_planner.py # test/registered/debug_utils/comparator/aligner/unsharder/test_executor.py # test/registered/debug_utils/comparator/test_dims.py # test/registered/debug_utils/comparator/test_entrypoint.py # test/registered/debug_utils/comparator/test_model_validation.py # test/registered/debug_utils/source_patcher/test_code_patcher.py # test/registered/debug_utils/source_patcher/test_dumper_integration.py # test/registered/debug_utils/source_patcher/test_source_editor.py
# Conflicts: # test/registered/debug_utils/comparator/test_entrypoint.py
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the dump comparator's flexibility and usability by introducing CLI presets and arbitrary metadata key skipping. It refactors dimension parsing to better support complex parallelization schemes, particularly for data parallel group aliases, and improves the clarity of comparison output records. These changes streamline the process of debugging and comparing tensor dumps from distributed training environments. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces presets and arbitrary skipping keys to the dump comparator, which is a significant improvement for usability. The refactoring of argument parsing and the introduction of DimsSpec and dp_group_alias are well-executed. The code changes are extensive but seem correct and are accompanied by corresponding test updates.
I found two minor issues related to code duplication, one in the application code and one in the tests, which should be addressed. Overall, this is a solid contribution that enhances the debugging capabilities.
| def _maybe_load_tokenizer(args: argparse.Namespace) -> Any: | ||
| tokenizer_path: Optional[str] = getattr(args, "tokenizer", None) | ||
|
|
||
| if tokenizer_path is None: | ||
| for directory in [Path(args.baseline_path), Path(args.target_path)]: | ||
| tokenizer_path = read_tokenizer_path(directory) | ||
| if tokenizer_path is not None: | ||
| break | ||
|
|
||
| if tokenizer_path is None: | ||
| return None | ||
|
|
||
| try: | ||
| from transformers import AutoTokenizer | ||
|
|
||
| return AutoTokenizer.from_pretrained(tokenizer_path) | ||
| except Exception: | ||
| return None | ||
|
|
There was a problem hiding this comment.
| def test_squeeze_dim(self) -> None: | ||
| assert parse_dim("1") == DimSpec(name="1") | ||
|
|
||
| def test_squeeze_dim_rejects_modifiers(self) -> None: | ||
| with pytest.raises(ValueError, match="Invalid dim token"): | ||
| parse_dim("1(tp)") |
Motivation
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist
Review Process
/tag-run-ci-label,/rerun-failed-ci,/tag-and-rerun-ci