Support Mixed precision & Static MSE in MCore; Nemotron Super v3 NVFP4 recipe by jenchen13 · Pull Request #1521 · NVIDIA/Model-Optimizer

jenchen13 · 2026-05-19T18:24:56Z

What does this PR do?

Type of change: New recipe + Bug Fixes

MCore and MSE fixes

Support mixed-precision export in MCore by detecting mixed-precision layers in the HF quant config.
Restore static quantizers during MCore checkpoint restore as NVFP4QTensor rather than TensorQuantizer, which can trigger max calibration; static quantizers should skip max calibration on restore. This fixes a bug in MCore export for MSE.
Skip MSE calibration for any non-NVFP4 quantization format if fp8_scale_sweep=True.
Fix dynamic block quantizer detection when block_sizes is dict-backed.
Add a YAML quantization recipe that roughly mirrors the Nemotron 3 Super NVFP4 hf_quant_config.json.
Export bug fixes
Copy .py files properly from the original HF checkpoint (for the reasoning parser, etc.).

Super recipe

Mirrors the published nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 hf_quant_config.json:

MoE routed experts (mixer.experts..{up,down}_proj): NVFP4 W4A4 weight MSE, group_size 16
MoE shared experts (mixer.shared_experts.{up,down}_proj): FP8 per-tensor
Mamba mixer linears (mixer.{in,out}_proj): FP8 per-tensor
KV cache: FP8
rest: not quantized
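
The layout above can be pictured as an ordered pattern table where the first matching pattern decides the format. The following is only an illustrative sketch: the glob patterns, the module names in the usage note, and the `resolve_format` helper are hypothetical stand-ins, not the contents of the YAML recipe or a ModelOpt API. KV cache quantization is configured separately and is not modeled here.

```python
from fnmatch import fnmatchcase

# Hypothetical pattern table mirroring the list above; first match wins.
LAYOUT = [
    ("*mixer.experts.*_proj", "NVFP4 (W4A4, weight MSE, group_size 16)"),
    ("*mixer.shared_experts.*_proj", "FP8 per-tensor"),
    ("*mixer.in_proj", "FP8 per-tensor"),
    ("*mixer.out_proj", "FP8 per-tensor"),
]


def resolve_format(module_name: str) -> str:
    """Return the format label for a module name, or 'not quantized' if nothing matches."""
    for pattern, fmt in LAYOUT:
        if fnmatchcase(module_name, pattern):
            return fmt
    return "not quantized"
```

With these assumed names, `resolve_format("backbone.layers.3.mixer.experts.7.up_proj")` resolves to the NVFP4 entry, while a name such as `lm_head` falls through to `"not quantized"`.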

Usage

# Add a code snippet demonstrating how to use this

Testing

Tested on Nemotron model

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

Is this change backward compatible?: ✅ / ❌ / N/A
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
Did you write any new necessary tests?: ✅ / ❌ / N/A
Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

Release Notes

New Features
- Added NVFP4 (4-bit) quantization checkpoint restore and export support for Megatron-Core models
- Added tokenizer file export capability in model checkpoints
- Extended quantization support for expert-parallel distributed training
- Introduced new PTQ recipes for Nemotron-3-Super models with mixed-precision quantization
Bug Fixes
- Fixed FP8 and FP4 hardware compatibility detection on non-CUDA systems
- Improved offline Hugging Face Hub access handling with better error messaging
- Enhanced calibration validation for mixture-of-experts models
- Fixed amplitude maximum validation for static block quantizers
Documentation
- Updated expert weight quantization configuration documentation

coderabbitai · 2026-05-19T18:25:14Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Walkthrough

This PR adds NVFP4 static-quantizer validation and restoration, per-layer quantization metadata recording in Megatron HF exports, Hugging Face Hub offline-mode support for sidecar copying, expert-parallel distributed synchronization in auto-quantize recipe selection, Nemotron PTQ recipes, and comprehensive test coverage across all new functionality.

Changes

NVFP4 validation, Megatron mixed-precision export, and MoE expert-parallel support

| Layer / File(s) | Summary |
| --- | --- |
| NVFP4 static quantizer validation and amax validity checking: `modelopt/torch/quantization/nn/modules/tensor_quantizer.py`, `modelopt/torch/export/quant_utils.py`, `modelopt/torch/quantization/qtensor/nvfp4_tensor.py` | Block-quantization predicates refactored; NVFP4 amax validity now checks for non-finite/negative/missing values with `_amax_is_invalid` helper; global_amax resolution falls back from `global_amax` to `_global_amax` when needed. |
| NVFP4 static quantizer promotion and state restoration: `modelopt/torch/quantization/conversion.py`, `modelopt/torch/quantization/model_calib.py`, `modelopt/torch/quantization/plugins/custom.py`, `modelopt/torch/quantization/plugins/megatron.py` | New `maybe_promote_nvfp4_static_quantizer` promotes TensorQuantizer to NVFP4StaticQuantizer from saved state; custom plugin validates complete NVFP4 weight state to skip recalibration; Megatron plugin validates TP>1 incompatibility with static-block NVFP4; promotion integrated into quant_module_set_extra_state. |
| Per-layer quantization metadata recording for Megatron HF export: `modelopt/torch/export/unified_export_megatron.py` | GPTModelExporter initializes `layer_config_dict`, calls `_record_layer_quant_config(prefix, qformat, block_size)` at all export mapping points (name remapping, gated/grouped/packed MLP, QKV slicing), delegates exclude-module registration to `_record_excluded_module`, unquantized QKV splits into per-proj excludes. |
| Save pretrained: gather metadata, build config, tokenizer handling: `modelopt/torch/export/unified_export_megatron.py` | Distributed gather helpers `_gather_layer_config_dict()` and `_gather_kv_cache_dtype()` merge per-layer metadata and select first non-None kv-cache dtype across ranks; build HF quantization config via `process_layer_quant_config`; copy tokenizer verbatim for local dirs, else via `AutoTokenizer`; recognize `QUANTIZATION_W4A16_NVFP4` format. |
| HF Hub offline detection and sidecar copying: `modelopt/torch/export/plugins/hf_checkpoint_utils.py`, `modelopt/torch/export/plugins/mcore_nemotron.py` | Detect offline mode via `HF_HUB_OFFLINE`, refactor remote code copying to shared `.py` helper with `allow_patterns=[".py"]` for Hub downloads; add `copy_tokenizer_from_local_ckpt` for known tokenizer files; Nemotron core_attention now passes explicit k/v scale tensor names to `SelfAttentionScaling`. |
| Megatron-specific MoE calibration and expert-parallel amax sync: `modelopt/torch/quantization/model_calib.py` | Add MoE calibration completeness checking via recursive `SequentialQuantizer` traversal; routed-expert detection gates EP `_amax` synchronization via `sync_expert_weight_amax` predicate; MSE calibrate targets static-weight quantizers, skips non-NVFP4 formats with warning; max_calibrate docstring clarifies SequentialMLP-only behavior and EP-sync conditions. |
| Auto-quantize distributed consensus for MoE and expert-parallel: `modelopt/torch/quantization/algorithms.py`, `modelopt/torch/quantization/plugins/megatron.py` | QuantRecipeHparam score/cost sync now include `expert_model_parallel_group`; NemotronH MCore expert regex groups `linear_fc1`/`linear_fc2` submodules; `total_weight_size` computed from candidate_stats instead of model weights; best-format consensus synchronized across DP/TP/EP; Megatron auto-quantize integration adds gradient-search predicates and decoder-layer support. |
| Megatron sharded state dict and Nemotron attention updates: `modelopt/torch/quantization/plugins/megatron.py` | output_layer sharded state respects `untie_embeddings_and_output_weights` flag; exclude `_global_amax` from shard-axis metadata in parallel linear layers so replicated NVFP4 global amax scalars are not shardable. |
| Core quantization module updates: `modelopt/torch/quantization/backends/utils.py`, `modelopt/torch/quantization/config.py`, `modelopt/torch/quantization/utils/calib_utils.py` | `fp8_compatible()` and `fp4_compatible()` guard against unavailable CUDA; `MaxCalibConfig.sync_expert_weight_amax` docstring clarifies SequentialMLP-only scope and TEGroupedMLP irrelevance; `update_hessian` early-returns on zero batch size. |
| Nemotron PTQ recipes (MSE and max-calib variants): `modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml`, `modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib.yaml`, `CHANGELOG.rst`, `tests/_test_utils/torch/quantization/quantize_common.py`, `modelopt/torch/utils/dataset_utils.py` | Add two Nemotron PTQ recipes: super-nvfp4 uses MSE with mixed-precision (routed NVFP4, shared/Mamba/KV FP8); super-nvfp4-max-calib uses max calibration for comparison. Update changelog with restore/export/recipe entries and EP amax-sync fix. Add `get_dataloader_from_dataset` helper and refactor `auto_quantize_helper` to accept optional calibration parameters. |
| Comprehensive NVFP4 static export testing (CPU and CUDA): `tests/unit/torch/quantization/test_nvfp4_static_export_cpu.py`, `tests/gpu/torch/quantization/test_nvfp4_static_quantizer_cuda.py` | Add CPU tests for NVFP4 static export finiteness, overflow clamping, dequant bounds, static-vs-dynamic equivalence, manual round-trip verification, and corner cases (zero-amax, NaN-free scale bytes). Add CUDA test for FP8 block-scale overflow clamping. |
| Auto-quantize and Megatron-specific tests: `tests/unit/torch/quantization/test_autoquant.py`, `tests/gpu_megatron/torch/quantization/plugins/test_megatron.py`, `tests/unit/torch/quantization/plugins/test_fused_experts.py` | Add unit test for auto-quantize budget using candidate-stats no-quant cost. Add distributed EP test for auto-quantize MoE recipe selection consistency. Update fused-experts test to use static-weight bootstrap variant. |
| Megatron export and HF checkpoint utils test updates: `tests/gpu_megatron/torch/export/test_unified_export_megatron.py`, `tests/unit/torch/export/test_hf_checkpoint_utils.py` | Update export test to match new HF quantization config schema (config_groups, kv_cache_scheme.num_bits). Add QKV slicing test for unquantized fused-QKV exclude handling. Update HF utils tests for offline mode, allow_patterns signature, and LocalEntryNotFoundError handling. |

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 63.48% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title accurately describes the primary objectives: mixed-precision support in MCore, static MSE calibration, and a new Nemotron Super v3 NVFP4 recipe. The title is concise, specific, and clearly communicates the main changes. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security Anti-Patterns | ✅ Passed | No critical security anti-patterns found. No torch.load weights_only=False without comment, numpy allow_pickle=True, hardcoded trust_remote_code=True, eval/exec, or nosec comments in PR files. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


jenchen13 · 2026-05-19T18:26:55Z

A continuation of #1363

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

modelopt/torch/export/unified_export_megatron.py (1)

818-828: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Treat QUANTIZATION_NONE as unquantized when building exclude_modules.

This branch only records excludes for qformat is None, but the same method immediately returns early on qformat == QUANTIZATION_NONE, and _qkv_slicing() already treats both values the same. As written, any normal module reported as QUANTIZATION_NONE will skip the HF ignore list even though it is still unquantized.
Suggested fix
-        if qformat is None and "norm" not in prefix:
+        if qformat in (None, QUANTIZATION_NONE) and "norm" not in prefix:
             self._record_excluded_module(prefix)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/torch/export/unified_export_megatron.py` around lines 818 - 828, The
code currently only calls _record_excluded_module(prefix) when qformat is None,
but QUANTIZATION_NONE should be treated the same; update the branch in
unified_export_megatron.py (the block around qformat, QUANTIZATION_NONE,
_get_weight_bias, and _record_excluded_module) so that if qformat is None or
qformat == QUANTIZATION_NONE (and "norm" not in prefix) you record the module as
excluded before the early return; keep the existing early return for
QUANTIZATION_NONE but ensure the exclude is recorded first and keep
compatibility with _qkv_slicing behavior.

modelopt/torch/quantization/algorithms.py (1)

765-782: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Recompute the persisted score/cost after recipe synchronization.

After best_format is replaced by the DP/TP/EP-synchronized value, best_constraints and best_scores are still accumulated from the local solver choice. On ranks that did not originate the synchronized format, self.best["constraints"] / self.best["score"] can end up describing a different recipe than the one actually activated and checkpointed.

Suggested fix

         for name, best_hparam_recipe_info in best_recipe_info.items():
             # Solvers could give different solutions for the same layer across DP/TP/EP groups even though
             # the scores and costs are the same. Lets make sure the same recipe is selected across DP/TP/EP
             _ps = self.model.get_submodule(name.split(".quant_recipe")[0]).parallel_state
             best_format = DistributedProcessGroup.get_dist_syncd_obj(
                 best_hparam_recipe_info["format"],
                 [
                     _ps.data_parallel_group,
                     _ps.tensor_parallel_group,
                     _ps.expert_model_parallel_group,
                 ],
                 lambda a: a[0],
             )

             best_recipe[name] = best_format
-            get_hparam(self.model, name).active = best_format
-            best_constraints += best_hparam_recipe_info["costs"]
-            best_scores += best_hparam_recipe_info["scores"]
+            hparam = get_hparam(self.model, name)
+            hparam.active = best_format
+            best_constraints += hparam.get_cost(best_format)
+            best_scores += hparam.get_score(best_format)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/torch/quantization/algorithms.py` around lines 765 - 782, The loop
currently accumulates best_constraints and best_scores from
best_hparam_recipe_info before replacing the local solver's format with the
DP/TP/EP-synchronized best_format; update the code so that after you set
best_recipe[name] = best_format and get_hparam(self.model, name).active =
best_format you recompute and add the costs and scores that correspond to the
actually activated best_format (not the original
best_hparam_recipe_info["format"]); locate the mapping of format->costs/scores
that the solver produced for the layer (referencing best_recipe_info,
best_hparam_recipe_info and get_hparam) and use that entry to increment
best_constraints and best_scores (and keep
self.best["constraints"]/self.best["score"] consistent with the activated
recipe).
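
To make the concern concrete, here is a toy sketch of how totals drift when they are accumulated from a rank's local solver choice rather than from the synchronized format. The `CANDIDATES` table, its numbers, and the `totals` helper are invented for illustration and do not reflect the real `QuantRecipeHparam` interface.

```python
# Toy per-layer candidates: format -> (cost, score). All values are made up.
CANDIDATES = {
    "layer0": {"nvfp4": (1.0, 0.30), "fp8": (2.0, 0.10)},
    "layer1": {"nvfp4": (1.0, 0.25), "fp8": (2.0, 0.05)},
}


def totals(active_formats):
    """Sum cost and score for the formats that are actually activated."""
    cost = sum(CANDIDATES[name][fmt][0] for name, fmt in active_formats.items())
    score = sum(CANDIDATES[name][fmt][1] for name, fmt in active_formats.items())
    return cost, score


local_choice = {"layer0": "fp8", "layer1": "nvfp4"}     # this rank's solver output
synced_choice = {"layer0": "nvfp4", "layer1": "nvfp4"}  # format adopted after sync

stale = totals(local_choice)        # describes a recipe that was not activated
consistent = totals(synced_choice)  # describes the recipe that was activated
```

In this toy case the two totals differ, which is the mismatch the suggested fix avoids by looking up cost and score for `best_format` after synchronization.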

🧹 Nitpick comments (1)

tests/unit/torch/quantization/test_nvfp4_static_export_cpu.py (1)
32-42: ⚡ Quick win

Add one regression that uses only the restored _global_amax path.

The implementation change specifically supports static quantizers restored with _global_amax, but this helper only seeds global_amax, so the new restore path is still untested. A single round-trip case that sets _global_amax directly would keep the actual bugfix from regressing.

As per coding guidelines, tests/**/*.py: Write focused unit tests during development and curate production tests to be lean, documenting expected behavior, protecting against regressions, and flagging backward-incompatible changes.

Also applies to: 45-70
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/torch/quantization/test_nvfp4_static_export_cpu.py` around lines
32 - 42, Add a focused unit test that exercises the restored _global_amax code
path: create an NVFP4StaticQuantizer via the existing helper
_make_static_quantizer (or directly instantiate NVFP4StaticQuantizer), set the
private attribute _global_amax (not global_amax) to a tensor value, perform the
export/import (or the same round‑trip flow used elsewhere in this test file) and
assert the quantizer restores using the _global_amax path (e.g., resulting
amax/global_amax behavior matches expected values). Ensure the test is small,
documents the expected behavior, and only validates the single round‑trip
regression scenario so the `_global_amax` restore remains covered.
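
A minimal sketch of the fallback this regression test would cover, using plain Python floats instead of tensors. The helper names and the `SimpleNamespace` stand-in are assumptions for illustration; the real quantizer classes and the actual `_amax_is_invalid` helper may behave differently.

```python
import math
from types import SimpleNamespace


def resolve_global_amax(quantizer):
    """Prefer the public attribute, then fall back to the restored private one."""
    value = getattr(quantizer, "global_amax", None)
    if value is None:
        value = getattr(quantizer, "_global_amax", None)
    return value


def amax_is_invalid(value):
    """Treat missing, non-finite, or negative values as invalid (assumed rule)."""
    return value is None or not math.isfinite(value) or value < 0


# Stand-in for a quantizer restored from a checkpoint: only the private field is set.
restored = SimpleNamespace(_global_amax=3.5)
resolved = resolve_global_amax(restored)
```

A test built along these lines would seed only `_global_amax`, run the round trip, and check that the resolved value is valid and unchanged.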

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib.yaml`:
- Around line 47-79: The routed-expert weight quantizers in this max-calib
recipe (entries with quantizer_name: '*mixer.experts.*weight_quantizer' and
'*mlp.experts*weight_quantizer') are set to type: dynamic but must be static for
a fair max-vs-MSE comparison; update those two quantizer blocks to use type:
static (leave the corresponding input_quantizer blocks as-is) so only the weight
quantizers for routed experts switch from dynamic to static.

In `@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml`:
- Around line 30-32: The calibration comment is misleading about FP8 scale
selection: update the comment near the calibration block that mentions "FP8
per-tensor scales" and "NVFP4 weights" (the lines describing MSE searches) to
explicitly state that only NVFP4 weight block scales are selected via MSE while
non-NVFP4 FP8 formats skip MSE and use the stack's default scaling method; edit
the text to clarify that FP8 per-tensor scales for non-NVFP4 are not
MSE-searched to avoid confusion for recipe users.

In `@modelopt/torch/quantization/plugins/custom.py`:
- Around line 148-153: The current check treats incomplete tail blocks as
invalid; instead compute blocks per row as ceil(weight.shape[-1] / block_size)
and total expected_blocks = (weight.numel() // weight.shape[-1]) *
blocks_per_row so padded trailing blocks count toward the expected amax length.
In the validation around quantizer.block_sizes / block_size, replace
expected_blocks = weight.numel() // block_size with rows = weight.numel() //
weight.shape[-1]; blocks_per_row = math.ceil(weight.shape[-1] / block_size) (or
integer ceil via (N + block_size - 1)//block_size); expected_blocks = rows *
blocks_per_row, then return amax.numel() == expected_blocks and
global_amax.numel() == 1, allowing restored `_amax` that includes padded tail
blocks.

In `@modelopt/torch/quantization/plugins/megatron.py`:
- Around line 88-99: The TP>1 guard is too broad because it triggers for any
fake static-block quantizer; change the check that builds offending to only
consider NVFP4 static-block quantizers by requiring both
leaf.is_static_block_quant and that the leaf reports the NVFP4 format (e.g.,
leaf.format == "NVFP4" or the project’s NVFP4 enum/attribute — replace with the
actual attribute used in your quantizer objects) when iterating over leaves (the
variables/functions involved: weight_quantizer, SequentialQuantizer, leaves,
is_static_block_quant, offending, tp_group.world_size()); keep the rest of the
logic and the NotImplementedError unchanged.

In `@tests/gpu_megatron/torch/export/test_unified_export_megatron.py`:
- Around line 45-65: The test is comparing config.json's quantization_config to
the raw HF wrapper (hf_quant_config_dict) instead of the converted serving
format; change the test to use the converted structure (call
convert_hf_quant_config_format on hf_quant_config_dict or otherwise use the same
transformation used when producing config_dict) before asserting and before
indexing fields like "quant_algo", "ignore", and "config_groups"; update
references in the verification block so quant_config_dict refers to the
converted result (not the original hf_quant_config_dict) and then perform the
existing assertions and kv_cache checks against that converted object.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 864054fb-7e5a-459d-9bc8-f15b0be42e2b

📥 Commits

Reviewing files that changed from the base of the PR and between 8f1529a and bd2e8e9.

📒 Files selected for processing (26)

CHANGELOG.rst
examples/specdec_bench/specdec_bench/datasets/speed.py
modelopt/torch/export/plugins/hf_checkpoint_utils.py
modelopt/torch/export/plugins/mcore_nemotron.py
modelopt/torch/export/quant_utils.py
modelopt/torch/export/unified_export_megatron.py
modelopt/torch/quantization/algorithms.py
modelopt/torch/quantization/backends/utils.py
modelopt/torch/quantization/config.py
modelopt/torch/quantization/conversion.py
modelopt/torch/quantization/model_calib.py
modelopt/torch/quantization/nn/modules/tensor_quantizer.py
modelopt/torch/quantization/plugins/custom.py
modelopt/torch/quantization/plugins/megatron.py
modelopt/torch/quantization/qtensor/nvfp4_tensor.py
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib.yaml
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml
tests/_test_utils/torch/quantization/quantize_common.py
tests/gpu/torch/quantization/test_nvfp4_static_quantizer_cuda.py
tests/gpu_megatron/torch/export/test_unified_export_megatron.py
tests/gpu_megatron/torch/quantization/plugins/test_megatron.py
tests/unit/torch/export/test_hf_checkpoint_utils.py
tests/unit/torch/quantization/plugins/test_fused_experts.py
tests/unit/torch/quantization/test_autoquant.py
tests/unit/torch/quantization/test_mse_calibrator.py
tests/unit/torch/quantization/test_nvfp4_static_export_cpu.py

coderabbitai · 2026-05-19T18:35:07Z

+# Calibration: weight MSE with FP8-scale sweep over the 128 e4m3 scale values
+# (NVFP4 weights use static block scales selected by MSE; FP8 per-tensor scales
+# are also chosen via MSE search instead of plain amax).


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the calibration comment for FP8 layers.

The comment says FP8 per-tensor scales are selected via MSE search, but this stack skips MSE for non-NVFP4 formats. This is misleading for recipe users.

Proposed fix

-# (NVFP4 weights use static block scales selected by MSE; FP8 per-tensor scales
-# are also chosen via MSE search instead of plain amax).
+# (NVFP4 routed-expert weights use static block scales selected by MSE;
+# non-NVFP4 layers, such as FP8 per-tensor, follow the non-MSE path.)


coderabbitai · 2026-05-19T18:35:08Z

+            block_sizes = getattr(quantizer, "block_sizes", None)
+            block_size = block_sizes.get(-1) if isinstance(block_sizes, dict) else None
+            if block_size is None or weight.shape[-1] % block_size != 0:
+                return False
+            expected_blocks = weight.numel() // block_size
+            return amax.numel() == expected_blocks and global_amax.numel() == 1


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle padded trailing blocks when validating restored NVFP4 state.

Static block quantization already pads the tail block during setup, so a restored _amax can be complete even when weight.shape[-1] % block_size != 0. Returning False here forces max_calibrate() and overwrites the saved MSE-derived scales for those layers.

Suggested fix

             block_sizes = getattr(quantizer, "block_sizes", None)
             block_size = block_sizes.get(-1) if isinstance(block_sizes, dict) else None
-            if block_size is None or weight.shape[-1] % block_size != 0:
+            if block_size is None:
                 return False
-            expected_blocks = weight.numel() // block_size
+            rows = weight.numel() // weight.shape[-1]
+            expected_blocks = rows * ((weight.shape[-1] + block_size - 1) // block_size)
             return amax.numel() == expected_blocks and global_amax.numel() == 1
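
The block-count arithmetic in the suggested fix can be checked in isolation. This is a small sketch over plain shape tuples; `expected_block_count` is a hypothetical helper name, not a function in the codebase.

```python
def expected_block_count(shape, block_size):
    """Count per-block amax entries when the last dim is padded up to a full block."""
    rows = 1
    for dim in shape[:-1]:
        rows *= dim
    # Integer ceiling division: a partial tail block still needs one amax entry.
    blocks_per_row = (shape[-1] + block_size - 1) // block_size
    return rows * blocks_per_row
```

For a weight of shape `(4, 40)` with `block_size=16`, each row needs 3 blocks, so 12 entries are expected, whereas `numel() // block_size` would give 10 and reject a complete restored state.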


github-actions · 2026-05-21T00:50:30Z

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-05-29 16:23 UTC

Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>

jenchen13 · 2026-05-27T19:47:26Z

                schema = table.schema
                if schema.metadata and b"huggingface" in schema.metadata:
-                    new_meta = {
-                        k: v


just a linter change

h-guo18

specdec changes LGTM

Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>

ChenhanYu · 2026-05-27T22:28:44Z

    return multimodal_state_dict
+
+
+def copy_non_safetensor_files_from_ckpt(src: str | os.PathLike, dst: str | os.PathLike):


[SUGGESTION] The current implementation copies everything non-safetensors from the source — including config.json, hf_quant_config.json, generation_config.json, preprocessor_config.json. The docstring acknowledges this and says "The caller is expected to overwrite the files modelopt owns" — and today save_pretrained does, immediately after.

The risk is the load-bearing convention. If a future refactor adds a guarded path in save_pretrained that skips writing one of those files under some condition (e.g., a new is_last_stage_main_rank sub-branch, or a try/except around _hf_config.save_pretrained), the stale source file silently survives — no test failure, no warning, just a quietly-wrong exported checkpoint.

Two safer alternatives:

Filter modelopt-owned files in the helper itself with an explicit skip-list (preferred, no caller-side discipline required):

_MODELOPT_OWNED_FILES = frozenset({
    "config.json",
    "generation_config.json",
    "hf_quant_config.json",
    "preprocessor_config.json",
})

def copy_non_safetensor_files_from_ckpt(src, dst):
    ...
    for entry in os.listdir(src):
        if entry in _MODELOPT_OWNED_FILES:
            continue
        if entry.endswith(".safetensors") or entry == "model.safetensors.index.json":
            continue
        ...

Or add a post-condition assert at the end of save_pretrained that the modelopt-owned files were rewritten (timestamp / contents check).

Option 1 removes the silent-failure mode entirely without changing today's behavior.

ChenhanYu · 2026-05-27T22:28:45Z

+                combined_layer_config_dict.update(layer_config_dict)
+        return dict(sorted(combined_layer_config_dict.items()))
+
+    def _gather_kv_cache_dtype(self):


[SUGGESTION] Returning the first non-None kv_cache_dtype silently picks one if ranks disagree (programmer error). For a setup bug where one attention block was configured with fp8 and another with nvfp4, the writer rank would emit whichever rank 0 happened to see first into hf_quant_config.json with no warning.

Cheap defense:

def _gather_kv_cache_dtype(self):
    local = getattr(self, "kv_cache_dtype", None)
    if not torch.distributed.is_initialized():
        return local
    all_dtypes = [None] * torch.distributed.get_world_size()
    torch.distributed.all_gather_object(all_dtypes, local)
    seen = {dt for dt in all_dtypes if dt is not None}
    if len(seen) > 1:
        raise RuntimeError(f"Inconsistent kv_cache_dtype across ranks: {seen}")
    return seen.pop() if seen else None

Same applies to _gather_layer_config_dict if a key appears with conflicting values across ranks — current .update() silently picks the last one.
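The same defense for the layer config merge might look like the sketch below. It assumes the per-rank dictionaries have already been gathered into a list (for example with `all_gather_object`, as in the snippet above); `merge_layer_configs` is a hypothetical helper name and the dictionary keys used with it are illustrative, not the exporter's real key format.

```python
def merge_layer_configs(per_rank_dicts):
    """Merge per-rank layer configs, raising when two ranks disagree on a key."""
    merged = {}
    for rank, rank_dict in enumerate(per_rank_dicts):
        # A rank that owns no quantized layers may contribute None or an empty dict.
        for key, value in (rank_dict or {}).items():
            if key in merged and merged[key] != value:
                raise RuntimeError(
                    f"Conflicting layer config for {key!r}: "
                    f"{merged[key]!r} vs {value!r} (seen on rank {rank})"
                )
            merged[key] = value
    # Keep the sorted output that the diff above already returns.
    return dict(sorted(merged.items()))
```

Identical values from different ranks still merge cleanly, so this should only change behavior in the misconfigured case.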

ChenhanYu · 2026-05-27T22:28:46Z

+                self._hf_pretrained_model_name
+            ):
+                try:
+                    tokenizer = transformers.AutoTokenizer.from_pretrained(


[SUGGESTION] Behavior change worth calling out in the PR body: the previous code unconditionally tried AutoTokenizer.from_pretrained(...).save_pretrained(save_directory) and silently swallowed errors. That had a useful side effect — if the source dir had a partial tokenizer (e.g. tokenizer.json present but tokenizer_config.json missing or stale), AutoTokenizer would often synthesize a valid tokenizer_config.json on the save side.

The new code skips this entirely for local-dir sources, trusting whatever the bulk copy produced. Mostly fine for clean source dirs, but it removes the safety net for partial/stale tokenizer files. Not a blocker — just consider keeping the AutoTokenizer call as a "second pass" overwrite even in the local-dir case, since it's idempotent on a clean source and corrective on a stale one.

ChenhanYu · 2026-05-27T22:28:47Z

Review summary — two asks

1. Split the AutoQuant changes into a separate PR

The PR title and body advertise four things: MCore mixed-precision export, static MSE NVFP4 fixes, the Nemotron-3 Super NVFP4 YAML recipe, and *.py sidecar copy. But the diff also bundles a logically independent AutoQuant: EP support + Megatron registration hooks + budget calc fix workstream:

| File | What it changes |
| --- | --- |
| `modelopt/torch/quantization/algorithms.py` (+31/-14) | Drops "AutoQuantize does not support EP yet" warnings; adds `expert_model_parallel_group` to score / cost / best-format sync axes. Adds NemotronH MoE-in-MCore-naming regex (`linear_fc1`/`linear_fc2`). Fixes `total_weight_size` budget to come from candidate-stats (avoids a ParallelLinear weight-counting bug). |
| `modelopt/torch/quantization/model_quant.py` (+8) | Lazy-imports `register_megatron_autoquant_support` inside the `auto_quantize` entry. |
| `modelopt/torch/quantization/plugins/megatron.py` (subset of +113) | Adds `register_megatron_autoquant_support()`, `_is_supported_megatron_model`, `_megatron_grad_ckpt_context`, `_is_param_grad_enabled_for_megatron`, `get_mcore_decoder_layers`, and a `LayerActivationCollector.register_decoder_layer_support` registration for GPTQ layerwise calib. |
| `tests/unit/torch/quantization/test_autoquant.py` (+35) | `test_auto_quantize_budget_uses_no_quant_candidate_cost`. |

These are not prerequisites for the Nemotron Super recipe — the recipe is a static hand-authored YAML, not an auto_quantize search. And the AutoQuant EP-sync change is non-trivial enough on its own that it deserves a focused review. Splitting also lets the GPTQ get_mcore_decoder_layers work (which Claude's review correctly flagged as [CRITICAL] — NoneType crash + in-place ModuleList mutation) move at its own pace without holding back the recipe + export-fix work that the PR title is actually about.

2. Export-path concerns — four inline comments below

Claude's review already covered:

[CRITICAL] get_mcore_decoder_layers correctness (agree, blocker)
_amax_is_invalid double .item() sync (agree)
untie_embeddings_and_output_weights bare except Exception (agree — narrow to ImportError)

Not in Claude's review, posting inline:

[IMPORTANT Regression] unified_export_megatron.py:309 — the new copy_non_safetensor_files_from_ckpt is gated on os.path.isdir(...), but the previous unconditional copy_hf_ckpt_remote_code(...) tail call is deleted. Hub-ID source exports lose *.py sidecar copy — directly contradicts the PR's "copy .py files properly" bullet.
[SUGGESTION] hf_checkpoint_utils.py:157 — copy_non_safetensor_files_from_ckpt doesn't filter modelopt-owned *.json files; relies on the load-bearing convention that save_pretrained overwrites them after. A future refactor that guards any of those writes silently leaks a stale source config.json. Cheap fix: skip-list in the helper itself.
[SUGGESTION] unified_export_megatron.py:1440 — _gather_kv_cache_dtype returns first non-None across ranks; if ranks disagree (programmer error), the wrong one wins silently. Assert single non-None value.
[SUGGESTION] unified_export_megatron.py:325 — tokenizer auto-repair safety net is removed for local-dir sources. Behavior change worth flagging in the PR body; not a blocker.

Algorithmic correctness of the per-layer config aggregation + _global_amax shard handling looks good (matches Claude's read). The Nemotron Super recipe + the static MSE fixes are well-targeted.

Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>

jenchen13 · 2026-05-27T22:49:44Z

@ChenhanYu I removed Autoquant & GPTQ changes

cjluo-nv

Bot review — DM the bot to share feedback.

Design review: this PR extends existing infrastructure rather than introducing new abstractions — the MCore exporter now feeds its per-layer quant metadata through the existing process_layer_quant_config / layer_config_dict pipeline already used by model_config_export.py, and the recipes ride the YAML composition system from #1253. So the architectural choice is fine. However, several substantive issues remain:

Untested on the target model. PR body says "TODO test in HF and MCore PTQ on Nemotron model" — for a recipe explicitly mirroring a published Nemotron-3-Super NVFP4 config, the equivalence/round-trip check on real hardware is the headline test. Please run it before merging.
Undocumented public-API behavior change in mse_calibrate(..., fp8_scale_sweep=True). Previously, an FP8 (or any non-NVFP4) quantizer with no registered FP8-sweep backend factory fell through to a default MseCalibrator. The PR drops test_unregistered_backend_uses_default_mse_calibrator and the new control flow leaves such quantizers untouched (max-calibrated amax preserved). The new mixed-precision tests rely on this, but neither the docstring of mse_calibrate / fp8_scale_sweep, the MseCalibConfig description, nor the PR body / changelog mention that calling fp8_scale_sweep=True on a non-mixed FP8 model now silently keeps max amax instead of MSE-searching it. Either restore the fallback or document the change explicitly (changelog + docstring).
PR size. 25 files / 1468 LOC across MCore export, calibration, restore, recipes, and hf_checkpoint_utils. The four pieces are largely independent and could be split — please consider it for the next round.
Minor: redundant _record_layer_quant_config(prefix, qformat, block_size) calls inside the per-expert loops in _pack_name_remapping and _pack_name_remapping_gpt_oss (same prefix overwritten N times); super-nvfp4-max-calib.yaml header is Copyright (c) 2024 while the rest of the repo and the sibling super-nvfp4.yaml use 2026 (LICENSE_HEADER canonical is 2026); empty Usage code-fence in PR body.

cjluo-nv · 2026-05-27T23:13:04Z

+                        start_multiplier=start_multiplier,
+                        stop_multiplier=stop_multiplier,
+                        quant_func=partial(_mse_quant_func, quantizer=module),
+                    )


Bot comment.

This is a public-API behavior change that the PR doesn't document. Before this PR, calling mse_calibrate(..., fp8_scale_sweep=True) on an FP8 (or any non-NVFP4) quantizer would fall through to a default MseCalibrator (covered by the now-deleted test_unregistered_backend_uses_default_mse_calibrator). After this PR, with fp8_scale_sweep=True and no registered backend factory, a non-is_nvfp4_static quantizer takes neither branch, so _calibrator is left as the max-calibrated state and no MSE search runs.

This is what you want for the new mixed NVFP4+FP8 recipes, but it silently changes behavior for any existing caller passing fp8_scale_sweep=True on a homogeneous FP8 model. Please either:

restore the default MSE fallback for non-NVFP4 quantizers under fp8_scale_sweep=True, or

document the new semantics in the mse_calibrate docstring, the fp8_scale_sweep docstring on MseCalibConfig, and the changelog, so users aren't confused why their FP8 amax stopped moving.
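To make the difference concrete, the two behaviors can be modelled with a toy decision function. This is a sketch of the control flow as described in this comment, not the library's code: `select_calibration`, its string labels, the `default_fallback` switch, the branch ordering, and the behavior when `fp8_scale_sweep` is off are all assumptions made for illustration.

```python
def select_calibration(is_nvfp4_static, fp8_scale_sweep, has_backend_factory, default_fallback):
    """Return a label for what happens to one quantizer during MSE calibration."""
    if not fp8_scale_sweep:
        # Assumption: with the sweep disabled, every quantizer gets a plain MSE search.
        return "default_mse"
    if has_backend_factory:
        # A registered backend factory claims the quantizer.
        return "backend_sweep"
    if is_nvfp4_static:
        # Static NVFP4 quantizers get the FP8 scale sweep.
        return "nvfp4_scale_sweep"
    # Neither branch applies; the old and new behavior differ only here.
    return "default_mse" if default_fallback else "keep_max_amax"
```

With `default_fallback=True` an FP8 quantizer under `fp8_scale_sweep=True` still gets an MSE search (the pre-PR behavior); with `default_fallback=False` its max-calibrated amax is kept (the behavior this comment asks to document).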

cjluo-nv · 2026-05-27T23:13:04Z

            weight_scale_list.append(weight_scale)
            weight_scale_2_list.append(weight_scale_2)
            input_scale_list.append(input_scale)
+            self._record_layer_quant_config(prefix, qformat, block_size)


Bot comment.

self._record_layer_quant_config(prefix, qformat, block_size) is called inside the for expert in module: loop with the same prefix every iteration, so it just overwrites the same two layer_config_dict keys N times. Move this call out of the loop (same applies to the corresponding line in _pack_name_remapping_gpt_oss).
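A toy illustration of why hoisting the call is safe is shown below. `ToyExporter` is a hypothetical stand-in that models only the bookkeeping; the two key suffixes are invented for the example, since the comment only says that two keys are written per prefix.

```python
class ToyExporter:
    """Toy stand-in for the exporter; only the layer-config bookkeeping is modelled."""

    def __init__(self):
        self.layer_config_dict = {}
        self.record_calls = 0

    def _record_layer_quant_config(self, prefix, qformat, block_size):
        # Hypothetical key names; each call rewrites the same two keys for a prefix.
        self.record_calls += 1
        self.layer_config_dict[prefix + ".quantization"] = qformat
        self.layer_config_dict[prefix + ".block_size"] = block_size

    def pack_in_loop(self, experts, prefix, qformat, block_size):
        # Current shape: one redundant record per expert.
        for _expert in experts:
            self._record_layer_quant_config(prefix, qformat, block_size)

    def pack_hoisted(self, experts, prefix, qformat, block_size):
        for _expert in experts:
            pass  # per-expert scale collection would stay inside the loop
        # Suggested shape: record once after the loop.
        self._record_layer_quant_config(prefix, qformat, block_size)
```

Both variants end with the same dictionary contents for a non-empty expert list; the hoisted one does it with a single call. For an empty list the hoisted variant would still record an entry, so a real change might want to guard on that.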

shengliangxu · 2026-05-27T23:59:27Z

@@ -0,0 +1,134 @@
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.


please restructure the recipe files:

modelopt_recipes/huggingface/nemotron_h/Nemotron-3-Super-120B-A12B/ptq/super-nvfp4.yaml modelopt_recipes/huggingface/nemotron_h/Nemotron-3-Super-120B-A12B/ptq/super-nvfp4-max-calib.yaml

I can't move it to a huggingface folder, these are for both HF and MCore. but I can add an NVIDIA/ folder to mimic the HuggingFace model name

it's still a huggingface model, we shall still put them there.

just that in the yaml file, for an MCore config, add an mcore tag to the name to make it explicit.

hmm these recipes work for both HF and MCore since they have patterns for both ...

Got it, it's your call to name them but putting them into huggingface is still correct, MCore is the tool we use for quantization, but the model itself is HF. We identify the recipes by the models not by the tools we use for quantization.

ok I can move to huggingface folder .. just curious do we expect to support models outside of HF in the future?

I do want to. But the world seems to be converging on HF, so I am not sure we will have the chance or need to do that lol.

shengliangxu · 2026-05-28T00:01:57Z

+          scale_bits: e4m3
+        num_bits: e2m1
+    # Megatron-Core/PTQ names: decoder.layers.*.mlp.experts.local_experts.*.linear_fc{1,2}.
+    - quantizer_name: '*mlp.experts*weight_quantizer'


So these configs use megatron path names right?

yes we need both HF and MCore names for quantization

If I understand correctly how things work with MCore-based quantization, then this does not seem to be the correct way in my opinion. But I am not yet 100% sure.

In my opinion, the quantization config should just configure the HF format, and our library should internally convert the HF-format config to an MCore config.

But let me read through this part of logic and will catch you for a discussion.

Non-blocking for this PR.

no that's not true, it does pattern matching based on the model structure and in MCore the model has different names than the HF model

We should come up with a way to unify the HF and MCore PTQ API though, I agree.

I understand the MCore models have different module paths, but if I understand it correctly, the MCore models are converted as intermediate models from HF models, which is the original input to the quant pipeline.

So for a config for the quant pipeline, or any other pipeline, it should target the original HF input model rather than the intermediate model format. Internally, we should then convert the original config to a config that makes intermediate modeling work.

Maybe my understanding is not correct. That's why I need to read through this part of logic. Will catch you up once I have a full understanding.

There are two options for MCore PTQ model loading: directly from HF model or directly from MCore model.
Oftentimes we choose the 2nd path to avoid the time it takes to convert a HF model to MCore model. For the 2nd path that's why we need names in both HF and MCore convention.

I understand what you're saying though, it's not immediately obvious to users that they have two ways to pass in model for PTQ.

Ideally we should just have recipes in HF convention and both MCore PTQ paths should work by ModelOpt storing some HF to MCore model name mapping .. that can be improved in the future when we unify HF and MCore PTQ APIs.

Right, that's exactly what I mean. The mapping should be done by modelopt internally, even if we load from a MCore checkpoint, we should still provide the original model's information to create the mapping.
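The two approaches discussed in this thread can be sketched side by side. The example assumes glob-style matching of quantizer names (shown here with `fnmatchcase`); the HF-style pattern spelling, the example module paths, and the `HF_TO_MCORE_FRAGMENTS` mapping are illustrative assumptions, and `hf_pattern_to_mcore` is a hypothetical helper, not an existing ModelOpt API.

```python
from fnmatch import fnmatchcase

# Today's approach: the recipe lists a pattern for each naming convention.
RECIPE_PATTERNS = [
    "*mixer.experts*weight_quantizer",  # HF-style spelling (assumed for illustration)
    "*mlp.experts*weight_quantizer",    # MCore-style, as quoted in the recipe diff
]

# Proposed direction: keep only HF patterns and translate them internally.
# This fragment mapping is hypothetical and deliberately incomplete.
HF_TO_MCORE_FRAGMENTS = {"mixer.experts": "mlp.experts"}


def matches_recipe(quantizer_name, patterns=RECIPE_PATTERNS):
    """True if any recipe pattern matches the fully qualified quantizer name."""
    return any(fnmatchcase(quantizer_name, pattern) for pattern in patterns)


def hf_pattern_to_mcore(pattern, mapping=HF_TO_MCORE_FRAGMENTS):
    """Rewrite an HF-convention pattern into the MCore convention."""
    for hf_fragment, mcore_fragment in mapping.items():
        pattern = pattern.replace(hf_fragment, mcore_fragment)
    return pattern
```

A plain string replacement like this would not cover structural differences between the two conventions (for example the extra `local_experts` level in MCore paths), which is part of why a real mapping would need information about the original model.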

ChenhanYu

Went through the export part. Great that the Autoquant related changes have been separated out.

meenchen · 2026-05-28T20:38:06Z

+        sync_expert_weight_amax: SequentialMLP only — share one weight amax across all experts
+            in a MoE layer (within-rank sync + EP all-reduce when EP>1).


Would this impact accuracy? I think for HF PTQ, experts have separate amax values

HF PTQ doesn't use this EP sync

That's true, but I am more curious about how this will impact the accuracy. Have we run anything to measure the impact on the accuracy?

We should deprecate this argument once TE MoE supports separate quantizers per expert.

sync_expert_weight_amax is by default False in max_calibrate. It is added for testing purposes only (e.g. to compare against TEGroupedMLP, which still shares amax for all experts).

The old behavior of MCore PTQ was to always sync EP experts, which during Nano/Super PTQ experiments we realized could lead to accuracy degradation. We removed the rank local expert sync in the layer_sync_moe_local_experts_amax function, but forgot to remove the cross-EP-rank sync. This PR removes the EP sync so that all experts in an MoE have different amaxes for full correctness.

if you see _should_sync_amax_across_ep in line 256 it skips EP sync for routed experts unless you turn on sync_expert_weight_amax

@realAsma agreed, I have a PR to add separate expert quantizers in TEGroupedMLP but that needs more testing ..
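The accuracy concern can be illustrated with a toy experiment. The sketch below uses a simple symmetric uniform quantizer as a stand-in (it is not NVFP4 and not the library's quantizer); `max_quant_error`, `compare_amax_strategies`, and the `levels` parameter are hypothetical names introduced for the example, and all weights are assumed to have a nonzero amax.

```python
def amax(values):
    """Largest absolute value in a list of weights."""
    return max(abs(v) for v in values)


def max_quant_error(values, amax_value, levels=7):
    """Worst-case error of a toy symmetric quantizer with `levels` positive steps."""
    scale = amax_value / levels
    worst = 0.0
    for v in values:
        # Round to the nearest step, then clamp to the representable range.
        q = max(-levels, min(levels, round(v / scale)))
        worst = max(worst, abs(v - q * scale))
    return worst


def compare_amax_strategies(experts):
    """Return (per-expert-amax errors, shared-amax errors) for a list of experts."""
    shared = max(amax(weights) for weights in experts)
    separate = [max_quant_error(weights, amax(weights)) for weights in experts]
    synced = [max_quant_error(weights, shared) for weights in experts]
    return separate, synced
```

When one expert has much smaller weights than another, the shared amax is dominated by the large expert, and the small expert's weights can collapse toward zero. That is the kind of degradation the thread attributes to syncing amax across experts, although the real impact would need to be measured on the actual model.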

Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>

jenchen13 requested review from a team as code owners May 19, 2026 18:24

jenchen13 requested review from h-guo18, jingyu-ml and kaix-nv May 19, 2026 18:24

jenchen13 mentioned this pull request May 19, 2026

Support Mixed precision & Static MSE in MCore; Nemotron Super v3 NVFP4 recipe #1363

Closed

jenchen13 requested review from meenchen and realAsma May 19, 2026 18:27

coderabbitai Bot reviewed May 19, 2026

jenchen13 requested review from a team as code owners May 21, 2026 00:33

jenchen13 requested review from ajrasane and kevalmorabia97 May 21, 2026 00:33

kevalmorabia97 removed request for a team and kaix-nv May 21, 2026 11:58

lazy init autoquant register

985da85

Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>

jenchen13 commented May 27, 2026

jenchen13 mentioned this pull request May 27, 2026

Support AutoQuant in Megatron-Core #1512

Closed

h-guo18 approved these changes May 27, 2026

revert breaking change on MSECalibrator

d88e54a

Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>

ChenhanYu reviewed May 27, 2026

Comment thread modelopt/torch/export/unified_export_megatron.py Outdated

ChenhanYu reviewed May 27, 2026

jenchen13 added 2 commits May 27, 2026 15:38

revert autoquant and gptq changes

5f291a6

Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>

fallback to copy HF remote code if no dir

5c9cd43

Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>

cjluo-nv reviewed May 27, 2026

shengliangxu reviewed May 27, 2026

shengliangxu reviewed May 28, 2026

ChenhanYu self-requested a review May 28, 2026 17:07

ChenhanYu approved these changes May 28, 2026

Merge branch 'main' into feature/mcore_mse_mixed_precision

c5c7a2e

jenchen13 requested review from realAsma and shengliangxu May 28, 2026 20:24

meenchen reviewed May 28, 2026

meenchen approved these changes May 28, 2026

realAsma approved these changes May 28, 2026

fix if else

d63bf70

Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>

jenchen13 force-pushed the feature/mcore_mse_mixed_precision branch from 8f1d879 to d63bf70 Compare May 28, 2026 21:18

jenchen13 enabled auto-merge (squash) May 28, 2026 21:22

jenchen13 added 2 commits May 29, 2026 06:06

fix logic again

e14fa62

Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>

Merge branch 'main' into feature/mcore_mse_mixed_precision

e985e93

jenchen13 merged commit 4b270f0 into main May 29, 2026
57 checks passed
