NVIDIA · jenchen13 · May 21, 2026 · May 20, 2026
diff --git a/.agents/developer-guidelines.md b/.agents/developer-guidelines.md
diff --git a/.coderabbit.yaml b/.coderabbit.yaml
@@ -4,6 +4,8 @@ reviews:
   profile: chill
   collapse_walkthrough: true
   poem: false
+  # Allow CodeRabbit to formally approve once its comments are resolved and pre-merge checks pass
+  request_changes_workflow: true
   path_instructions:
     - path: "modelopt/**/*.py"
       instructions: &security_instructions |
@@ -25,6 +27,21 @@ reviews:
            @NVIDIA/modelopt-setup-codeowners with an explicit justification in the PR description.
     - path: "examples/**/*.py"
       instructions: *security_instructions
+    - path: "tests/**/*.py"
+      instructions: |
+        Verify tests follow the conventions in CONTRIBUTING.md. Flag the following as
+        IMPORTANT issues:
+        1. Imports inside functions or test methods without explicit justification.
+           Imports belong at the top of the file so import errors surface at collection
+           time, not mid-test. The only acceptable in-function imports are for circular
+           imports or optional dependencies (e.g., TensorRT-LLM, Megatron-Core), and
+           those should carry a brief comment naming the reason.
+        2. Redundant lower-level tests that duplicate behavior already covered by a
+           higher-level test — checked-in tests should be lean and document expected
+           behavior, protect against regressions, or flag backward-incompatible changes.
+        3. Tests placed in the wrong directory for their cost profile (e.g., multi-minute
+           tests under tests/unit, which targets a few-seconds budget; GPU-requiring
+           tests under tests/unit instead of tests/gpu*).
   auto_review:
     auto_incremental_review: true
     drafts: false

@@ -23,7 +23,7 @@ jobs:
       contains(github.event.comment.body, '/claude review') &&
       contains(fromJson('["OWNER", "MEMBER", "COLLABORATOR"]'), github.event.comment.author_association)
     runs-on: ubuntu-latest
-    timeout-minutes: 10
+    timeout-minutes: 15
     permissions:
       contents: read
       pull-requests: write
@@ -80,13 +80,18 @@ jobs:
             BASE REF: origin/${{ steps.pr-info.outputs.base_ref }}
 
             Mandatory workflow — never skip or reorder:
-            1. Read the PR diff first (gh pr diff).
-            2. Read AGENTS.md, .agents/developer-guidelines.md,
-               and CONTRIBUTING.md for project conventions, coding principles, and architecture.
-            3. For changed files under `modelopt/torch/<sub-package>/`, read the sub-package's
+            1. Read prior Claude activity on the PR so you don't duplicate already-raised
+               comments and can track which prior issues are now resolved:
+               `gh pr view $PR_NUMBER --repo $REPO --json comments,reviews`
+               Treat prior findings as context, not a ceiling — if you spot a genuinely new
+               issue this round, flag it.
+            2. Read the PR diff (gh pr diff).
+            3. Read AGENTS.md and CONTRIBUTING.md (including the Coding standards section)
+               for project conventions, coding principles, and architecture.
+            4. For changed files under `modelopt/torch/<sub-package>/`, read the sub-package's
                `__init__.py` plus any `mode.py` / `config.py` to understand mode registration
                and config schema.
-            4. Only then perform the review using that context.
+            5. Only then perform the review using that context.
 
             You are performing a deep code review on a **NVIDIA Model Optimizer (ModelOpt)** PR.
             ModelOpt is NVIDIA's open-source library for model optimization (quantization, pruning,
@@ -118,6 +123,19 @@ jobs:
 
             ## Review Procedure
 
+            **Aim for one pass.** Surface meaningful issues in this review so the author gets
+            one consolidated set of fixes.
+
+            **Cover each changed file across categories.** For each non-trivial changed file,
+            consider the categories below (Algorithm Correctness, Mode/State, Export, Backward
+            Compatibility, Performance) before moving on.
+
+            **Trace public symbols across files.** For new or modified public symbols
+            (functions, arguments, config fields, exported names), grep call sites in
+            `modelopt/`, `tests/`, and `examples/` before commenting. Many bugs here only
+            surface where the symbol meets its caller — mode registration, export paths,
+            restore logic.
+
             1. Get PR metadata: `gh pr view $PR_NUMBER --repo $REPO --json title,body,baseRefName,headRefName,files,additions,deletions,changedFiles,author`
             2. Get the full diff: `gh pr diff $PR_NUMBER --repo $REPO`
                - For large PRs (>50 files), prioritize source code over config/lock/auto-generated files.
@@ -182,6 +200,10 @@ jobs:
             - Memory regressions: double-allocating weights, holding tensors past their lifetime.
 
             ## Suggestions (Nice to Have)
+
+            SUGGESTIONs document non-blocking improvements and never block approval (see
+            Completion below). Raise them when genuinely useful; skip nits that aren't.
+
             - Stale, imprecise, or misleading comments/docstrings — a wrong docstring is worse
               than none.
             - Missing shape/dtype assertions at module/parallelism boundaries where they would
@@ -208,5 +230,11 @@ jobs:
             - Highlight the most impactful findings
             - Overall assessment of the PR's risk level
 
-            If no significant issues are found, approve the PR:
-            gh pr review $PR_NUMBER --repo $REPO --approve --body "Claude review passed — no significant issues found. LGTM"
+            **Approval decision — use exact counts from your findings (no other thresholds):**
+
+            - `0 CRITICAL AND 0 IMPORTANT` → **approve**, regardless of SUGGESTION count.
+              SUGGESTIONs never block approval.
+              `gh pr review $PR_NUMBER --repo $REPO --approve --body "Claude review passed — no blocking issues found. LGTM"`
+            - `≥1 CRITICAL OR ≥1 IMPORTANT` → **post a comment review** summarizing the
+              issues found.
+              `gh pr review $PR_NUMBER --repo $REPO --comment --body "<summary of issues found>"`
diff --git a/AGENTS.md b/AGENTS.md
@@ -11,8 +11,9 @@ These instructions apply to AI-assisted work in this repository.
 ## Coding guidelines
 
 - **Coding guide:** Code development and review require reading and following
-  [.agents/developer-guidelines.md](.agents/developer-guidelines.md);
+  the [coding standards in CONTRIBUTING.md](CONTRIBUTING.md#-coding-standards);
   do not skip this step.
+- **Use relative paths** from the repo root in commands and file references.
 
 ## Iterative development
 

diff --git a/CHANGELOG.rst b/CHANGELOG.rst
@@ -27,11 +27,16 @@ Changelog
 - Support Megatron-Core checkpoint restore and export for MSE ``NVFP4StaticQuantizer``.
 - Add mixed-precision FP8 + NVFP4 export for Megatron-Core: per-layer ``quant_algo`` recorded under ``quantized_layers`` in ``hf_quant_config.json``, PP-aware ``kv_cache_dtype`` gather, fused-QKV exclude split into per-HF-name ``q/k/v_proj`` entries.
 - Add Nemotron-3-Super-120B-A12B PTQ recipes ``modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml`` (MSE-mixed) and ``super-nvfp4-max-calib.yaml`` (max-calib mixed): NVFP4 W4A4 routed experts + FP8 per-tensor shared experts / Mamba in/out_proj + FP8 KV cache.
+- Add NVFP4 W4A16 weight-only quantization (``w4a16_nvfp4``): FP4 weights with group_size=16, BF16 activations, no calibration forward pass required. Use ``mtq.W4A16_NVFP4_CFG`` or ``--qformat w4a16_nvfp4`` in ``hf_ptq.py``. vLLM deployment support is in progress.
+- Add ``DATASET_COMBOS`` to ``modelopt.torch.utils.dataset_utils`` — single ``--dataset`` tokens that fan out to multiple registered datasets; per-entry ``num_samples`` is split evenly across the members. Initial combos: ``cnn_nemotron_v2_mix`` (``cnn_dailymail`` + ``nemotron-post-training-dataset-v2``, used by ``hf_ptq.py`` when no ``--dataset`` is provided) and ``nemotron-post-training-v3`` (the seven ``nvidia/Nemotron-*`` SFT datasets added in #1498, mirroring the `nemotron-post-training-v3 collection <https://huggingface.co/collections/nvidia/nemotron-post-training-v3>`_). Combo names are listed by ``get_supported_datasets()`` and surfaced in ``--dataset`` help. ``get_dataset_dataloader`` rejects inputs that mix a combo with one of its member datasets (e.g. ``cnn_dailymail,cnn_nemotron_v2_mix``) to avoid double-sampling, and ``get_dataset_samples`` rejects combo names so callers route through the dataloader. ``hf_ptq.py`` default ``--calib_size`` is bumped from ``512`` to ``1024`` so the total calibration sample count under the new default combo matches the previous two-dataset fallback.
+- The ``nemotron-sft-agentic-v2`` registered dataset (added in #1498) now uses only the ``search`` split. The previously configured ``interactive_agent`` and ``tool_calling`` splits contain content-level defects (heterogeneous schema and a malformed JSON row, respectively) that cause pyarrow's streaming JSON reader to fail deterministically.
 
 **Bug Fixes**
+
 - In Megatron-Core only do EP amax sync for routed expert weights if ``sync_expert_weight_amax=True``. Previously EP amax sync would sync routed expert weights across EP ranks even when ``sync_expert_weight_amax`` was False
+- Fix Megatron-Core HF importer to load fused ``TELayerNormColumnParallelLinear.layer_norm_weight`` from HF for GPT-family models (Qwen3 etc.) under ``--export-default-te-spec``. Importer now prefers per-context keys ``fused_input_layernorm`` / ``fused_pre_mlp_layernorm`` (fallback ``fused_norm`` for Nemotron-H backward compatibility); ``mcore_qwen.py`` provides the new rules. Without this fix, post-prune MMLU sat at chance.
 
-0.44 (2026-05-18)
+0.44 (2026-05-14)
 ^^^^^^^^^^^^^^^^^
 
 **New Features**

@@ -42,6 +42,65 @@ To run the pre-commit hooks without committing, use:
 pre-commit run --all-files
 ```
 
+## 📐 Coding standards
+
+Guidelines for production code in ModelOpt. Key values: simplicity, modularity,
+and conciseness.
+
+### Principles
+
+- **Prefer simple, surgical changes.** Touch only what the task requires. Avoid speculative
+  refactors, broad rewrites, and "while we're here" cleanups.
+- **Design for simplicity and readability.** Choose the design that is easiest to understand and maintain.
+  Code is read top to bottom: put high-level behavior first, hide lower-level details behind well-named helpers,
+  and treat heavy branching as a signal to reconsider the design.
+- **Prefer modular, composable solutions.** Avoid input-specific or case-specific hard-coding.
+  Use existing extension points when they fit. If none fit, add a simple, focused helper,
+  class, or plugin that cleanly captures the new behavior. Keep scope limited to known cases.
+- **Respect inheritance boundaries.** Parent abstractions should define shared contracts and
+  shared behavior, not child-specific special cases.
+- **Don't repeat yourself; keep a single source of truth.** Consolidate repeated logic or intent with a shared helper, API,
+  or abstraction when doing so keeps the design simpler. Avoid duplication that can drift out of sync.
+- **Comment cautiously.** Comments should add context, not translate code into English.
+  Prefer making the code self-explanatory first. Use comments only for non-obvious
+  intent or constraints that remain unclear from the code. Apply this guidance to new
+  comments only; do not rewrite or delete existing comments just for style.
+- **Document public APIs.** Public and higher-level APIs should have docstrings, including examples when useful.
+  Internal helpers should usually be self-documenting through clear names and structure.
+- **Fix the bug cause, not the side effect.** For bug fixes, find the root cause instead of patching for its side effect.
+- **Validate external input once.** Check types and values at the interface boundary. Internal code can trust those
+  checks and avoid redundant assertions.
+- **Remove dead code.** Delete unused imports, unreachable branches, and obsolete helpers.
+- **Keep imports at the top of the file.** Place all imports at module top in both source
+  and test files so import errors surface at module load time rather than at runtime or
+  during a specific test. Put an import inside a function only when there is a concrete
+  reason: resolving a circular import that cannot be restructured, guarding an optional
+  dependency (e.g., TensorRT-LLM, Megatron-Core), or deferring an unusually heavy import
+  with explicit justification. Add a brief comment in those cases naming the reason.
+- **Define the public API with `__all__` and re-export via `from .module import *`.**
+  Each module declares its public surface with `__all__ = [...]` at the top of the file.
+  Package `__init__.py` files re-export submodules with `from .module import *`. This
+  keeps the public API explicit at the source (next to the definitions), avoids
+  hand-maintained import lists in `__init__.py` drifting out of sync, and makes
+  star-imports safe by limiting them to the curated `__all__` names.
+
+### Performant AI code
+
+- **Keep tensor work on the GPU and avoid unnecessary CPU-GPU syncs.** Reading metadata such as `tensor.shape` is fine.
+  Avoid Python scalar extraction and operators such as `tensor.item()`, `float(tensor)`, or `min(tensor)` because they
+  can trigger CPU-GPU syncs. Use PyTorch tensor ops such as `tensor.min()` by default, and only extract Python scalars
+  when the CPU needs the value. Tensor-value-based Python branching can also break CUDA graphs.
+- **Develop with distributed processing in mind.** Examples: Use `print_rank_0` or `warn_rank_0`
+  when possible to avoid noisy logs. Guard shared side effects, such as
+  file writes or shared state updates, against race conditions between ranks.
+
+### Compatibility
+
+- **Preserve config and checkpoint backward compatibility.** ModelOpt checkpoints include serialized
+  `ModeloptBaseConfig` instances such as `QuantizeConfig`. If these Pydantic-based configs change
+  without backward compatibility handling, older checkpoints may no longer load. Make breaking changes
+  explicit and intentional.
+
 ## Adding a new PIP dependency
 
 Currently we have 2 places where we mention pip dependencies: [pyproject.toml](./pyproject.toml) for dependencies that are required for the ModelOpt library and `examples/<example-name>/requirements.txt` for dependencies that are required for the specific examples.
@@ -101,6 +160,17 @@ For broader repo validation and dependency setup, use [noxfile.py](./noxfile.py)
 nox -s "unit-3.12(torch_211, tf_latest)"
 ```
 
+### Test design principles
+
+- **Develop with focused tests.** During development, write as many focused
+  tests as needed, including lower-level unit tests or internal probes, to
+  understand and harden behavior.
+- **Curate production tests and keep them lean.** Before staging or committing,
+  decide which tests should be checked in. Checked-in tests should document
+  expected behavior, protect against regressions, or flag backward-incompatible
+  behavior changes. Remove redundant lower-level tests when a higher-level test
+  already covers the same behavior, keeping CI/CD fast and lean.
+
 ## ✍️ Signing your work
 
 - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original

@@ -86,7 +86,6 @@ You can also directly use NVIDIA container images, which have Model Optimizer pr
 - `nvcr.io/nvidia/pytorch:<version>-py3`
 - `nvcr.io/nvidia/nemo:<version>`
 - `nvcr.io/nvidia/tensorrt-llm/release:<version>`
-- `nvcr.io/nvidia/tensorrt:<version>-py3`
 
 Before pulling and using the container images, please review their respective license terms.
 Make sure to upgrade Model Optimizer to the latest version as described above.