Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
58 changes: 0 additions & 58 deletions .agents/developer-guidelines.md

This file was deleted.

17 changes: 17 additions & 0 deletions .coderabbit.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,8 @@ reviews:
profile: chill
collapse_walkthrough: true
poem: false
# Allow CodeRabbit to formally approve once its comments are resolved and pre-merge checks pass
request_changes_workflow: true
path_instructions:
- path: "modelopt/**/*.py"
instructions: &security_instructions |
Expand All @@ -25,6 +27,21 @@ reviews:
@NVIDIA/modelopt-setup-codeowners with an explicit justification in the PR description.
- path: "examples/**/*.py"
instructions: *security_instructions
- path: "tests/**/*.py"
instructions: |
Verify tests follow the conventions in CONTRIBUTING.md. Flag the following as
IMPORTANT issues:
1. Imports inside functions or test methods without explicit justification.
Imports belong at the top of the file so import errors surface at collection
time, not mid-test. The only acceptable in-function imports are for circular
imports or optional dependencies (e.g., TensorRT-LLM, Megatron-Core), and
those should carry a brief comment naming the reason.
2. Redundant lower-level tests that duplicate behavior already covered by a
higher-level test — checked-in tests should be lean and document expected
behavior, protect against regressions, or flag backward-incompatible changes.
3. Tests placed in the wrong directory for their cost profile (e.g., multi-minute
tests under tests/unit, which targets a few-seconds budget; GPU-requiring
tests under tests/unit instead of tests/gpu*).
auto_review:
auto_incremental_review: true
drafts: false
Expand Down
44 changes: 36 additions & 8 deletions .github/workflows/claude_review.yml
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ jobs:
contains(github.event.comment.body, '/claude review') &&
contains(fromJson('["OWNER", "MEMBER", "COLLABORATOR"]'), github.event.comment.author_association)
runs-on: ubuntu-latest
timeout-minutes: 10
timeout-minutes: 15
permissions:
contents: read
pull-requests: write
Expand Down Expand Up @@ -80,13 +80,18 @@ jobs:
BASE REF: origin/${{ steps.pr-info.outputs.base_ref }}

Mandatory workflow — never skip or reorder:
1. Read the PR diff first (gh pr diff).
2. Read AGENTS.md, .agents/developer-guidelines.md,
and CONTRIBUTING.md for project conventions, coding principles, and architecture.
3. For changed files under `modelopt/torch/<sub-package>/`, read the sub-package's
1. Read prior Claude activity on the PR so you don't duplicate already-raised
comments and can track which prior issues are now resolved:
`gh pr view $PR_NUMBER --repo $REPO --json comments,reviews`
Treat prior findings as context, not a ceiling — if you spot a genuinely new
issue this round, flag it.
2. Read the PR diff (gh pr diff).
3. Read AGENTS.md and CONTRIBUTING.md (including the Coding standards section)
for project conventions, coding principles, and architecture.
4. For changed files under `modelopt/torch/<sub-package>/`, read the sub-package's
`__init__.py` plus any `mode.py` / `config.py` to understand mode registration
and config schema.
4. Only then perform the review using that context.
5. Only then perform the review using that context.

You are performing a deep code review on a **NVIDIA Model Optimizer (ModelOpt)** PR.
ModelOpt is NVIDIA's open-source library for model optimization (quantization, pruning,
Expand Down Expand Up @@ -118,6 +123,19 @@ jobs:

## Review Procedure

**Aim for one pass.** Surface meaningful issues in this review so the author gets
one consolidated set of fixes.

**Cover each changed file across categories.** For each non-trivial changed file,
consider the categories below (Algorithm Correctness, Mode/State, Export, Backward
Compatibility, Performance) before moving on.

**Trace public symbols across files.** For new or modified public symbols
(functions, arguments, config fields, exported names), grep call sites in
`modelopt/`, `tests/`, and `examples/` before commenting. Many bugs here only
surface where the symbol meets its caller — mode registration, export paths,
restore logic.

1. Get PR metadata: `gh pr view $PR_NUMBER --repo $REPO --json title,body,baseRefName,headRefName,files,additions,deletions,changedFiles,author`
2. Get the full diff: `gh pr diff $PR_NUMBER --repo $REPO`
- For large PRs (>50 files), prioritize source code over config/lock/auto-generated files.
Expand Down Expand Up @@ -182,6 +200,10 @@ jobs:
- Memory regressions: double-allocating weights, holding tensors past their lifetime.

## Suggestions (Nice to Have)

SUGGESTIONs document non-blocking improvements and never block approval (see
Completion below). Raise them when genuinely useful; skip nits that aren't.

- Stale, imprecise, or misleading comments/docstrings — a wrong docstring is worse
than none.
- Missing shape/dtype assertions at module/parallelism boundaries where they would
Expand All @@ -208,5 +230,11 @@ jobs:
- Highlight the most impactful findings
- Overall assessment of the PR's risk level

If no significant issues are found, approve the PR:
gh pr review $PR_NUMBER --repo $REPO --approve --body "Claude review passed — no significant issues found. LGTM"
**Approval decision — use exact counts from your findings (no other thresholds):**

- `0 CRITICAL AND 0 IMPORTANT` → **approve**, regardless of SUGGESTION count.
SUGGESTIONs never block approval.
`gh pr review $PR_NUMBER --repo $REPO --approve --body "Claude review passed — no blocking issues found. LGTM"`
- `≥1 CRITICAL OR ≥1 IMPORTANT` → **post a comment review** summarizing the
issues found.
`gh pr review $PR_NUMBER --repo $REPO --comment --body "<summary of issues found>"`
3 changes: 2 additions & 1 deletion AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,8 +11,9 @@ These instructions apply to AI-assisted work in this repository.
## Coding guidelines

- **Coding guide:** Code development and review require reading and following
[.agents/developer-guidelines.md](.agents/developer-guidelines.md);
the [coding standards in CONTRIBUTING.md](CONTRIBUTING.md#-coding-standards);
do not skip this step.
- **Use relative paths** from the repo root in commands and file references.

## Iterative development

Expand Down
7 changes: 6 additions & 1 deletion CHANGELOG.rst
Original file line number Diff line number Diff line change
Expand Up @@ -27,11 +27,16 @@ Changelog
- Support Megatron-Core checkpoint restore and export for MSE ``NVFP4StaticQuantizer``.
- Add mixed-precision FP8 + NVFP4 export for Megatron-Core: per-layer ``quant_algo`` recorded under ``quantized_layers`` in ``hf_quant_config.json``, PP-aware ``kv_cache_dtype`` gather, fused-QKV exclude split into per-HF-name ``q/k/v_proj`` entries.
- Add Nemotron-3-Super-120B-A12B PTQ recipes ``modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml`` (MSE-mixed) and ``super-nvfp4-max-calib.yaml`` (max-calib mixed): NVFP4 W4A4 routed experts + FP8 per-tensor shared experts / Mamba in/out_proj + FP8 KV cache.
- Add NVFP4 W4A16 weight-only quantization (``w4a16_nvfp4``): FP4 weights with group_size=16, BF16 activations, no calibration forward pass required. Use ``mtq.W4A16_NVFP4_CFG`` or ``--qformat w4a16_nvfp4`` in ``hf_ptq.py``. vLLM deployment support is in progress.
- Add ``DATASET_COMBOS`` to ``modelopt.torch.utils.dataset_utils`` — single ``--dataset`` tokens that fan out to multiple registered datasets; per-entry ``num_samples`` is split evenly across the members. Initial combos: ``cnn_nemotron_v2_mix`` (``cnn_dailymail`` + ``nemotron-post-training-dataset-v2``, used by ``hf_ptq.py`` when no ``--dataset`` is provided) and ``nemotron-post-training-v3`` (the seven ``nvidia/Nemotron-*`` SFT datasets added in #1498, mirroring the `nemotron-post-training-v3 collection <https://huggingface.co/collections/nvidia/nemotron-post-training-v3>`_). Combo names are listed by ``get_supported_datasets()`` and surfaced in ``--dataset`` help. ``get_dataset_dataloader`` rejects inputs that mix a combo with one of its member datasets (e.g. ``cnn_dailymail,cnn_nemotron_v2_mix``) to avoid double-sampling, and ``get_dataset_samples`` rejects combo names so callers route through the dataloader. ``hf_ptq.py`` default ``--calib_size`` is bumped from ``512`` to ``1024`` so the total calibration sample count under the new default combo matches the previous two-dataset fallback.
- The ``nemotron-sft-agentic-v2`` registered dataset (added in #1498) now uses only the ``search`` split. The previously configured ``interactive_agent`` and ``tool_calling`` splits contain content-level defects (heterogeneous schema and a malformed JSON row, respectively) that cause pyarrow's streaming JSON reader to fail deterministically.

**Bug Fixes**

- In Megatron-Core only do EP amax sync for routed expert weights if ``sync_expert_weight_amax=True``. Previously EP amax sync would sync routed expert weights across EP ranks even when ``sync_expert_weight_amax`` was False
- Fix Megatron-Core HF importer to load fused ``TELayerNormColumnParallelLinear.layer_norm_weight`` from HF for GPT-family models (Qwen3 etc.) under ``--export-default-te-spec``. Importer now prefers per-context keys ``fused_input_layernorm`` / ``fused_pre_mlp_layernorm`` (fallback ``fused_norm`` for Nemotron-H backward compatibility); ``mcore_qwen.py`` provides the new rules. Without this fix, post-prune MMLU sat at chance.

0.44 (2026-05-18)
0.44 (2026-05-14)
^^^^^^^^^^^^^^^^^

**New Features**
Expand Down
70 changes: 70 additions & 0 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,65 @@ To run the pre-commit hooks without committing, use:
pre-commit run --all-files
```

## 📐 Coding standards

Guidelines for production code in ModelOpt. Key values: simplicity, modularity,
and conciseness.

### Principles

- **Prefer simple, surgical changes.** Touch only what the task requires. Avoid speculative
refactors, broad rewrites, and "while we're here" cleanups.
- **Design for simplicity and readability.** Choose the design that is easiest to understand and maintain.
Code is read top to bottom: put high-level behavior first, hide lower-level details behind well-named helpers,
and treat heavy branching as a signal to reconsider the design.
- **Prefer modular, composable solutions.** Avoid input-specific or case-specific hard-coding.
Use existing extension points when they fit. If none fit, add a simple, focused helper,
class, or plugin that cleanly captures the new behavior. Keep scope limited to known cases.
- **Respect inheritance boundaries.** Parent abstractions should define shared contracts and
shared behavior, not child-specific special cases.
- **Don't repeat yourself; keep a single source of truth.** Consolidate repeated logic or intent with a shared helper, API,
or abstraction when doing so keeps the design simpler. Avoid duplication that can drift out of sync.
- **Comment cautiously.** Comments should add context, not translate code into English.
Prefer making the code self-explanatory first. Use comments only for non-obvious
intent or constraints that remain unclear from the code. Apply this guidance to new
comments only; do not rewrite or delete existing comments just for style.
- **Document public APIs.** Public and higher-level APIs should have docstrings, including examples when useful.
Internal helpers should usually be self-documenting through clear names and structure.
- **Fix the bug cause, not the side effect.** For bug fixes, find the root cause instead of patching for its side effect.
- **Validate external input once.** Check types and values at the interface boundary. Internal code can trust those
checks and avoid redundant assertions.
- **Remove dead code.** Delete unused imports, unreachable branches, and obsolete helpers.
- **Keep imports at the top of the file.** Place all imports at module top in both source
and test files so import errors surface at module load time rather than at runtime or
during a specific test. Put an import inside a function only when there is a concrete
reason: resolving a circular import that cannot be restructured, guarding an optional
dependency (e.g., TensorRT-LLM, Megatron-Core), or deferring an unusually heavy import
with explicit justification. Add a brief comment in those cases naming the reason.
- **Define the public API with `__all__` and re-export via `from .module import *`.**
Each module declares its public surface with `__all__ = [...]` at the top of the file.
Package `__init__.py` files re-export submodules with `from .module import *`. This
keeps the public API explicit at the source (next to the definitions), avoids
hand-maintained import lists in `__init__.py` drifting out of sync, and makes
star-imports safe by limiting them to the curated `__all__` names.

### Performant AI code

- **Keep tensor work on the GPU and avoid unnecessary CPU-GPU syncs.** Reading metadata such as `tensor.shape` is fine.
Avoid Python scalar extraction and operators such as `tensor.item()`, `float(tensor)`, or `min(tensor)` because they
can trigger CPU-GPU syncs. Use PyTorch tensor ops such as `tensor.min()` by default, and only extract Python scalars
when the CPU needs the value. Tensor-value-based Python branching can also break CUDA graphs.
- **Develop with distributed processing in mind.** Examples: Use `print_rank_0` or `warn_rank_0`
when possible to avoid noisy logs. Guard shared side effects, such as
file writes or shared state updates, against race conditions between ranks.

### Compatibility

- **Preserve config and checkpoint backward compatibility.** ModelOpt checkpoints include serialized
`ModeloptBaseConfig` instances such as `QuantizeConfig`. If these Pydantic-based configs change
without backward compatibility handling, older checkpoints may no longer load. Make breaking changes
explicit and intentional.

## Adding a new PIP dependency

Currently we have 2 places where we mention pip dependencies: [pyproject.toml](./pyproject.toml) for dependencies that are required for the ModelOpt library and `examples/<example-name>/requirements.txt` for dependencies that are required for the specific examples.
Expand Down Expand Up @@ -101,6 +160,17 @@ For broader repo validation and dependency setup, use [noxfile.py](./noxfile.py)
nox -s "unit-3.12(torch_211, tf_latest)"
```

### Test design principles

- **Develop with focused tests.** During development, write as many focused
tests as needed, including lower-level unit tests or internal probes, to
understand and harden behavior.
- **Curate production tests and keep them lean.** Before staging or committing,
decide which tests should be checked in. Checked-in tests should document
expected behavior, protect against regressions, or flag backward-incompatible
behavior changes. Remove redundant lower-level tests when a higher-level test
already covers the same behavior, keeping CI/CD fast and lean.

## ✍️ Signing your work

- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original
Expand Down
1 change: 0 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -86,7 +86,6 @@ You can also directly use NVIDIA container images, which have Model Optimizer pr
- `nvcr.io/nvidia/pytorch:<version>-py3`
- `nvcr.io/nvidia/nemo:<version>`
- `nvcr.io/nvidia/tensorrt-llm/release:<version>`
- `nvcr.io/nvidia/tensorrt:<version>-py3`

Before pulling and using the container images, please review their respective license terms.
Make sure to upgrade Model Optimizer to the latest version as described above.
Expand Down
Loading
Loading