
[OMNIML2914] Support Nemotron-3-Nano PTQ, TE spec migration, and VLM quantization (Qwen3-VL)#1742

Merged
yueshen2016 merged 4 commits into main from yueshen/PTQ-support-Qwen3-VL
Feb 26, 2026

Conversation

@yueshen2016
Contributor

@yueshen2016 yueshen2016 commented Dec 16, 2025

What does this PR do?

This PR adds support for Post-Training Quantization (PTQ) and quantized-checkpoint resume for large language models and vision-language models (VLMs) using Megatron-Bridge, along with a migration from the local spec to the TE (Transformer Engine) spec for ModelOpt quantization.

Specifically:

  1. Support PTQ and resume of quantized checkpoint for NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
  2. Change from local spec to TE spec, and deprecate local spec support for ModelOpt quantization
  3. Support VLM model PTQ with image as calibration data, using Qwen3-VL-30B-A3B-Instruct and Qwen3-VL-8B-Instruct as examples

Changelog

  • Added PTQ support for NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 with multi-GPU parallelism (TP/PP/EP)
  • Added quantized checkpoint resume and generation for Nemotron-3-Nano
  • Migrated quantization layer support from local spec to TE spec; deprecated local spec for ModelOpt quantization
  • Added VLM quantization support with image calibration data (detection-datasets/coco)
  • Added quantization example scripts for VLM workflows (quantize_vlm.py, ptq_generate_vlm.py) with configurable parallelism
  • Added comprehensive Qwen3 VL quantization end-to-end test suite with multiple parallelism configurations
  • Fixed multi-dimensional attention mask handling

Usage Examples

NVIDIA-Nemotron-3-Nano-30B-A3B-BF16

PTQ

torchrun --nproc_per_node 8 examples/quantization/quantize.py \
  --hf-model-id /models/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 \
  --calib-size 4 \
  --export-quant-cfg nvfp4 \
  --megatron-save-path /models/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4-MLM \
  --pp 4 \
  --tp 2 \
  --ep 2 \
  --trust-remote-code

Resume Quantized Checkpoint

torchrun --nproc_per_node 8 examples/quantization/ptq_generate.py \
  --megatron-load-path /models/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4-MLM \
  --hf-model-id /models/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 \
  --trust-remote-code \
  --tp 8 \
  --ep 8

Qwen3-VL-30B-A3B-Instruct

PTQ

torchrun --nproc_per_node=8 examples/quantization/quantize_vlm.py \
    --hf-model-id /models/Qwen3-VL-30B-A3B-Instruct \
    --export-quant-cfg fp8 \
    --megatron-save-path /models/Qwen3-VL-30B-A3B-Instruct_fp8_mlm \
    --tp 4 \
    --etp 4 \
    --pp 2 \
    --calib-size 32

Generate

torchrun --nproc_per_node=8 examples/quantization/ptq_generate_vlm.py \
    --hf-model-id /models/Qwen3-VL-30B-A3B-Instruct \
    --megatron-load-path /models/Qwen3-VL-30B-A3B-Instruct_fp8_mlm \
    --tp 8 \
    --ep 8 \
    --image-path /models/demo.jpeg \
    --prompts "Describe this image."

Qwen3-VL-8B-Instruct

PTQ

torchrun --nproc_per_node=8 examples/quantization/quantize_vlm.py \
    --hf-model-id /models/Qwen3-VL-8B-Instruct \
    --export-quant-cfg fp8 \
    --megatron-save-path /models/Qwen3-VL-8B-Instruct_fp8_mlm \
    --tp 4 \
    --pp 2 \
    --calib-size 8

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.


@yueshen2016 yueshen2016 requested review from yashaswikarnati and removed request for yashaswikarnati December 16, 2025 10:48
@yueshen2016 yueshen2016 force-pushed the yueshen/PTQ-support-Qwen3-VL branch from f7453ef to 3526cf1 on December 16, 2025 10:51
@coderabbitai
Contributor

coderabbitai bot commented Jan 27, 2026

📝 Walkthrough

Walkthrough

This PR introduces quantization support for Vision-Language Models (VLMs), particularly Qwen3-VL, by adding new quantization utilities, VLM-specific quantization and generation scripts, updating the Qwen3-VL model bridge with additional parameter mappings, and adding comprehensive functional test coverage for the new workflows.

Changes

Cohort / File(s) Summary
Quantization Utilities & Infrastructure
.github/workflows/cicd-main.yml, examples/quantization/quantize_utils.py, src/megatron/bridge/models/gpt_provider.py
Added new test script to CI pipeline; introduced centralized quantization config utilities module with configuration choices, table creation, and CLI argument helpers; added TE-spec support detection for specific models (Qwen3-8B) with conditional layer-spec selection logic and modelopt_use_te flag to GPTModelProvider.
VLM Quantization Scripts
examples/quantization/quantize_vlm.py, examples/quantization/ptq_generate_vlm.py
Introduced two new VLM quantization scripts: quantize_vlm.py for offline quantization with COCO/random calibration pipelines and checkpointing, and ptq_generate_vlm.py for loading and generating from quantized VLM checkpoints across multiple GPUs.
PTQ Script Updates
examples/quantization/ptq_generate.py, examples/quantization/quantize.py
Refactored to support dual-path quantization validation (local-spec and TE-spec layers); quantize.py now uses centralized quantize_utils and includes dynamic layer-spec selection based on TE support.
Qwen3-VL Model & Bridge
src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/model.py, src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/utils.py, src/megatron/bridge/models/qwen_vl/qwen3_vl_bridge.py
Extended forward signature with inference/gathering context parameters; normalized multi-dimensional attention masks to 2D; added layernorm parameter mappings for both standard and MoE variants.
Model Loading & Training
src/megatron/bridge/training/model_load_save.py
Enhanced checkpoint loading with modelopt state detection and debug logging to determine TE-spec usage during model restoration.
Test Infrastructure & Cases
tests/functional_tests/L2_Launch_models_qwen_vl_quantization.sh, tests/functional_tests/quantization/models/qwen_vl/__init__.py, tests/functional_tests/quantization/models/qwen_vl/test_qwen3_vl_quantization_workflow.py
Added new test runner script for VLM quantization with coverage collection; added comprehensive test class covering quantization workflow, generation from quantized checkpoints, and parallelism validation.
Test Cleanup
tests/functional_tests/quantization/models/qwen/test_qwen3_moe_quantization_workflow.py
Removed debug logging block that printed parameter dtype before saving.

Sequence Diagram(s)

sequenceDiagram
    participant User as User/CLI
    participant Main as Main Entry<br/>(quantize_vlm)
    participant Bridge as AutoBridge
    participant Processor as AutoProcessor
    participant MegatronMgr as Megatron<br/>ModelProvider
    participant Quantizer as ModelOpt<br/>Quantizer
    participant SaveMgr as Checkpoint<br/>Manager

    User->>Main: Call with HF model ID,<br/>parallelism config
    Main->>Bridge: Load HF VLM
    Bridge-->>Main: Wrapped model instance
    Main->>Processor: Load text/image processor
    Processor-->>Main: Processor instance
    Main->>MegatronMgr: Configure TP/PP/EP/ETP
    Main->>MegatronMgr: Initialize Megatron model
    MegatronMgr-->>Main: Initialized model
    Main->>Main: Select calibration data<br/>(COCO or random)
    Main->>Quantizer: Run quantization with<br/>forward loop
    Quantizer->>Quantizer: Apply PTQ passes
    Quantizer-->>Main: Quantized model
    Main->>SaveMgr: Optionally compress weights
    Main->>SaveMgr: Save quantized checkpoint
    SaveMgr-->>Main: Save complete
    Main->>Main: Run test prompt/image<br/>forward pass
    Main-->>User: Generation output & stats
sequenceDiagram
    participant User as User/CLI
    participant Main as Main Entry<br/>(ptq_generate_vlm)
    participant Bridge as AutoBridge
    participant Processor as AutoProcessor
    participant MegatronMgr as Megatron<br/>ModelProvider
    participant ChkptLoader as Checkpoint<br/>Loader
    participant Validator as Quantization<br/>Validator
    participant Generator as Generation<br/>Loop

    User->>Main: Call with quantized<br/>checkpoint path
    Main->>Main: Validate paths &<br/>environment
    Main->>Bridge: Load HF model
    Bridge-->>Main: Model instance
    Main->>Processor: Load processor
    Processor-->>Main: Processor instance
    Main->>MegatronMgr: Configure parallelism
    Main->>ChkptLoader: Load quantized checkpoint
    ChkptLoader-->>Main: Loaded state dict
    Main->>Main: Apply to Megatron model
    Main->>Validator: Validate quantized layers<br/>(TE-spec layers present)
    Validator-->>Main: Validation passed/failed
    alt Validation Success
        Main->>Generator: Run generation loop<br/>with image & prompts
        Generator-->>Main: Generation outputs
        Main-->>User: Output messages & results
    else Validation Failed
        Main-->>User: Error: Missing quantized layers
    end
    Main->>Main: Cleanup distributed<br/>process group

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested reviewers

  • yaoyu-33
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Test Results For Major Changes (⚠️ Warning): The PR introduces major PTQ support for VLMs but lacks test results and performance benchmarks in the description. Resolution: update the PR description with test execution results, validation outcomes, numerical correctness evidence, and performance metrics across the different parallelism configurations.
✅ Passed checks (3 passed)
  • Docstring Coverage (✅ Passed): Docstring coverage is 96.00%, which is sufficient; the required threshold is 80.00%.
  • Title check (✅ Passed): The PR title comprehensively describes the main changes: PTQ support for Qwen3-VL (VLM quantization), TE spec migration, and Nemotron-3-Nano support, which align with the file summaries showing new quantization workflows, TE-spec support, and VLM examples.
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/model.py (1)

177-194: Add type hints and explicitly mark the new forward params as unused.

Ruff flags both parameters as unused (ARG002), and they lack type hints. If they are placeholders for API compatibility, either plumb them through or explicitly mark them unused. As per coding guidelines, please add explicit type hints for new parameters.

🛠️ Suggested fix
-        inference_context=None,
-        runtime_gather_output=None,
+        inference_context: object | None = None,
+        runtime_gather_output: bool | None = None,
     ) -> torch.Tensor:
         """Forward function of the Qwen3VL model.
@@
         Returns:
             output (torch.Tensor): Loss of shape [b, s] if labels are provided, otherwise logits of shape
                 [b, s, vocab_size].
         """
+        del inference_context, runtime_gather_output
         assert pixel_values_videos is None and video_grid_thw is None, "not support video now"
🤖 Fix all issues with AI agents
In `@examples/quantization/ptq_generate_vlm.py`:
- Around line 83-103: The file contains debug-only console.print blocks that
dump model_str and per-layer checks (using is_rank_0, model_str, te_spec_layers,
console.print); remove these debug print sections before merging or gate them
behind a CLI flag (e.g., --verbose/--debug) so the prints only run when enabled;
update any argument parsing to add the flag and wrap the existing debug blocks
with a conditional on that flag (or delete the blocks entirely) to prevent
unsolicited debug output in normal runs.
- Line 267: The CLI flag is ineffective because
parser.add_argument("--trust-remote-code", action="store_true", default=True)
always yields True; change it to a negated flag so users can disable the
default: replace that call with parser.add_argument("--no-trust-remote-code",
action="store_false", dest="trust_remote_code", default=True, help="disable
trusting remote code") and ensure any code using the trust_remote_code variable
(e.g., where the main call or model loader consumes trust_remote_code) continues
to reference the same name.

In `@examples/quantization/quantize_utils.py`:
- Around line 43-83: The function get_modelopt_torch_quantization_config mutates
the mtq_config taken from QUANT_CFG_CHOICES causing global side effects across
calls; fix this by making a deep copy of QUANT_CFG_CHOICES[export_quant_cfg]
(e.g., mtq_config = deepcopy(QUANT_CFG_CHOICES[export_quant_cfg])) before any
modifications so changes are local, and add an explicit return type hint (e.g.,
-> Dict[str, Any]) to the function signature; ensure deepcopy is imported and
update any type imports as needed.
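A sketch of the copy-before-mutate pattern; the table contents here are a hypothetical stand-in, not the real `quantize_utils.py` values:

```python
from copy import deepcopy
from typing import Any

# Hypothetical stand-in for the shared config table in quantize_utils.py.
QUANT_CFG_CHOICES: dict[str, dict[str, Any]] = {
    "fp8": {"quant_cfg": {"*weight_quantizer": {"num_bits": (4, 3)}}},
}

def get_modelopt_torch_quantization_config(export_quant_cfg: str) -> dict[str, Any]:
    # Deep-copy so per-call tweaks never leak back into the global table.
    mtq_config = deepcopy(QUANT_CFG_CHOICES[export_quant_cfg])
    mtq_config["quant_cfg"]["*weight_quantizer"]["enable"] = True
    return mtq_config

cfg = get_modelopt_torch_quantization_config("fp8")
# The global table is untouched by the per-call mutation above.
assert "enable" not in QUANT_CFG_CHOICES["fp8"]["quant_cfg"]["*weight_quantizer"]
```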

In `@examples/quantization/quantize_vlm.py`:
- Around line 174-215: Move the restoration of per-module TopKRouter.topk to
module.config.moe_router_topk out of the calibration loop so forcing all-expert
routing covers the entire dataloader; specifically, keep the initial loop that
sets module.topk = module.num_experts (iterating model.named_modules() and
checking isinstance(module, TopKRouter)) before the dataloader loop and place
the restoration loop (setting module.topk = module.config.moe_router_topk)
immediately after the for messages in tqdm(...) loop completes (not inside it).
Also address the B007 static analysis hint by renaming the unused loop variable
name to _ in both places where you iterate model.named_modules() to avoid
unused-variable warnings.
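The intended control flow can be sketched with a stand-in router class; `TopKRouter` below is a dummy, not the Megatron implementation:

```python
# Dummy stand-in sketch showing the structure the review asks for: force
# all-expert routing for the WHOLE calibration loop, and restore topk only
# after the dataloader is exhausted.
class _Cfg:
    moe_router_topk = 2

class TopKRouter:
    def __init__(self):
        self.config = _Cfg()
        self.topk = self.config.moe_router_topk
        self.num_experts = 8

routers = [TopKRouter(), TopKRouter()]

for router in routers:                  # before calibration: route to all experts
    router.topk = router.num_experts

for _batch in range(4):                 # calibration dataloader loop
    assert all(r.topk == r.num_experts for r in routers)

for router in routers:                  # after the loop completes, not inside it
    router.topk = router.config.moe_router_topk
```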
- Around line 392-420: The save-path logic is duplicated: megatron_save_path is
already defaulted when None near the top, so remove the second conditional (the
if megatron_save_path / else block) and simply use save_path =
megatron_save_path before calling bridge.save_megatron_model; if you want a
console notice when a default was used, print it at the first assignment where
you set megatron_save_path (reference symbols: megatron_save_path, model_name,
save_path, bridge.save_megatron_model).

In `@src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/utils.py`:
- Around line 112-116: The code replaces multi-dimensional attention_mask with
all-ones, discarding padding; instead collapse extra dimensions into a 2D
[batch, seq] mask by reducing (logical OR) across the extra dims so padding
zeros are preserved: compute a 2D mask from attention_mask (e.g., reduce with
torch.any over the non-batch/non-sequence axes) and then ensure its shape
matches total_input_ids.size(1) before computing position_ids; update the branch
handling attention_mask.dim() > 2 to produce this reduced mask rather than
torch.ones_like(total_input_ids), referencing attention_mask, total_input_ids,
and position_ids in the change.
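The reduction described above can be sketched as a standalone helper; `collapse_attention_mask` is a hypothetical name for illustration, not the actual `utils.py` code:

```python
import torch

def collapse_attention_mask(attention_mask: torch.Tensor) -> torch.Tensor:
    """Collapse a >2-D attention mask to [batch, seq], preserving padding."""
    if attention_mask.dim() > 2:
        reduced = attention_mask.bool()
        # Logical-OR away the extra dims (e.g. [b, 1, q, k] -> [b, k]): a key
        # position stays 0 only if no query may attend to it, so padding
        # zeros survive, unlike a torch.ones_like replacement.
        while reduced.dim() > 2:
            reduced = reduced.any(dim=1)
        return reduced.to(attention_mask.dtype)
    return attention_mask

mask4d = torch.zeros(1, 1, 4, 4)
mask4d[..., :3] = 1  # last key position is padding
assert collapse_attention_mask(mask4d).tolist() == [[1.0, 1.0, 1.0, 0.0]]
```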

In `@src/megatron/bridge/training/model_load_save.py`:
- Around line 270-273: The debug block in build_and_load_model uses a broad
except Exception when calling _os.listdir(checkpoint_path); change this to catch
a more specific exception (e.g., FileNotFoundError or OSError) or remove the
debug code entirely; update the except clause to catch FileNotFoundError (or
OSError) and log the error message, referencing build_and_load_model,
checkpoint_path, and the _os.listdir call so the fix is applied to the correct
snippet.
- Around line 258-306: Remove the temporary DEBUG print statements introduced in
build_and_load_model and replace them with concise logger.debug calls or delete
them entirely; keep only the functional changes that set
model_cfg.restore_modelopt_state and model_cfg.modelopt_use_te when
has_modelopt_state(checkpoint_path) is true. Locate the block that calls
has_modelopt_state(checkpoint_path) and uses
_supports_modelopt_te_spec(hf_model_id) (references: has_modelopt_state,
_supports_modelopt_te_spec, model_cfg.restore_modelopt_state,
model_cfg.modelopt_use_te) and remove all plain print(...) debug lines,
optionally converting important diagnostic lines to logger.debug(...) with
minimal, non-verbose messages.

In `@tests/functional_tests/L2_Launch_models_qwen_vl_quantization.sh`:
- Around line 1-15: Move the shebang (#!/bin/bash) to the very first line of the
script so the shell can recognize it; update the file so that the current
copyright and license header follow the shebang instead of preceding it (i.e.,
place the existing header and comments after the shebang line), ensuring the
shebang remains exactly as shown and unchanged.
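The fix is mechanical; a minimal sketch of the corrected layout (the header comment is a placeholder, not the repository's actual license text):

```shell
#!/bin/bash
# Copyright header and license text go here, AFTER the shebang: the kernel
# only honors '#!' when it occupies the first two bytes of the file.
set -euo pipefail

# Demonstrate that a shebang-first script runs under its named interpreter.
tmp=$(mktemp)
printf '#!/bin/bash\n# license header here\necho ok\n' > "$tmp"
chmod +x "$tmp"
"$tmp"
rm -f "$tmp"
```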

In
`@tests/functional_tests/quantization/models/qwen_vl/test_qwen3_vl_quantization_workflow.py`:
- Around line 326-329: Replace the test failures that use "assert False,
f'...{quantize_result.returncode}'" with pytest.fail(...) so failures still
occur under Python -O; specifically change the block checking
quantize_result.returncode (and the analogous checks at the other locations
referenced) to call pytest.fail with a clear message that includes
quantize_result.returncode and optionally stdout/stderr, and ensure pytest is
imported at the top of the test module so pytest.fail is available; update the
occurrences tied to the quantize_result checks at the locations indicated
(including the other lines mentioned) to use pytest.fail instead of assert
False.
🧹 Nitpick comments (6)
src/megatron/bridge/training/model_load_save.py (1)

259-259: Redundant import: os is already imported at module level.

Line 17 already imports os. Using import os as _os inside the function is unnecessary and confusing.

examples/quantization/ptq_generate_vlm.py (1)

270-284: Wrap main() call with try/finally for process group cleanup.

If main() raises an exception, torch.distributed.destroy_process_group() will not be called, potentially leaving dangling processes.

Suggested fix
     args = parser.parse_args()
-    main(
-        args.hf_model_id,
-        args.tp,
-        args.pp,
-        args.ep,
-        args.etp,
-        args.megatron_load_path,
-        args.prompts,
-        args.osl,
-        args.image_path,
-        args.trust_remote_code,
-    )
-
-    if torch.distributed.is_initialized():
-        torch.distributed.destroy_process_group()
+    try:
+        main(
+            args.hf_model_id,
+            args.tp,
+            args.pp,
+            args.ep,
+            args.etp,
+            args.megatron_load_path,
+            args.prompts,
+            args.osl,
+            args.image_path,
+            args.trust_remote_code,
+        )
+    finally:
+        if torch.distributed.is_initialized():
+            torch.distributed.destroy_process_group()
examples/quantization/quantize_vlm.py (4)

40-40: Use T | None instead of Optional[T] per coding guidelines.

Suggested fix
-from typing import Generator, Optional
+from typing import Generator

Then update the type hints in function signatures:

-    megatron_save_path: Optional[str] = None,
+    megatron_save_path: str | None = None,
...
-    test_image_path: Optional[str] = None,
+    test_image_path: str | None = None,

119-153: Consider adding a seed parameter for reproducibility.

The random calibration data is non-reproducible across runs. For CI/CD debugging and reproducibility, consider adding an optional seed parameter.

Suggested improvement
 def get_random_calib_dataloader(
     calib_size: int = 512,
-    image_size: tuple = (224, 224),
+    image_size: tuple[int, int] = (224, 224),
+    seed: int | None = None,
 ) -> Generator[dict, None, None]:
     ...
     import numpy as np
     from PIL import Image
 
+    if seed is not None:
+        np.random.seed(seed)
+
     for i in range(calib_size):

218-226: Consider adding type hints for model and processor parameters.

The function parameters lack type hints. While acceptable for an example script, adding hints improves IDE support and documentation.

Example
 def _custom_prompt_forward_loop_func(
-    model,
-    processor,
+    model: torch.nn.Module,
+    processor: AutoProcessor,
     is_rank_0: bool,
     prompts: str,
     osl: int = 32,
     test_image_path: str = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
 ):

462-483: Wrap main() call with try/finally for process group cleanup.

Same issue as ptq_generate_vlm.py - if main() raises, the process group won't be destroyed.

Suggested fix
     args = parser.parse_args()
-    main(
-        args.hf_model_id,
-        ...
-        args.use_random_calib,
-    )
-
-    if torch.distributed.is_initialized():
-        torch.distributed.destroy_process_group()
+    try:
+        main(
+            args.hf_model_id,
+            ...
+            args.use_random_calib,
+        )
+    finally:
+        if torch.distributed.is_initialized():
+            torch.distributed.destroy_process_group()

ChenhanYu previously approved these changes Jan 27, 2026

@ChenhanYu ChenhanYu left a comment


LGTM;

model_cfg.restore_modelopt_state = True
# Check if the model supports TE spec for modelopt (e.g., Qwen3-8B)
# If so, set modelopt_use_te=True to use TE spec instead of local spec
hf_model_id = getattr(model_cfg, "hf_model_id", None)
Contributor


Do you need to use this? We designed it only for the deployment repo and don't want other parts to rely on this ID, because it may contain an HF local file path.

Contributor Author


I see. I am trying to grab the HF ID for the specific model and call this function to decide whether this model supports running PTQ with the TE spec. Once all models support PTQ with the TE spec, this attribute, modelopt_use_te, will be deprecated. Could you suggest another attribute from which I can get the HF ID?

ko3n1g previously approved these changes Jan 28, 2026
@yueshen2016
Contributor Author

/ok to test 4dfc4c7

ko3n1g previously approved these changes Feb 25, 2026
yaoyu-33 previously approved these changes Feb 25, 2026
@yueshen2016
Contributor Author

/ok to test 4cc9adb

@yueshen2016
Contributor Author

/ok to test adeb776

Signed-off-by: James Shen <yueshen@nvidia.com>
Signed-off-by: James Shen <yueshen@nvidia.com>
Signed-off-by: James Shen <yueshen@nvidia.com>
Signed-off-by: James Shen <yueshen@nvidia.com>
@yueshen2016
Contributor Author

/ok to test 51d298f
