Ssameni/puzzletron bypass 3 integration #1470

Open
Separius wants to merge 9 commits into main from ssameni/puzzletron-bypass-3-integration

Separius commented May 12, 2026

Contributor

Summary

This is PR 3 of 3 in the Puzzletron bypass/local-distillation stack.

This PR wires the bypass distillation core into the full Puzzletron pipeline and adds runnable configs, docs, and end-to-end GPU
coverage.

Stack:

  1. ssameni/puzzletron-bypass-1-prereqs: shared prerequisites
  2. ssameni/puzzletron-bypass-2-core: bypass distillation core
  3. This PR: Puzzletron integration, configs, docs, GPU coverage

What Changed

  • Added bypass as an optional Puzzletron pipeline stage after pruning and before replacement-library construction.
  • Added dynamic Puzzletron progress numbering so runs report the correct total with or without bypass.
  • Added config-specific bypass skip-if-complete detection.
  • Added replacement-library stale detection so newly realized bypass checkpoints are picked up.
  • Updated replacement-library extraction to infer which subblocks to extract from bypass checkpoint metadata.
  • Prioritized bypass-trained checkpoints over plain pruned checkpoints when duplicate architectural candidates exist.
  • Added MIP support for target_num_kv_heads.
  • Added Llama bypass config coverage.
  • Added Nemotron-3 Nano configs for pruning, validation, bypass, and full Puzzletron runs.
  • Added Nemotron-3 example documentation.
  • Added GPU/integration coverage for bypass training, checkpointing, resume, and full Puzzletron behavior.
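
To illustrate the dynamic progress numbering described above, here is a minimal sketch. The stage names and the helpers `build_stages` and `progress_label` are hypothetical and are not taken from the PR; only the placement of bypass (after pruning, before replacement-library construction) comes from the description.

```python
from typing import List

# Hypothetical stage names; the real pipeline's stage list may differ.
BASE_STAGES = ["convert", "prune", "build_replacement_library", "mip"]


def build_stages(bypass_enabled: bool) -> List[str]:
    """Return the ordered stage list, inserting bypass right after pruning."""
    stages = list(BASE_STAGES)
    if bypass_enabled:
        stages.insert(stages.index("prune") + 1, "bypass")
    return stages


def progress_label(stage: str, stages: List[str]) -> str:
    """Format a 1-based '[step/total] stage' label derived from the stage list."""
    return f"[{stages.index(stage) + 1}/{len(stages)}] {stage}"


print(progress_label("bypass", build_stages(True)))
print(progress_label("mip", build_stages(False)))
```

Deriving both the step and the total from one list means the reported total cannot drift out of sync when an optional stage is toggled.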

Why

The previous PR adds the bypass training engine, but Puzzletron still needs pipeline wiring so bypass-trained blocks become usable
replacement-library candidates.

This PR keeps that integration separate from the core engine so reviewers can focus on pipeline behavior, config surface,
replacement-library semantics, and end-to-end coverage.

Tests

Added/updated coverage for:

  • Puzzletron progress numbering
  • bypass replacement-library discovery/extraction/priority
  • bypass GPU smoke/integration behavior
  • bypass checkpoint GPU behavior
  • bypass resume behavior
  • full Puzzletron pipeline behavior with bypass

Summary by CodeRabbit

  • New Features
    • Bypass distillation workflows, KV-head compression, and block-pruning runs for Nemotron-3-Nano and related models.
  • Documentation
    • New end-to-end Nemotron bypass tutorial and expanded Puzzletron examples and config references with run commands and results.
  • Bug Fixes / UX
    • Dynamic pipeline progress reporting, smarter stage-skip/resume logic, automatic remote model retrieval, and forced rebuilds when checkpoints change.
  • Tests
    • Extensive GPU and unit tests covering bypass runs, resume/checkpoint utilities, subblock modes, and replacement-library behavior.

copy-pr-bot Bot commented May 12, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai Bot commented May 12, 2026

Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9a8c9e18-bfea-4fab-a72a-80f3b6b21492

📥 Commits

Reviewing files that changed from the base of the PR and between 7b128ec and fbfa166.

📒 Files selected for processing (1)
  • tests/gpu/torch/puzzletron/test_bypass.py

Walkthrough

Adds end-to-end bypass distillation orchestration to Puzzletron: new Nemotron and Llama bypass configs and tutorial, config-driven pipeline progress, AnyModel checkpoint completeness checks, bypass-run detection, replacement-library metadata parsing with bypass prioritization, and broad unit + GPU tests covering bypass scenarios.

Changes

Bypass Distillation Integration

| Layer | File(s) | Summary |
| --- | --- | --- |
| Bypass Configuration Suite | `examples/puzzletron/Nemotron-3-Nano-30B-A3B-Base-BF16.md`, `examples/puzzletron/README.md`, `examples/puzzletron/configs/nemotron-3-nano-30b-a3b/*.yaml`, `examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/bypass/defaults.yaml`, `examples/puzzletron/configs/nemotron-3-nano-30b-a3b/bypass/defaults.yaml`, `examples/puzzletron/configs/nemotron-3-nano-30b-a3b/validate_*.yaml` | Tutorial and bypass/run defaults: Nemotron base config, KV-heads pruning targets, attention-only bypass defaults with `keys_to_learn` and multi-config sweeps, and validation defaults. |
| Nemotron Pruning Configuration | `examples/puzzletron/configs/nemotron-3-nano-30b-a3b/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16.yaml`, `examples/puzzletron/configs/nemotron-3-nano-30b-a3b/pruning/kv_heads_pruning.yaml`, `examples/puzzletron/configs/nemotron-3-nano-30b-a3b/pruning/pruning_defaults.yaml` | MIP/scoring/realization and KV-head pruning activation-hook settings with pruning/init modes. |
| Progress Tracking & MIP Constraint | `examples/puzzletron/main.py`, `modelopt/torch/puzzletron/mip/run_puzzle.py` | Replaces fixed progress strings with `_progress_step()`-derived counters; adds `target_num_kv_heads` human constraint mapped to `stats.num_kv_heads`. |
| Orchestration Pipeline Refactoring | `modelopt/torch/puzzletron/puzzletron_nas_plugin.py`, `modelopt/torch/puzzletron/bypass_distillation/bypass_utils.py` | Derives stage ordering from Hydra config; adds checkpoint completeness/readability checks, HF snapshot auto-download, bypass-run completeness detection, scoring-cache invalidation, and replacement-library staleness detection with rebuild coordination. |
| Replacement Library Metadata & Priority | `modelopt/torch/puzzletron/replacement_library/build_replacement_library.py` | Parses bypass metadata from `args.json` (nested `model_factory.keys_to_learn`) or `bypass_config.json`; deterministically prioritizes bypass-trained checkpoints over pruned duplicates. |
| Unit Tests for Orchestration & Progress | `tests/unit/torch/puzzletron/test_puzzletron_nas_plugin.py`, `tests/unit/torch/puzzletron/test_puzzletron_progress.py`, `tests/unit/torch/puzzletron/test_bypass_utils.py` | Tests checkpoint completeness, scoring-cache invalidation, revalidation toggling, progress-step computation with/without bypass, and canonical teacher-path handling in bypass fingerprint/experiment-id generation. |
| Unit Tests for Replacement Library Config | `tests/unit/torch/puzzletron/test_replacement_library_bypass_config.py` | Parameterized tests for dual metadata-filename support, symlink resolution, and bypass-row preference over pruned duplicates in dataframe. |
| GPU Tests for Checkpoint State Restoration | `tests/gpu/torch/puzzletron/test_bypass_checkpoint_utils.py` | Round-trip tests for GradScaler persistence, legacy checkpoint handling, and AdamW optimizer state restoration. |
| GPU Integration Tests for Bypass Distillation | `tests/gpu/torch/puzzletron/test_bypass.py` | Distributed end-to-end tests: block pruning, KV-head compression, multi-config sweeps, resume-from-checkpoint, subblock training modes, and bypass→replacement-library integration across model families. |
| Existing Test Refactoring | `tests/gpu/torch/puzzletron/test_puzzletron.py` | Uses shared `PUZZLETRON_FAMILIES` test constant instead of inline list. |
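
The "Replacement Library Metadata & Priority" row can be pictured with the following sketch. It is not the PR's implementation: the function names, the candidate dictionary fields (`arch_key`, `path`, `is_bypass`), and the assumption that `bypass_config.json` stores `keys_to_learn` at the top level are all hypothetical. Only the two metadata filenames and the nested `model_factory.keys_to_learn` location come from the walkthrough.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Optional


def read_keys_to_learn(checkpoint_dir: Path) -> Optional[Any]:
    """Look up keys_to_learn in args.json (nested) and then bypass_config.json."""
    args_path = checkpoint_dir / "args.json"
    if args_path.is_file():
        args = json.loads(args_path.read_text())
        value = args.get("model_factory", {}).get("keys_to_learn")
        if value is not None:
            return value
    cfg_path = checkpoint_dir / "bypass_config.json"
    if cfg_path.is_file():
        # Assumption: top-level key; the real file layout may differ.
        return json.loads(cfg_path.read_text()).get("keys_to_learn")
    return None


def prefer_bypass(candidates: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep one candidate per architecture, preferring bypass-trained ones."""
    best: Dict[str, Dict[str, Any]] = {}
    # Bypass rows sort first; ties are broken by path so the result is deterministic.
    for cand in sorted(candidates, key=lambda c: (not c["is_bypass"], c["path"])):
        best.setdefault(cand["arch_key"], cand)
    return [best[key] for key in sorted(best)]
```

Sorting before deduplicating keeps the outcome independent of filesystem enumeration order, which is one way to obtain the deterministic prioritization the summary mentions.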

Sequence Diagram(s)

sequenceDiagram
  participant Caller as CLI / Hydra
  participant Converter as convert_puzzletron_model
  participant Checkpoint as _is_complete_anymodel_checkpoint
  participant HF as HF snapshot_download
  participant BypassDetector as _find_incomplete_bypass_runs
  Caller->>Converter: start conversion pipeline
  Converter->>Checkpoint: check teacher checkpoint completeness
  alt teacher incomplete
    Checkpoint->>HF: snapshot_download(hf_model_id)
  end
  Converter->>BypassDetector: enumerate expected bypass runs, load_bypass_state
  BypassDetector->>Converter: list of incomplete runs -> launch bypass if needed
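
The flow in the sequence diagram can be approximated in plain Python as below. This is a sketch under stated assumptions, not the code under review: `is_complete_checkpoint`, `find_incomplete_runs`, and `plan_bypass` are hypothetical stand-ins for the functions named in the diagram, the `config.json` requirement and the `DONE` marker file are invented for illustration, and the download step is injected as a callable instead of calling Hugging Face directly.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Sequence


def is_complete_checkpoint(path: Path, required: Sequence[str] = ("config.json",)) -> bool:
    """Treat a checkpoint as complete when every required file exists (assumed rule)."""
    return path.is_dir() and all((path / name).is_file() for name in required)


def find_incomplete_runs(run_dirs: Iterable[Path], marker: str = "DONE") -> List[Path]:
    """Return run directories that lack the (hypothetical) completion marker."""
    return [d for d in run_dirs if not (d / marker).is_file()]


def plan_bypass(
    teacher_dir: Path,
    run_dirs: Iterable[Path],
    download: Callable[[Path], None],
) -> List[Path]:
    """Fetch the teacher if it is incomplete, then list bypass runs still to launch."""
    if not is_complete_checkpoint(teacher_dir):
        download(teacher_dir)
    return find_incomplete_runs(run_dirs)
```

An empty list from `plan_bypass` would correspond to the "skip bypass" branch; a non-empty list names the runs that still need to be launched.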

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • kevalmorabia97
  • AAnoosheh
  • shengliangxu
  • chadvoegele
🚥 Pre-merge checks | ✅ 4 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 39.29% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'Ssameni/puzzletron bypass 3 integration' is vague and lacks specificity about what changes are being made. | Consider a more descriptive title that captures the main change, such as 'Integrate bypass distillation into Puzzletron pipeline' or similar. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security Anti-Patterns | ✅ Passed | No security anti-patterns detected: no `torch.load(weights_only=False)`, `numpy.load(allow_pickle=True)`, hardcoded `trust_remote_code=True`, `eval`/`exec`, `# nosec`, or non-permissive dependencies. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

github-actions Bot commented May 12, 2026

Contributor
PR Preview Action v1.8.1

🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1470/

Built to branch gh-pages at 2026-06-09 09:25 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Separius force-pushed the ssameni/puzzletron-bypass-3-integration branch 4 times, most recently from 00c7d13 to a715b55, on May 12, 2026 14:39
Signed-off-by: Sepehr Sameni <ssameni@nvidia.com>
Separius force-pushed the ssameni/puzzletron-bypass-3-integration branch from a715b55 to 9648a91 on June 8, 2026 08:06
Separius added 3 commits June 8, 2026 10:19
Signed-off-by: Sepehr Sameni <ssameni@nvidia.com>
Signed-off-by: Sepehr Sameni <ssameni@nvidia.com>
Signed-off-by: Sepehr Sameni <ssameni@nvidia.com>
Separius commented Jun 8, 2026

Contributor Author

/ok to test 3cb4f61

Separius commented Jun 8, 2026

Contributor Author

/claude review

Separius marked this pull request as ready for review June 8, 2026 09:26
Separius requested a review from a team as a code owner June 8, 2026 09:26
Signed-off-by: Sepehr Sameni <ssameni@nvidia.com>

coderabbitai Bot left a comment

Contributor

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modelopt/torch/puzzletron/puzzletron_nas_plugin.py`:
- Around line 276-290: The in-function import "from huggingface_hub import
snapshot_download" inside the block that checks Path(input_model_path).exists()
lacks the required brief justification; add a short comment immediately above
that import (e.g., "Guard optional dependency: huggingface_hub is imported here
to avoid requiring it unless HF model auto-download is needed") to explain why
the import is local, keeping it next to the import in the same block where
input_model_path, snapshot_download, mprint, and Path are used.
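
For readers unfamiliar with the pattern this comment asks for, here is a minimal sketch of a guarded local import with its justification comment. The wrapper function `resolve_model_path` is hypothetical and the surrounding logic in `puzzletron_nas_plugin.py` is not shown in this thread; the sketch only assumes that a missing local path is treated as a Hugging Face repo id and fetched with `huggingface_hub.snapshot_download`.

```python
from pathlib import Path


def resolve_model_path(input_model_path: str) -> str:
    """Return a local model directory, downloading from the HF Hub only if needed."""
    if Path(input_model_path).exists():
        return input_model_path

    # Guard optional dependency: huggingface_hub is imported here to avoid
    # requiring it unless HF model auto-download is needed.
    from huggingface_hub import snapshot_download

    # Assumption: a non-existent path is interpreted as a Hub repo id.
    return snapshot_download(repo_id=input_model_path)
```

With this shape, callers that always pass an existing local directory never trigger the import, so environments without `huggingface_hub` installed keep working.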

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 65736e30-6d7d-46bb-bd1b-86ffc0900785

📥 Commits

Reviewing files that changed from the base of the PR and between 2c52e7b and 3cb4f61.

📒 Files selected for processing (21)
  • examples/puzzletron/Nemotron-3-Nano-30B-A3B-Base-BF16.md
  • examples/puzzletron/README.md
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/bypass/defaults.yaml
  • examples/puzzletron/configs/nemotron-3-nano-30b-a3b/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16.yaml
  • examples/puzzletron/configs/nemotron-3-nano-30b-a3b/bypass/defaults.yaml
  • examples/puzzletron/configs/nemotron-3-nano-30b-a3b/nemotron-3-nano-30b-a3b-with-bypass.yaml
  • examples/puzzletron/configs/nemotron-3-nano-30b-a3b/nemotron-3-nano-30b-a3b.yaml
  • examples/puzzletron/configs/nemotron-3-nano-30b-a3b/pruning/kv_heads_pruning.yaml
  • examples/puzzletron/configs/nemotron-3-nano-30b-a3b/pruning/pruning_defaults.yaml
  • examples/puzzletron/configs/nemotron-3-nano-30b-a3b/validate_model_defaults.yaml
  • examples/puzzletron/configs/nemotron-3-nano-30b-a3b/validate_solutions_defaults.yaml
  • examples/puzzletron/main.py
  • modelopt/torch/puzzletron/mip/run_puzzle.py
  • modelopt/torch/puzzletron/puzzletron_nas_plugin.py
  • modelopt/torch/puzzletron/replacement_library/build_replacement_library.py
  • tests/gpu/torch/puzzletron/test_bypass.py
  • tests/gpu/torch/puzzletron/test_bypass_checkpoint_utils.py
  • tests/gpu/torch/puzzletron/test_puzzletron.py
  • tests/unit/torch/puzzletron/test_puzzletron_nas_plugin.py
  • tests/unit/torch/puzzletron/test_puzzletron_progress.py
  • tests/unit/torch/puzzletron/test_replacement_library_bypass_config.py

Comment thread modelopt/torch/puzzletron/puzzletron_nas_plugin.py
Signed-off-by: Sepehr Sameni <ssameni@nvidia.com>
Separius requested a review from kevalmorabia97 June 8, 2026 10:13

Separius commented Jun 8, 2026

Contributor Author

/ok to test e8b73f6

Signed-off-by: Sepehr Sameni <ssameni@nvidia.com>

# Conflicts:
#	examples/puzzletron/main.py
codecov Bot commented Jun 9, 2026

Codecov Report

❌ Patch coverage is 70.62500% with 47 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.30%. Comparing base (8b01ba4) to head (fbfa166).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| modelopt/torch/puzzletron/puzzletron_nas_plugin.py | 68.30% | 45 Missing ⚠️ |
| ...rch/puzzletron/bypass_distillation/bypass_utils.py | 66.66% | 1 Missing ⚠️ |
| modelopt/torch/puzzletron/mip/run_puzzle.py | 50.00% | 1 Missing ⚠️ |
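
As a quick sanity check on the patch-coverage figures, the number of changed lines can be backed out from the two reported values. The total of 160 lines below is inferred from "70.62500%" and "47 lines missing"; Codecov does not state it in this report, and the helper `patch_size` is purely illustrative.

```python
def patch_size(missing: int, coverage_pct: float) -> int:
    """Total changed lines implied by `missing` uncovered lines at `coverage_pct` percent."""
    return round(missing / (1 - coverage_pct / 100))


# The per-file rows add up to the headline figure: 45 + 1 + 1 = 47 missing lines.
missing_lines = 45 + 1 + 1

total = patch_size(missing_lines, 70.625)  # inferred patch size
covered = total - missing_lines            # lines hit by tests

print(total, covered, covered / total * 100)
```

Under this reading, 113 of 160 changed lines are covered, which reproduces the reported 70.625%.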
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1470       +/-   ##
===========================================
+ Coverage   56.40%   77.30%   +20.90%     
===========================================
  Files         506      507        +1     
  Lines       55486    55694      +208     
===========================================
+ Hits        31295    43056    +11761     
+ Misses      24191    12638    -11553     
| Flag | Coverage Δ |
| --- | --- |
| examples | 41.35% <9.37%> (+22.77%) ⬆️ |
| gpu | 59.34% <55.62%> (+38.79%) ⬆️ |
| regression | 14.59% <0.00%> (-0.27%) ⬇️ |
| unit | 54.33% <45.00%> (+0.12%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

Separius added 2 commits June 9, 2026 10:02
Signed-off-by: Sepehr Sameni <ssameni@nvidia.com>
Signed-off-by: Sepehr Sameni <ssameni@nvidia.com>
Separius commented Jun 9, 2026

Contributor Author

/ok to test fbfa166

Separius commented Jun 9, 2026

Contributor Author

/claude review
Only review modelopt/torch/puzzletron folder changes
