
Split bypass prerequisites #1468

Merged
Separius merged 23 commits into main from ssameni/puzzletron-bypass-1-prereqs
May 29, 2026

Conversation

@Separius

@Separius Separius commented May 12, 2026

Contributor

Summary

This is PR 1 of 3 in the Puzzletron bypass/local-distillation stack.

This PR contains prerequisite infrastructure only. It does not wire bypass distillation into the Puzzletron pipeline yet.

Stack:

  1. This PR: shared prerequisites
  2. ssameni/puzzletron-bypass-2-core: bypass distillation core
  3. ssameni/puzzletron-bypass-3-integration: Puzzletron integration, configs, docs, GPU coverage

What Changed

  • Added ModelDescriptor.pruning_mixins() so model families can expose pruning mixins needed by downstream bypass initialization.
  • Added KV-head pruning mixin support for GPT-OSS, Nemotron-H, Nemotron-H-v2, and Qwen3-VL descriptors.
  • Improved pruning utilities for nested language-model configs and missing attention bias config fields.
  • Added create_train_dataloader() and streaming-safe shuffle handling.
  • Added chat-template fallback for base models without tokenizer.chat_template.
  • Added Sewing Kit loss/helper exports needed by the later bypass core.
  • Updated child-state initialization to support composing multiple pruning mixins.
  • Updated warmup-step resolver to account for gradient accumulation.
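To illustrate the last bullet, here is a minimal sketch of a warmup computation that accounts for gradient accumulation. It assumes warmup is expressed as a fraction of optimizer steps and that one optimizer step consumes `block * mbs * grad_accum` tokens. The function name `warmup_steps_sketch` and its argument names are hypothetical and may not match the helper in modelopt/torch/puzzletron/tools/hydra_utils.py.

```python
def warmup_steps_sketch(tokens: int, block: int, mbs: int, grad_accum: int, pct: float) -> int:
    """Hypothetical helper: warmup length counted in optimizer steps."""
    if tokens < 0:
        raise ValueError(f"tokens must be >= 0, got {tokens}")
    if block <= 0 or mbs <= 0:
        raise ValueError(f"block and mbs must be > 0, got block={block}, mbs={mbs}")
    if grad_accum < 1:
        raise ValueError(f"grad_accum must be >= 1, got {grad_accum}")
    if not 0.0 <= pct <= 1.0:
        raise ValueError(f"pct must be in [0, 1], got {pct}")

    # Number of micro-batches that fit in the token budget.
    micro_iters = tokens // (block * mbs)
    # One optimizer step is taken every grad_accum micro-batches.
    optimizer_steps = micro_iters // grad_accum
    return round(pct * optimizer_steps)


print(warmup_steps_sketch(1_000_000, block=1000, mbs=10, grad_accum=1, pct=0.2))  # 20
print(warmup_steps_sketch(1_000_000, block=1000, mbs=10, grad_accum=4, pct=0.2))  # 5
```

Without the division by `grad_accum`, the second call would also yield 20, so the scheduler would spend four times as many optimizer steps in warmup as intended.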

Why

The bypass distillation MR needs these reusable pieces, but they are independently reviewable and useful without adding the bypass training stage itself.

Splitting them out keeps the bypass core PR focused on the actual local-distillation engine.

Tests

Added focused unit coverage for:

  • Dataloader behavior
  • Bypass loss helpers
  • KV-head pruning utilities
  • Sewing Kit activity/input/function/needle behavior

Summary by CodeRabbit

  • New Features

    • KV-head pruning added for multiple model families; generic pruning mixin hook available.
    • New training dataloader factory for infinite, block-sized training streams.
    • Vectorwise and batched normalized MSE loss utilities.
  • Improvements

    • Loss reports now show Δ-from-initial and visual indicators.
    • Chat-sample preprocessing tolerates tokenizers without chat templates.
    • More robust head-dimension/bias handling and grad-accum-aware warmup resolver.
  • Tests

    • Extensive unit tests added across dataloaders, losses, pruning, hydra utils, and sewing-kit.


@copy-pr-bot

copy-pr-bot Bot commented May 12, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented May 12, 2026

Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Walkthrough

This PR extends the pruning framework with KV-heads support across multiple model descriptors, adds LM-config helpers and sequential multi-mixin application, introduces normalized MSE loss utilities, adds a training dataloader factory with tokenizer-aware chat preprocessing, updates stitched-loss formatting and warmup resolver behavior, and adds comprehensive unit tests.

Changes

Pruning and model descriptor enhancements

Layer / File(s) Summary
Base pruning mixin interface and language-model config utilities
modelopt/torch/puzzletron/anymodel/model_descriptor/base.py, modelopt/torch/puzzletron/pruning/pruning_utils.py
Adds ModelDescriptor.pruning_mixins() extension point; introduces _lm_attrs() and _lm_head_dim() to extract language-model sub-configs and head_dim for VL configs; updates _init_attention_weights()/_init_attention_biases() to use LM metadata with robust bias-key probing; adds MlpInitMode.MoEChannelPruning.
KV-heads pruning across model descriptors
modelopt/torch/puzzletron/pruning/kv_heads_pruning_mixin.py, modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_model_descriptor.py, modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py, modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py, modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py
KVHeadsPruningMixIn derives head size via _lm_head_dim(); GPT-OSS, NemotronH, NemotronHV2, and Qwen3VL descriptors register kv_heads pruning mixins and export model-specific KVHeadsLayerDescriptor dataclasses; expert-removal mixins registered (including legacy alias where present).
Sequential mixin composition and config override
modelopt/torch/puzzletron/tools/bypassed_training/child_init.py
_process_single_layer() supports lists of pruning mixins applied sequentially, threading interim parent/new state and per-layer key views, merging per-mixin layer outputs and aggregating keys_to_remove; update_model_config.override() treats None as leave-unchanged.
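
The sequential composition described in the last row can be sketched roughly as follows. This is not the code in child_init.py: the mixin signature (a callable returning new layer entries plus a set of keys to remove), the function name `apply_mixins_sequentially`, and the `check_overlap` flag are all assumptions made for illustration. The overlap check reflects one option raised in review, not necessarily what was merged.

```python
def apply_mixins_sequentially(parent_state, mixins, check_overlap=True):
    """Hypothetical sketch: run mixins in order and merge their per-layer outputs."""
    layer_out = {}
    keys_to_remove = set()
    current = dict(parent_state)  # interim view threaded through the mixins
    for mixin in mixins:
        out, removed = mixin(current)
        overlap = layer_out.keys() & out.keys()
        if check_overlap and overlap:
            raise ValueError(f"mixins wrote the same keys: {sorted(overlap)}")
        layer_out.update(out)  # last writer wins when check_overlap is False
        current.update(out)    # later mixins see earlier results
        keys_to_remove |= removed
    return layer_out, keys_to_remove


# Toy mixins operating on lists instead of tensors.
def kv_heads_mixin(state):
    return {"attn.k_proj": state["attn.k_proj"][:2]}, set()


def ffn_mixin(state):
    return {"mlp.up_proj": state["mlp.up_proj"][:1]}, {"mlp.gate_bias"}


parent = {"attn.k_proj": [1, 2, 3, 4], "mlp.up_proj": [5, 6], "mlp.gate_bias": [0]}
layer_out, removed = apply_mixins_sequentially(parent, [kv_heads_mixin, ffn_mixin])
print(layer_out)  # {'attn.k_proj': [1, 2], 'mlp.up_proj': [5]}
print(removed)    # {'mlp.gate_bias'}
```

With `check_overlap=True`, two mixins that write the same key raise an error instead of silently overwriting each other.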

Training infrastructure and loss utilities

Layer / File(s) Summary
Normalized MSE loss functions
modelopt/torch/puzzletron/sewing_kit/utils.py, tests/unit/torch/puzzletron/test_bypass_losses.py
Re-exports normalized_mse_loss; adds vectorwise_normalized_mse_loss() and batched_normalized_mse_loss() with batch-dim validation, epsilon-stabilized relative-L2 normalization, and mean-per-batch aggregation; tests cover identity, randomness, reduction modes, scale invariance, zero-target finiteness, and error cases.
Training dataloader factory
modelopt/torch/puzzletron/utils/data/dataloaders.py, modelopt/torch/puzzletron/utils/data/dataset.py, tests/unit/torch/puzzletron/test_bypass_dataloaders.py
create_train_dataloader() builds an infinite DataLoader backed by ConstantLengthDataset, rejects num_workers>0, supports streaming vs map shuffle, and wraps training split; ConstantLengthDataset.__iter__ uses tokenizer.apply_chat_template() when available or falls back to normalized newline-joined message content; tests validate materialization, padding, collation, Printer contract, loader delegation, and validation split auto-selection.
Configuration formatting and warmup computation
modelopt/torch/puzzletron/tools/hydra_utils.py, modelopt/torch/puzzletron/utils/parsing.py
warmup_steps() now requires grad_acc and validates inputs; _warmup_steps_resolver() supports 3/4/5-argument calls and is registered for Hydra; format_stitched_losses() accepts initial_values_dict and not_trainable_names, renders "Δ from initial", filters stats to finite values, and appends skipped count; formatters updated with emoji/bullet-style rendering.
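
For readers unfamiliar with the normalized MSE described in the first row, the following dependency-free sketch shows the idea on nested Python lists: a per-sample squared error divided by the squared norm of the target plus epsilon, then averaged over the batch. The actual utilities in sewing_kit/utils.py operate on tensors with configurable batch dimensions; the name `batched_normalized_mse_sketch` and the list-based interface are assumptions for illustration only.

```python
import math


def batched_normalized_mse_sketch(inputs, targets, epsilon=1e-6):
    """Hypothetical sketch: mean over the batch of ||x - t||^2 / (||t||^2 + epsilon)."""
    if epsilon <= 0:
        raise ValueError(f"epsilon must be strictly positive, got {epsilon}")
    if len(inputs) == 0 or len(inputs) != len(targets):
        raise ValueError(f"batch sizes differ or are empty: {len(inputs)} vs {len(targets)}")

    per_sample = []
    for x, t in zip(inputs, targets):
        if len(x) != len(t):
            # Reject mismatched shapes explicitly rather than broadcasting silently.
            raise ValueError(f"sample shapes differ: {len(x)} vs {len(t)}")
        num = sum((xi - ti) ** 2 for xi, ti in zip(x, t))
        den = sum(ti ** 2 for ti in t) + epsilon
        per_sample.append(num / den)
    return sum(per_sample) / len(per_sample)


x = [[1.0, 2.0], [0.0, 3.0]]
t = [[1.0, 1.0], [0.0, 4.0]]
loss = batched_normalized_mse_sketch(x, t, epsilon=1e-12)
# Scaling both tensors by the same factor leaves the loss (almost) unchanged.
scaled = batched_normalized_mse_sketch(
    [[10 * v for v in row] for row in x],
    [[10 * v for v in row] for row in t],
    epsilon=1e-12,
)
print(math.isclose(loss, scaled, rel_tol=1e-9))  # True
```

The epsilon term keeps the result finite when a target row is all zeros, which is why a non-positive epsilon is rejected up front.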

Sewing kit infrastructure

Layer / File(s) Summary
Sewing kit module exports and comprehensive tests
modelopt/torch/puzzletron/sewing_kit/passage.py, tests/unit/torch/puzzletron/*
always_true_predicate exported from passage.py; extensive tests added for ActivityContext (stack semantics), Needle graph/validation, FunctionTarget kwargs-only dispatch, InputArgs behavior, pruning mixin composition and key-tracking, KV-head helper, hydra warmup validation, dataloader behavior, and loss formatting/utilities.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 35.14% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Split bypass prerequisites' is vague and does not clearly convey the main changes in this substantial PR. Consider a more specific title that highlights key changes, such as 'Add pruning mixins, KV-head pruning support, and bypass prerequisites' or 'Add ModelDescriptor.pruning_mixins() and KV-head pruning infrastructure'.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns ✅ Passed No SECURITY.md anti-patterns found: no new torch.load/numpy.load/trust_remote_code/eval/exec/nosec/restricted-dependencies additions detected in PR changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented May 12, 2026

Contributor
PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-05-29 12:02 UTC

@Separius Separius force-pushed the ssameni/puzzletron-bypass-1-prereqs branch from 566cb1d to 0639883 Compare May 12, 2026 10:51
@Separius

Contributor Author

/claude review

@claude

claude Bot commented May 12, 2026


Claude review — summary

Findings: CRITICAL: 1 · IMPORTANT: 2 · SUGGESTION: 2

Most impactful

  • CRITICAL: tests/unit/torch/puzzletron/test_bypass_losses.py::test_format_stitched_losses_keeps_trainable_nan_visible calls format_stitched_losses(...) with initial_values_dict= and not_trainable_names= kwargs that don't exist in the function's current signature (and asserts on output strings like "Skipped=1" / "non-finite" that the implementation never produces). This test will hard-fail at collection/call time (TypeError). Either bring the format_stitched_losses update forward into this PR or defer this single test to the bypass-core PR.
  • IMPORTANT: The multi-mixin composition in child_init.py:_process_single_layer uses last-writer-wins semantics via dict.update, despite the comment claiming ordering can't corrupt the state dict. Two mixins that ever touch the same key will silently clobber each other. Either tighten the comment or add an overlap assertion.
  • IMPORTANT: override(item, None) in child_init.py:update_model_config now returns item instead of None. This is a sensible fix if None means "no override," but it's a behavior change — any caller that deliberately cleared a field with None now keeps the old value. Worth verifying no internal recipes/configs depended on the old semantics.
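
The behavior change in the last finding can be shown with a small sketch. Both functions below are hypothetical stand-ins, not the real `override` in child_init.py; they only contrast "None clears the field" with "None leaves the field unchanged".

```python
def override_legacy(item, new_value):
    """Hypothetical old semantics: the override always wins, even when it is None."""
    return new_value


def override_keep_on_none(item, new_value):
    """Hypothetical new semantics: None means 'no override', so the old value is kept."""
    return item if new_value is None else new_value


config = {"num_key_value_heads": 8, "sliding_window": 4096}
overrides = {"num_key_value_heads": 4, "sliding_window": None}

legacy = {k: override_legacy(v, overrides.get(k)) for k, v in config.items()}
current = {k: override_keep_on_none(v, overrides.get(k)) for k, v in config.items()}
print(legacy)   # {'num_key_value_heads': 4, 'sliding_window': None}
print(current)  # {'num_key_value_heads': 4, 'sliding_window': 4096}
```

A recipe that relied on `None` to clear `sliding_window` would see a different result under the new semantics, which is the case the finding asks to verify.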

Risk level

Moderate. The bulk of the PR is cleanly scoped prerequisite plumbing (descriptor mixins, dataloader, chat-template fallback, warmup-step grad-accum handling, re-exports) with good test coverage for the pure-function helpers. The blocker is the one test that presupposes function-signature changes shipping in the follow-up PR — that needs to be resolved before merge. The mixin-composition and override-None semantics deserve a second look but aren't blockers.

Comment thread modelopt/torch/puzzletron/tools/bypassed_training/child_init.py Outdated
Comment thread modelopt/torch/puzzletron/tools/bypassed_training/child_init.py Outdated
Comment thread modelopt/torch/puzzletron/tools/hydra_utils.py Outdated
Comment thread modelopt/torch/puzzletron/pruning/pruning_utils.py Outdated
@Separius Separius force-pushed the ssameni/puzzletron-bypass-1-prereqs branch from 0639883 to a79fbae Compare May 12, 2026 11:19
@codecov

codecov Bot commented May 12, 2026


Codecov Report

❌ Patch coverage is 89.33333% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.11%. Comparing base (7ae1865) to head (11c1eea).

Files with missing lines Patch % Lines
...h/puzzletron/tools/bypassed_training/child_init.py 80.43% 9 Missing ⚠️
modelopt/torch/puzzletron/pruning/pruning_utils.py 78.26% 5 Missing ⚠️
modelopt/torch/puzzletron/utils/parsing.py 86.84% 5 Missing ⚠️
modelopt/torch/puzzletron/utils/data/dataset.py 91.30% 2 Missing ⚠️
...torch/puzzletron/anymodel/model_descriptor/base.py 66.66% 1 Missing ⚠️
modelopt/torch/puzzletron/sewing_kit/utils.py 95.65% 1 Missing ⚠️
modelopt/torch/puzzletron/tools/hydra_utils.py 96.15% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1468      +/-   ##
==========================================
+ Coverage   76.53%   77.11%   +0.57%     
==========================================
  Files         478      478              
  Lines       52027    52209     +182     
==========================================
+ Hits        39821    40263     +442     
+ Misses      12206    11946     -260     
Flag Coverage Δ
examples 41.64% <24.00%> (+8.80%) ⬆️
gpu 59.44% <42.66%> (-0.64%) ⬇️
regression 15.18% <0.00%> (+0.01%) ⬆️
unit 53.52% <84.88%> (+0.74%) ⬆️

@Separius Separius marked this pull request as ready for review May 12, 2026 11:32
@Separius Separius requested a review from a team as a code owner May 12, 2026 11:32
@Separius

Contributor Author

@AAnoosheh and @kevalmorabia97 ready for review (split the bypass MR into 3, this is the first one, nothing too important, just some preparations and tiny fixes)

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 5

🧹 Nitpick comments (1)
tests/unit/torch/puzzletron/test_bypass_dataloaders.py (1)

206-219: ⚡ Quick win

Add a direct test for ConstantLengthDataset chat-template fallback

This fixture replaces ConstantLengthDataset, so the new no-chat_template preprocessing path in ConstantLengthDataset.__iter__ is not exercised. A small targeted iterator test would close that regression gap.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/torch/puzzletron/test_bypass_dataloaders.py` around lines 206 -
219, The fixture patches out ConstantLengthDataset so
ConstantLengthDataset.__iter__'s new no-chat_template fallback isn't tested; add
a small unit test that imports the real ConstantLengthDataset (not
_FakeConstantLengthDataset), constructs it with a tiny dataset whose items lack
"chat_template", iterates it (e.g., list(ConstantLengthDataset(...)) or calling
its __iter__), and asserts the output matches the expected realized items (e.g.,
tensors like {"input_ids": torch.tensor([0])}); ensure this test does not apply
the patched_dataloader monkeypatch and references ConstantLengthDataset and
ConstantLengthDataset.__iter__ (and optionally create_validation_dataloader) so
the fallback path is exercised.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modelopt/torch/puzzletron/sewing_kit/utils.py`:
- Around line 479-495: The function batched_normalized_mse_loss allows silent
broadcasting when input and target shapes differ; add explicit shape validation
at the top of the function: verify input.ndim == target.ndim, confirm batch_dims
are valid indices, and ensure sizes match for every dimension (both batch dims
and non-batch dims computed via norm_dims) so that target and input are exactly
compatible; if any mismatch, raise a ValueError with a clear message that
includes the shapes of input and target and the resolved batch_dims/norm_dims to
aid debugging.

In `@modelopt/torch/puzzletron/tools/bypassed_training/child_init.py`:
- Around line 93-95: The per-layer loop currently does full copies via
current_parent_state_dict = dict(parent_state_dict), current_new_state_dict =
dict(new_state_dict), current_keys = dict(keys) which is expensive; instead,
stop cloning entire mappings inside the loop and operate on the original dicts
(parent_state_dict, new_state_dict, keys) by reading values directly and only
materialize copies for individual tensors/entries that are actually modified
(e.g., when applying a mixin to a specific key). Locate the per-layer mixin loop
and replace the dict() copies with references to the originals, and when you
need to mutate a specific parameter, copy only that parameter (or its key->value
pair) and write back to new_state_dict; ensure any iteration over keys uses an
iterator or list(keys) outside the hot loop if necessary to avoid mutation
races.

In `@modelopt/torch/puzzletron/tools/hydra_utils.py`:
- Around line 35-50: The warmup_steps function must validate and normalize
inputs before doing integer divisions: ensure tokens, block, mbs and grad_accum
are ints (or cast) and that block>0, mbs>0, grad_accum>=1, and that pct is a
float within [0.0,1.0] (or at least >=0); raise ValueError with clear messages
for invalid values. In function warmup_steps, coerce tokens, block, mbs,
grad_accum and pct to the expected types up front, check block and mbs are >0 to
avoid ZeroDivisionError, check grad_accum>=1 (existing check can be reused), and
validate pct (and tokens>=0) before computing iters/steps and returning the
rounded warmup steps.

In `@modelopt/torch/puzzletron/utils/data/dataset.py`:
- Around line 131-138: The fallback that concatenates messages when
getattr(self.tokenizer, "chat_template", None) is None assumes every
m["content"] is a str and can raise TypeError for structured payloads; update
the else branch in dataset.py where sample is built to normalize each
m["content"] to a string before joining (e.g., if m["content"] is a dict or
other structured object, extract a text field if present like
m["content"].get("text") or otherwise call str(m["content"])), so the
concatenation in the no-template path (the code around tokenizer.chat_template
and tokenizer.apply_chat_template) always receives plain text.

In `@tests/unit/torch/puzzletron/test_sewing_kit_function_target_kwargs.py`:
- Around line 137-139: The test currently checks values in received["kwargs"]
but doesn't ensure no extra kwargs are present; update the second-order test in
test_sewing_kit_function_target_kwargs (use the local variables received,
student_value, teacher_value) to assert that received["kwargs"] contains exactly
the keys "input" and "target" (e.g., compare set(received["kwargs"].keys()) to
{"input","target"}) before the existing torch.equal assertions, then keep the
existing checks for received["args"] and the tensor equality against
student_value and teacher_value.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ddff5f0a-3633-4520-914f-dad472197cf8

📥 Commits

Reviewing files that changed from the base of the PR and between 7a11fb2 and a79fbae.

📒 Files selected for processing (22)
  • modelopt/torch/puzzletron/anymodel/model_descriptor/base.py
  • modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py
  • modelopt/torch/puzzletron/pruning/kv_heads_pruning_mixin.py
  • modelopt/torch/puzzletron/pruning/pruning_utils.py
  • modelopt/torch/puzzletron/sewing_kit/passage.py
  • modelopt/torch/puzzletron/sewing_kit/utils.py
  • modelopt/torch/puzzletron/tools/bypassed_training/child_init.py
  • modelopt/torch/puzzletron/tools/hydra_utils.py
  • modelopt/torch/puzzletron/utils/data/dataloaders.py
  • modelopt/torch/puzzletron/utils/data/dataset.py
  • modelopt/torch/puzzletron/utils/parsing.py
  • tests/unit/torch/puzzletron/test_bypass_dataloaders.py
  • tests/unit/torch/puzzletron/test_bypass_losses.py
  • tests/unit/torch/puzzletron/test_child_init_mixins.py
  • tests/unit/torch/puzzletron/test_kv_heads_pruning_utils.py
  • tests/unit/torch/puzzletron/test_sewing_kit_activity_context.py
  • tests/unit/torch/puzzletron/test_sewing_kit_function_target_kwargs.py
  • tests/unit/torch/puzzletron/test_sewing_kit_input_args.py
  • tests/unit/torch/puzzletron/test_sewing_kit_needle.py

Comment thread modelopt/torch/puzzletron/sewing_kit/utils.py
Comment thread modelopt/torch/puzzletron/tools/bypassed_training/child_init.py Outdated
Comment thread modelopt/torch/puzzletron/tools/hydra_utils.py Outdated
Comment thread modelopt/torch/puzzletron/utils/data/dataset.py Outdated
Comment thread tests/unit/torch/puzzletron/test_sewing_kit_function_target_kwargs.py Outdated
Separius added 2 commits May 12, 2026 16:09
Signed-off-by: Sepehr Sameni <ssameni@nvidia.com>
Signed-off-by: Sepehr Sameni <ssameni@nvidia.com>
@Separius Separius force-pushed the ssameni/puzzletron-bypass-1-prereqs branch from a79fbae to 12086fb Compare May 12, 2026 14:11

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modelopt/torch/puzzletron/sewing_kit/utils.py`:
- Around line 540-542: Validate that epsilon is strictly positive before
computing den; in the function that computes num = ((input - target) **
2).sum(dim=norm_dims) and den = (target**2).sum(dim=norm_dims) + epsilon, add a
guard at the start (before the denominator math) that either raises a ValueError
with a clear message if epsilon <= 0, or clamps epsilon to a small positive
floor (e.g., max(epsilon, 1e-12)); ensure the check references the epsilon
variable and occurs before computing den to prevent any inf/nan from division.

In `@modelopt/torch/puzzletron/utils/data/dataloaders.py`:
- Around line 113-121: The shuffle call for map-style datasets currently
hardcodes keep_in_memory=True and ignores the function argument; update the
branch that handles non-IterableDataset so that it passes the caller's
keep_in_memory parameter (the function arg named keep_in_memory) into
train_data.shuffle(seed=shuffle_seed, keep_in_memory=keep_in_memory) while
leaving IterableDataset.shuffle(seed=shuffle_seed) unchanged; reference the
symbols train_data, datasets.IterableDataset, shuffle_seed, and keep_in_memory
to locate and modify the code.
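
The shuffle dispatch requested in the dataloaders.py comment might look like the sketch below. To stay dependency-free it uses two stand-in classes that only mimic the relevant `shuffle` keyword arguments; in the real code the check is against `datasets.IterableDataset`. The function name `shuffle_train_split` and both stub classes are hypothetical.

```python
class IterableDatasetStub:
    """Stand-in for a streaming dataset: shuffle() has no keep_in_memory argument."""

    def __init__(self):
        self.calls = []

    def shuffle(self, seed=None):
        self.calls.append({"seed": seed})
        return self


class MapDatasetStub:
    """Stand-in for a map-style dataset: shuffle() accepts keep_in_memory."""

    def __init__(self):
        self.calls = []

    def shuffle(self, seed=None, keep_in_memory=False):
        self.calls.append({"seed": seed, "keep_in_memory": keep_in_memory})
        return self


def shuffle_train_split(train_data, shuffle_seed, keep_in_memory=False):
    """Hypothetical sketch: forward keep_in_memory only where shuffle() accepts it."""
    if isinstance(train_data, IterableDatasetStub):
        return train_data.shuffle(seed=shuffle_seed)
    # Pass the caller's value through instead of hardcoding True.
    return train_data.shuffle(seed=shuffle_seed, keep_in_memory=keep_in_memory)


streaming = shuffle_train_split(IterableDatasetStub(), shuffle_seed=0, keep_in_memory=True)
mapped = shuffle_train_split(MapDatasetStub(), shuffle_seed=0, keep_in_memory=False)
print(streaming.calls)  # [{'seed': 0}]
print(mapped.calls)     # [{'seed': 0, 'keep_in_memory': False}]
```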

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5e8b4997-90ef-408e-b03d-7bb26b85189d

📥 Commits

Reviewing files that changed from the base of the PR and between a79fbae and 12086fb.

📒 Files selected for processing (23)
  • modelopt/torch/puzzletron/anymodel/model_descriptor/base.py
  • modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py
  • modelopt/torch/puzzletron/pruning/kv_heads_pruning_mixin.py
  • modelopt/torch/puzzletron/pruning/pruning_utils.py
  • modelopt/torch/puzzletron/sewing_kit/passage.py
  • modelopt/torch/puzzletron/sewing_kit/utils.py
  • modelopt/torch/puzzletron/tools/bypassed_training/child_init.py
  • modelopt/torch/puzzletron/tools/hydra_utils.py
  • modelopt/torch/puzzletron/utils/data/dataloaders.py
  • modelopt/torch/puzzletron/utils/data/dataset.py
  • modelopt/torch/puzzletron/utils/parsing.py
  • tests/unit/torch/puzzletron/test_bypass_dataloaders.py
  • tests/unit/torch/puzzletron/test_bypass_losses.py
  • tests/unit/torch/puzzletron/test_child_init_mixins.py
  • tests/unit/torch/puzzletron/test_hydra_utils.py
  • tests/unit/torch/puzzletron/test_kv_heads_pruning_utils.py
  • tests/unit/torch/puzzletron/test_sewing_kit_activity_context.py
  • tests/unit/torch/puzzletron/test_sewing_kit_function_target_kwargs.py
  • tests/unit/torch/puzzletron/test_sewing_kit_input_args.py
  • tests/unit/torch/puzzletron/test_sewing_kit_needle.py
✅ Files skipped from review due to trivial changes (3)
  • modelopt/torch/puzzletron/sewing_kit/passage.py
  • modelopt/torch/puzzletron/pruning/kv_heads_pruning_mixin.py
  • tests/unit/torch/puzzletron/test_sewing_kit_input_args.py
🚧 Files skipped from review as they are similar to previous changes (16)
  • modelopt/torch/puzzletron/utils/data/dataset.py
  • tests/unit/torch/puzzletron/test_sewing_kit_function_target_kwargs.py
  • modelopt/torch/puzzletron/anymodel/model_descriptor/base.py
  • modelopt/torch/puzzletron/tools/hydra_utils.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
  • tests/unit/torch/puzzletron/test_child_init_mixins.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
  • modelopt/torch/puzzletron/pruning/pruning_utils.py
  • tests/unit/torch/puzzletron/test_kv_heads_pruning_utils.py
  • tests/unit/torch/puzzletron/test_sewing_kit_activity_context.py
  • modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_model_descriptor.py
  • tests/unit/torch/puzzletron/test_bypass_losses.py
  • modelopt/torch/puzzletron/utils/parsing.py
  • tests/unit/torch/puzzletron/test_bypass_dataloaders.py
  • tests/unit/torch/puzzletron/test_sewing_kit_needle.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py

Comment thread modelopt/torch/puzzletron/sewing_kit/utils.py
Comment thread modelopt/torch/puzzletron/utils/data/dataloaders.py Outdated
Signed-off-by: Sepehr Sameni <ssameni@nvidia.com>
@Separius

Contributor Author

/claude review

@Separius

Contributor Author

/claude review

Comment thread modelopt/torch/puzzletron/tools/hydra_utils.py Outdated
Comment thread modelopt/torch/puzzletron/tools/bypassed_training/child_init.py Outdated
@Separius

Contributor Author

@AAnoosheh ready for review

Comment thread modelopt/torch/puzzletron/pruning/pruning_utils.py Outdated
Separius added 6 commits May 20, 2026 10:45
Signed-off-by: Sepehr Sameni <ssameni@nvidia.com>
Signed-off-by: Sepehr Sameni <ssameni@nvidia.com>
Signed-off-by: Sepehr Sameni <ssameni@nvidia.com>
Signed-off-by: Sepehr Sameni <ssameni@nvidia.com>
Signed-off-by: Sepehr Sameni <ssameni@nvidia.com>
@Separius

Contributor Author

@kevalmorabia97 ready for review

@kevalmorabia97

Collaborator

/ok to test e567f57

Signed-off-by: Sepehr Sameni <ssameni@nvidia.com>
@Separius Separius removed the request for review from AAnoosheh May 28, 2026 11:08
@Separius Separius enabled auto-merge (squash) May 28, 2026 11:09
@Separius

Contributor Author

/ok to test 3084194

@Separius

Contributor Author

/ok to test dccf464

Signed-off-by: Sepehr Sameni <ssameni@nvidia.com>
@Separius Separius requested a review from a team as a code owner May 29, 2026 08:03
@Separius Separius requested a review from jenchen13 May 29, 2026 08:03
@Separius

Contributor Author

/ok to test eada923

@Separius

Contributor Author

/ok to test 11c1eea

@Separius Separius merged commit a9c156e into main May 29, 2026
49 checks passed
@Separius Separius deleted the ssameni/puzzletron-bypass-1-prereqs branch May 29, 2026 12:01

3 participants