Dkorzekwa/decilm cleanup post subblockstats #1103

Merged

danielkorzekwa merged 125 commits into feature/puzzletron from dkorzekwa/decilm_cleanup_post_subblockstats

Mar 24, 2026

danielkorzekwa commented Mar 23, 2026 •

edited by coderabbitai Bot

Contributor

What does this PR do?

Completes the removal of unused code from `modelopt/torch/puzzletron/decilm/deci_lm_hf_code`.

Summary by CodeRabbit

Chores
- Removed vendored Transformers modules including activation functions, cache utilities, and configuration classes.
- Removed custom DeciLM configuration class.
- Updated internal type signatures from DeciLMConfig to standard Hugging Face PretrainedConfig.
- Simplified internal function signatures by removing unused parameters.
- Enhanced code quality checks via pre-commit configuration updates.

danielkorzekwa added 30 commits

March 4, 2026 11:33


          Add anymodel directories to feature/puzzletron

e82164f

- Add converter, model_descriptor, puzzformer, and llama model support
- Selective merge of anymodel functionality

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Make any_model conversion working.

2099df3

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Update child_init.py with anymodel version

eb5cf8a

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          fix attention pruning

c9de41c

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Add trust_remote_code to load_model_config (default to false)

3c1bc1f

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Make activation scoring working

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Comment all tested models aside of llama_3_1_8b_instruct

6cc2194

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Delete not needed decilm test

ee4e1e3

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Fix broken tests

449b523

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Update puzzletron_nas_pluging to any_model version

fb27bba

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Correct test resources used by tests.

b350f82

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Disable puzzletron tests (will be enabled after all any_model logic i…

fafe5a3

…s merged)

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Merge branch 'dkorzekwa/anymodel_core' into dkorzekwa/anymodel_activa…

e988248

…tion_scoring


          Comment out not implemented models.

c717852

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          format python docs

030f126

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Merge branch 'dkorzekwa/anymodel_core' into dkorzekwa/anymodel_activa…

8dcdfbf

…tion_scoring


          Use trust_remote_code in force_cache_dynamic_modules()

70df0df

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Merge branch 'dkorzekwa/anymodel_core' into dkorzekwa/anymodel_activa…

bb56662

…tion_scoring


          Fix anymodel pruning

ecd953e

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Fix build docs issue.

ee8f538

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Merge branch 'dkorzekwa/anymodel_core' into dkorzekwa/anymodel_activa…

c9b76a1

…tion_scoring


          Merge branch 'dkorzekwa/anymodel_activation_scoring' into dkorzekwa/a…

6e3af61

…nymodel_pruning


          Merging build_library_and_stats

0ad6d92

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Merging anymodel: calc_one_block_scores

995eb1a

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Merging any_model: calc_one_block_scores

34081c9

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          merge any_model: mip_and_realize_models

ed5c00f

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Add all anymodel models but gptoss

993b5ec

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Make nemotron-nano-12b-v2 to work (set trust_remote_code=true)

6e9f03b

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          merge anymodel for nemotron-3-nano-30b-a3b-base-bf16

e8b7a7d

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Clarify readme and avoid reusing the same reference in llama_converter.

47414d5

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

danielkorzekwa added 13 commits

March 19, 2026 09:42


          remove dead replacement_library code

f096d11

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Delete not used transformers code

dc52a81

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Delete unused decilm code

b9178a3

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Import clean up.

9c496bb

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Support moe in sweep.py

467247a

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Code clean up.

034e77d

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Add assertions for memory subblock stats

4bbdeaf

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          code clean up

837e14f

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Remove DeciLMMoe

c0a0cb0

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Update comments

855f4a6

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Code clean up

4458fb9

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Remove dead code: DeciLMRMSNorm and DeciLMGatedMLP

eae81a2

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>


          Remove unused DeciLMConfig

ad369cc

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

danielkorzekwa requested review from a team as code owners

March 23, 2026 18:17

danielkorzekwa requested review from kevalmorabia97 and removed request for a team

March 23, 2026 18:17

coderabbitai Bot commented Mar 23, 2026 •

Contributor

📝 Walkthrough

This PR removes vendored DeciLM-specific Hugging Face model code and configuration modules, replacing the custom DeciLMConfig class with the generic PretrainedConfig throughout the codebase.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Vendored HF Module Removals: `modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__.py`, `modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_51_3__.py`, `modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py` | Removed vendored copies of Hugging Face transformer modules including activation implementations (254 LOC), cache utilities (1447 LOC), LLaMA configuration (219 LOC), RoPE utilities (574 LOC), Llama4 configuration (447 LOC), and DeciLM configuration (204 LOC). |
| DeciLM Module Cleanup: `modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py` | Removed `DeciLMRMSNorm` and `DeciLMGatedMLP` classes; retained only `LMHead` class definition with updated module documentation. |
| Type Signature Updates: `modelopt/torch/puzzletron/tools/bypassed_training/child_init.py`, `modelopt/torch/puzzletron/replacement_library/replacement_library.py`, `modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py` | Replaced `DeciLMConfig` with `PretrainedConfig` in type annotations and function signatures; removed `DeciLMConfig` imports. |
| Tooling & Documentation Updates: `.pre-commit-config.yaml`, `modelopt/torch/puzzletron/tools/checkpoint_utils.py`, `modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py`, `modelopt/torch/puzzletron/replacement_library/build_replacement_library.py` | Updated exclusion patterns to remove `transformers_.*\.py` from pre-commit hooks; updated docstrings and comments to reflect generic AnyModel/HF layouts rather than DeciLM-specific terminology; added `contextlib` import. |
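
The type-signature change summarized in the table can be illustrated with a minimal sketch. It assumes a hypothetical helper, `count_hidden_layers`, that is not part of this PR; the only real names are `PretrainedConfig` (from `transformers`) and its commonly used `num_hidden_layers` attribute. The import sits behind `TYPE_CHECKING` so that the sketch runs even where `transformers` is not installed.

```python
from types import SimpleNamespace
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported for type checkers only; no runtime dependency on transformers.
    from transformers import PretrainedConfig


def count_hidden_layers(config: "PretrainedConfig") -> int:
    """Hypothetical helper: read the layer count from a generic HF config.

    Before a cleanup like this one, the parameter would have been annotated
    with the custom DeciLMConfig; annotating with the base class lets any
    Hugging Face config object be passed in.
    """
    num_layers = getattr(config, "num_hidden_layers", None)
    if num_layers is None:
        raise ValueError("config does not define num_hidden_layers")
    return int(num_layers)


# Stand-in object used only to exercise the sketch without loading a model.
demo_config = SimpleNamespace(num_hidden_layers=32)
print(count_hidden_layers(demo_config))
```

Because the annotation names only the base class, callers that previously built a `DeciLMConfig` would need no other change than passing whatever config their model already provides.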

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Dkorzekwa/decilm hf code cleanup #1071: Performs the same DeciLM/HF code cleanup by removing or refactoring vendored DeciLM HF modules and updating related tooling.

Suggested reviewers

kevalmorabia97

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Dkorzekwa/decilm cleanup post subblockstats' accurately describes the main objective of removing unused DeciLM-related code and artifacts after subblockstats work, as evidenced by the significant deletions across configuration and modeling files. |
| Security Anti-Patterns | ✅ Passed | No security anti-patterns detected. PR removes dead DeciLM code and updates type annotations without introducing torch.load(weights_only=False), numpy.load(allow_pickle=True), trust_remote_code=True, eval/exec on untrusted input, or # nosec bypasses. |

kevalmorabia97 reviewed

modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py

kevalmorabia97 Mar 23, 2026 •

Collaborator

Can we merge this with block_config.py and move it to the any_model folder?

danielkorzekwa Mar 24, 2026

Contributor Author

Added to the TODO list to avoid cluttering this MR (a large number of files would change).

Base automatically changed from dkorzekwa/anymodel_subblock_stats_nodecilm to feature/puzzletron

March 24, 2026 11:17

danielkorzekwa requested a review from a team as a code owner

March 24, 2026 11:17

danielkorzekwa requested a review from kevalmorabia97

March 24, 2026 11:17


          Merge branch 'feature/puzzletron' into dkorzekwa/decilm_cleanup_post_…

526f184

…subblockstats

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

coderabbitai Bot reviewed

coderabbitai Bot left a comment

Contributor

🧹 Nitpick comments (1)

modelopt/torch/puzzletron/tools/checkpoint_utils.py (1)
136-137: Consider renaming is_valid_decilm_checkpoint to match the new generic semantics.
Now that the function validates AnyModel-style layout rather than DeciLM-specific format, a neutral name (or alias) would reduce API confusion.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/checkpoint_utils.py` around lines 136 - 137,
The helper is misnamed for its new generic semantics: rename the function
is_valid_decilm_checkpoint to a neutral name like is_valid_anymodel_checkpoint
or is_valid_model_checkpoint (keeping the same signature including
trust_remote_code) and update its docstring to reflect "AnyModel / puzzletron
layout" rather than DeciLM; to avoid breaking callers, add a short compatibility
alias that points the old name to the new function (e.g.,
is_valid_decilm_checkpoint = is_valid_anymodel_checkpoint) and update any
internal references to call the new name.
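
The rename-with-alias approach suggested above could look roughly like the following sketch. The function body is a placeholder assumption (the real validation logic in `checkpoint_utils.py` is not shown in this thread), and `is_valid_anymodel_checkpoint` is only one of the names the nitpick proposes.

```python
from pathlib import Path


def is_valid_anymodel_checkpoint(checkpoint_dir, trust_remote_code: bool = False) -> bool:
    """Placeholder check: a directory counts as valid if it has a config.json.

    This body is an assumption made for illustration only; it does not
    reproduce the real checks. trust_remote_code is kept in the signature,
    as the review asks, but is unused in this sketch.
    """
    return (Path(checkpoint_dir) / "config.json").is_file()


# Backward-compatible alias so existing callers of the old name keep working.
is_valid_decilm_checkpoint = is_valid_anymodel_checkpoint
```

A plain assignment keeps both names bound to the same function object, so behavior cannot drift between them; a wrapper that emits a `DeprecationWarning` would be the alternative if callers should be nudged toward the new name.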

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bb5d0d2e-e8f7-49ad-a02d-35563718bb57

📥 Commits

Reviewing files that changed from the base of the PR and between 3193f30 and 526f184.

📒 Files selected for processing (14)

.pre-commit-config.yaml
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__activations.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__cache_utils.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__configuration_llama.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__modeling_rope_utils.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_51_3__configuration_llama4.py
modelopt/torch/puzzletron/replacement_library/build_replacement_library.py
modelopt/torch/puzzletron/replacement_library/replacement_library.py
modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
modelopt/torch/puzzletron/tools/bypassed_training/child_init.py
modelopt/torch/puzzletron/tools/checkpoint_utils.py
modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py

💤 Files with no reviewable changes (6)

modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_51_3__configuration_llama4.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__configuration_llama.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__cache_utils.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__activations.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__modeling_rope_utils.py
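
The walkthrough notes that the `transformers_.*\.py` exclusion was dropped from the pre-commit configuration once the vendored files were deleted. As a rough illustration of what that pattern covered, the sketch below applies it with `re.search`, which is how pre-commit matches `files` and `exclude` patterns. The pattern is taken from the walkthrough as quoted; how it was embedded in the full `exclude` expression of `.pre-commit-config.yaml` is not shown here, and `is_excluded` is a hypothetical helper.

```python
import re

# Pattern quoted in the walkthrough; treated here as a standalone regex.
VENDORED_PATTERN = re.compile(r"transformers_.*\.py")


def is_excluded(path: str) -> bool:
    """Return True if the path would have matched the old exclusion pattern."""
    return VENDORED_PATTERN.search(path) is not None


# One deleted vendored file and one retained file from the lists above.
vendored = "modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__activations.py"
kept = "modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py"

print(is_excluded(vendored), is_excluded(kept))
```

With the vendored copies gone, no remaining file in the lists above matches the pattern, which is consistent with removing the exclusion.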

codecov Bot commented Mar 24, 2026 •

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.12%. Comparing base (3193f30) to head (526f184).
⚠️ Report is 1 commit behind head on feature/puzzletron.

Additional details and impacted files

@@                  Coverage Diff                   @@
##           feature/puzzletron    #1103      +/-   ##
======================================================
+ Coverage               72.10%   72.12%   +0.02%     
======================================================
  Files                     209      209              
  Lines                   23628    23628              
======================================================
+ Hits                    17036    17042       +6     
+ Misses                   6592     6586       -6

kevalmorabia97 approved these changes

danielkorzekwa merged commit 928036e into feature/puzzletron

28 checks passed

danielkorzekwa deleted the dkorzekwa/decilm_cleanup_post_subblockstats branch

March 24, 2026 13:10

danielkorzekwa mentioned this pull request

Merge puzzletron compression algorithm #1121

Merged

kevalmorabia97 added a commit that referenced this pull request


          Merge puzzletron compression algorithm (#1121)

361f7e3

### What does this PR do?

Implement puzzletron compression algorithm based on Puzzle paper
(https://arxiv.org/abs/2411.19146)

<details>
<summary> The list of reviewed and merged MRs that resulted in the
feature/puzzletron branch</summary>

Merging dkorzekwa/any_model to feature/puzzletron

[Add anymodel directories to feature/puzzletron by danielkorzekwa · Pull
Request #974 ·
NVIDIA/Model-Optimizer](#974)
- merged

[Draft: anymodel activation scoring by danielkorzekwa · Pull Request
#989 ·
NVIDIA/Model-Optimizer](#989)
- merged

[Draft: Merge anymodel pruning by danielkorzekwa · Pull Request #990 ·
NVIDIA/Model-Optimizer](#990)
- merged

[Draft: Merging anymodel:build_library_and_stats by danielkorzekwa ·
Pull Request #993 ·
NVIDIA/Model-Optimizer](#993)
- merged

[Dkorzekwa/any model calc one block scores by danielkorzekwa · Pull
Request #994 ·
NVIDIA/Model-Optimizer](#994)
- merged

[Draft: merge any_model: mip_and_realize_models by danielkorzekwa · Pull
Request #995 ·
NVIDIA/Model-Optimizer](#995)
- merged

[Dkorzekwa/any model other models by danielkorzekwa · Pull Request
#1007 ·
NVIDIA/Model-Optimizer](#1007)
- merged

PR to 1007: #1039 - merged

[Dkorzekwa/anymodel gptoss by danielkorzekwa · Pull Request #1020 ·
NVIDIA/Model-Optimizer](#1020)
- merged

[Merge any_model tutorial by danielkorzekwa · Pull Request #1035 ·
NVIDIA/Model-Optimizer](#1035)
- merged

[Merge mbridge distillation for any_model by danielkorzekwa · Pull
Request #1036 ·
NVIDIA/Model-Optimizer](#1036)
- merged

[MR branch for the remaining difference between dkorzekwa/any_model an…
by danielkorzekwa · Pull Request #1047 ·
NVIDIA/Model-Optimizer](#1047)
- merged

[Dkorzekwa/decilm hf code cleanup by danielkorzekwa · Pull Request #1071
·
NVIDIA/Model-Optimizer](#1071)
- merged

[Dkorzekwa/decilm hf code cleanup 2 by danielkorzekwa · Pull Request
#1073 ·
NVIDIA/Model-Optimizer](#1073)
- merged

[Dkorzekwa/anymodel subblock stats by danielkorzekwa · Pull Request
#1085 ·
NVIDIA/Model-Optimizer](#1085)
- merged

[Dkorzekwa/anymodel subblock stats nodecilm by danielkorzekwa · Pull
Request #1102 ·
NVIDIA/Model-Optimizer](#1102)
- merged

[Dkorzekwa/decilm cleanup post subblockstats by danielkorzekwa · Pull
Request #1103 ·
NVIDIA/Model-Optimizer](#1103)
- merged

[code clean up by danielkorzekwa · Pull Request #1110 ·
NVIDIA/Model-Optimizer](#1110)
- merged

Merging into main:

[Activation hooks redesign (reuse hooks component across both minitron
and puzzletron) by danielkorzekwa · Pull Request #1022 ·
NVIDIA/Model-Optimizer](#1022)
- merged

[Dkorzekwa/puzzletron use importance hooks from prune by danielkorzekwa
· Pull Request #1115 ·
NVIDIA/Model-Optimizer](#1115)
- merged

</details>

<!-- Details about the change. -->

### Usage

Puzzletron tutorial:

https://github.com/NVIDIA/Model-Optimizer/tree/feature/puzzletron/examples/puzzletron

### Testing
The main e2e test for compressing 9 models with Puzzletron:

https://github.com/NVIDIA/Model-Optimizer/blob/feature/puzzletron/tests/gpu/torch/puzzletron/test_puzzletron.py

2-gpu nightly tests: 

-
https://github.com/NVIDIA/Model-Optimizer/actions/runs/24468209205/job/71501061203
-
https://github.com/NVIDIA/Model-Optimizer/actions/runs/24470214159/job/71508152952

### Before your PR is "*Ready for review*"
- Is this change backward compatible?: ✅
- If you copied code from any other sources or added a new PIP
dependency, did you follow guidance in `CONTRIBUTING.md`: ✅
- Did you write any new necessary tests?: ✅
- Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?:
✅



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added Puzzletron: end-to-end heterogeneous pruning & NAS workflow with
AnyModel support, example pipelines, deployment and evaluation
utilities, and tools for converting/pruning and exporting compressed
checkpoints.

* **Documentation**
* Comprehensive Puzzletron tutorials, model-specific guides, evaluator
instructions, example configs, and changelog entry.

* **Chores**
* CI/workflow updates (extras installation, longer GPU test timeout),
pre-commit hook exclusion updated, and CODEOWNERS entries added.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
Signed-off-by: Liana Mikaelyan <45925959+LianaMikael@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
Signed-off-by: jrausch <jrausch@nvidia.com>
Signed-off-by: root <root@pool0-00848.cm.cluster>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
Co-authored-by: Liana Mikaelyan <45925959+LianaMikael@users.noreply.github.com>
Co-authored-by: J Rausch <38429553+j-rausch@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
