Dkorzekwa/decilm hf code cleanup 2#1073
Conversation
- Add converter, model_descriptor, puzzformer, and llama model support - Selective merge of anymodel functionality Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…s merged) Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
📝 WalkthroughWalkthroughThe PR removes extensive DeciLM-specific model components (attention mechanisms, RoPE embeddings, Mamba mixer, decoder layers) across multiple utility modules and refactors the replacement library to adopt a generic AnyModel loading approach instead of DeciLM-specific layer replacement discovery and caching. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…up_2 Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
modelopt/torch/puzzletron/replacement_library/replacement_library.py (1)
159-163:⚠️ Potential issue | 🟡 MinorMissing return value when no weight paths are found.
_get_arbitrary_checkpoint_dirimplicitly returnsNoneif alllayer_replacement["weight_paths"]are empty. The callerget_arbitrary_checkpoint_dirat line 156 would cacheNone, and subsequent uses (e.g., line 83, 98, 119) would fail when treating it as aPath. Consider raising an explicit error or returning a sentinel.🐛 Proposed fix to raise explicit error
def _get_arbitrary_checkpoint_dir(self) -> Path: for layer_replacement in self.replacement_library: weight_paths = layer_replacement["weight_paths"] if len(weight_paths) > 0: return weights_path_to_checkpoint_dir(weight_paths[0]) + raise ValueError("No checkpoint directory found: all layer replacements have empty weight_paths")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/replacement_library/replacement_library.py` around lines 159 - 163, The helper _get_arbitrary_checkpoint_dir currently falls through and returns None when every layer_replacement["weight_paths"] is empty, which causes get_arbitrary_checkpoint_dir to cache None and later code to treat it as a Path; update _get_arbitrary_checkpoint_dir to explicitly raise a descriptive exception (e.g., RuntimeError or ValueError) when no weight_paths are found or alternatively return a clear sentinel Path, so callers like get_arbitrary_checkpoint_dir and code that uses its result cannot accidentally receive None; locate the function _get_arbitrary_checkpoint_dir in replacement_library and add the explicit error path after iterating layer_replacement["weight_paths"], referencing weights_path_to_checkpoint_dir for valid returns.modelopt/torch/puzzletron/replacement_library/replacement_utils.py (1)
88-108:⚠️ Potential issue | 🟡 MinorAdd a guard to check if
block_configsexists before accessing it.Line 94 directly accesses
teacher_model_config.block_configs[block_idx]without verifying the attribute exists. Whileload_model_config()(which creates the config passed to this function) useshasattr(config, "block_configs")before processing, it does not guarantee the attribute is present on all configs. The codebase shows this is a real concern — comments note that vision-language models can have nested configs withoutblock_configsat the top level, and multiple files use defensivehasattrchecks (e.g.,utils/parsing.py,checkpoint_utils_hf.py). This will raiseAttributeErrorif a config withoutblock_configsis passed. Add a guard like other parts of the codebase do, or document the specific config type required and type-annotate accordingly.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/replacement_library/replacement_utils.py` around lines 88 - 108, The function is_replacement_identical_to_teacher accesses teacher_model_config.block_configs directly and can raise AttributeError for configs without that attribute; add a guard at the start of the branch that verifies hasattr(teacher_model_config, "block_configs") (or use getattr with a default) before indexing into teacher_model_config.block_configs[block_idx], and if missing return False (or otherwise short-circuit) so the subsequent comparisons involving BlockConfig (replacement_block_config, parallel_blocks, parallel_blocks[0].attention/ffn) only run when block_configs exists and is indexable.
🧹 Nitpick comments (1)
modelopt/torch/puzzletron/replacement_library/replacement_library.py (1)
78-88: Update return type annotation fromDeciLMConfigtoPretrainedConfig.The
model_configproperty still declares-> DeciLMConfigbutload_model_configcan return anyPretrainedConfig. This is inconsistent with the changes inreplacement_utils.pywhere function signatures were updated to acceptPretrainedConfig.♻️ Proposed fix
`@property` - def model_config(self) -> DeciLMConfig: + def model_config(self) -> PretrainedConfig: if self._model_config is None: trust_remote_code = self.descriptor.requires_trust_remote_code()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/replacement_library/replacement_library.py` around lines 78 - 88, The model_config property currently types its return as DeciLMConfig but load_model_config can return any PretrainedConfig; update the annotation on the property (model_config) from -> DeciLMConfig to -> PretrainedConfig and also adjust any related attribute typing (self._model_config) and imports so they use PretrainedConfig instead of DeciLMConfig; ensure references in this class (e.g., model_config_overrides, get_arbitrary_checkpoint_dir, descriptor.requires_trust_remote_code, load_model_config) remain unchanged except for the type switch to PretrainedConfig.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@modelopt/torch/puzzletron/replacement_library/replacement_library.py`:
- Around line 159-163: The helper _get_arbitrary_checkpoint_dir currently falls
through and returns None when every layer_replacement["weight_paths"] is empty,
which causes get_arbitrary_checkpoint_dir to cache None and later code to treat
it as a Path; update _get_arbitrary_checkpoint_dir to explicitly raise a
descriptive exception (e.g., RuntimeError or ValueError) when no weight_paths
are found or alternatively return a clear sentinel Path, so callers like
get_arbitrary_checkpoint_dir and code that uses its result cannot accidentally
receive None; locate the function _get_arbitrary_checkpoint_dir in
replacement_library and add the explicit error path after iterating
layer_replacement["weight_paths"], referencing weights_path_to_checkpoint_dir
for valid returns.
In `@modelopt/torch/puzzletron/replacement_library/replacement_utils.py`:
- Around line 88-108: The function is_replacement_identical_to_teacher accesses
teacher_model_config.block_configs directly and can raise AttributeError for
configs without that attribute; add a guard at the start of the branch that
verifies hasattr(teacher_model_config, "block_configs") (or use getattr with a
default) before indexing into teacher_model_config.block_configs[block_idx], and
if missing return False (or otherwise short-circuit) so the subsequent
comparisons involving BlockConfig (replacement_block_config, parallel_blocks,
parallel_blocks[0].attention/ffn) only run when block_configs exists and is
indexable.
---
Nitpick comments:
In `@modelopt/torch/puzzletron/replacement_library/replacement_library.py`:
- Around line 78-88: The model_config property currently types its return as
DeciLMConfig but load_model_config can return any PretrainedConfig; update the
annotation on the property (model_config) from -> DeciLMConfig to ->
PretrainedConfig and also adjust any related attribute typing
(self._model_config) and imports so they use PretrainedConfig instead of
DeciLMConfig; ensure references in this class (e.g., model_config_overrides,
get_arbitrary_checkpoint_dir, descriptor.requires_trust_remote_code,
load_model_config) remain unchanged except for the type switch to
PretrainedConfig.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: de36f048-aec3-4535-88ef-e97851883a16
📒 Files selected for processing (13)
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.pymodelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__mamba_mixer.pymodelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.pymodelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__modeling_attn_mask_utils.pymodelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__modeling_flash_attention_utils_backward_compat.pymodelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__modeling_outputs.pymodelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__pytorch_utils.pymodelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_51_3__cache_utils.pymodelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_51_3__modeling_llama4_attention.pymodelopt/torch/puzzletron/decilm/deci_lm_hf_code/variable_cache.pymodelopt/torch/puzzletron/decilm/deci_lm_hf_code/vllm_yarn_utils.pymodelopt/torch/puzzletron/replacement_library/replacement_library.pymodelopt/torch/puzzletron/replacement_library/replacement_utils.py
💤 Files with no reviewable changes (8)
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__pytorch_utils.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/vllm_yarn_utils.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/variable_cache.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__mamba_mixer.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_51_3__modeling_llama4_attention.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__modeling_flash_attention_utils_backward_compat.py
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__modeling_attn_mask_utils.py
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## feature/puzzletron #1073 +/- ##
======================================================
+ Coverage 72.12% 72.13% +0.01%
======================================================
Files 209 209
Lines 23628 23628
======================================================
+ Hits 17042 17045 +3
+ Misses 6586 6583 -3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
### What does this PR do? Implement puzzletron compression algorithm based on Puzzle paper (https://arxiv.org/abs/2411.19146) <details> <summary> Th list of reviewed and merged MRs that resulted in the feature/puzzletron branch</summary> Merging dkorzekwa/any_model to feature/puzzletron [Add anymodel directories to feature/puzzletron by danielkorzekwa · Pull Request #974 · NVIDIA/Model-Optimizer](#974) - merged [Draft: anymodel activation scoring by danielkorzekwa · Pull Request #989 · NVIDIA/Model-Optimizer](#989) - merged [Draft: Merge anymodel pruning by danielkorzekwa · Pull Request #990 · NVIDIA/Model-Optimizer](#990) - merged [Draft: Merging anymodel:build_library_and_stats by danielkorzekwa · Pull Request #993 · NVIDIA/Model-Optimizer](#993) - merged [Dkorzekwa/any model calc one block scores by danielkorzekwa · Pull Request #994 · NVIDIA/Model-Optimizer](#994) - merged [Draft: merge any_model: mip_and_realize_models by danielkorzekwa · Pull Request #995 · NVIDIA/Model-Optimizer](#995) - merged [Dkorzekwa/any model other modeqls by danielkorztiekwa · Pull Request #1007 · NVIDIA/Model-Optimizer](#1007) - merged PR to 1007: #1039 - merged [Dkorzekwa/anymodel gptoss by danielkorzekwa · Pull Request #1020 · NVIDIA/Model-Optimizer](#1020) - merged [Merge any_model tutorial by danielkorzekwa · Pull Request #1035 · NVIDIA/Model-Optimizer](#1035) - merged [Merge mbridge distillation for any_model by danielkorzekwa · Pull Request #1036 · NVIDIA/Model-Optimizer](#1036) - merged [MR branch for the remaining difference between dkorzekwa/any_model an… by danielkorzekwa · Pull Request #1047 · NVIDIA/Model-Optimizer](#1047) - merged [Dkorzekwa/decilm hf code cleanup by danielkorzekwa · Pull Request #1071 · NVIDIA/Model-Optimizer](#1071) - merged [Dkorzekwa/decilm hf code cleanup 2 by danielkorzekwa · Pull Request #1073 · NVIDIA/Model-Optimizer](#1073) - merged [Dkorzekwa/anymodel subblock stats by danielkorzekwa · Pull Request #1085 · NVIDIA/Model-Optimizer](#1085) - merged [Dkorzekwa/anymodel subblock stats nodecilm by danielkorzekwa · Pull Request #1102 · NVIDIA/Model-Optimizer](#1102) - merged [Dkorzekwa/decilm cleanup post subblockstats by danielkorzekwa · Pull Request #1103 · NVIDIA/Model-Optimizer](#1103) - merged [code clean up by danielkorzekwa · Pull Request #1110 · NVIDIA/Model-Optimizer](#1110) - merged Merging into main: [Activation hooks redesign (reuse hooks component across both minitron and puzzletron) by danielkorzekwa · Pull Request #1022 · NVIDIA/Model-Optimizer](#1022) - merged [Dkorzekwa/puzzletron use importance hooks from prune by danielkorzekwa · Pull Request #1115 · NVIDIA/Model-Optimizer](#1115) - merged </details> <!-- Details about the change. --> ### Usage Puzzletron tutorial: https://github.com/NVIDIA/Model-Optimizer/tree/feature/puzzletron/examples/puzzletron ### Testing The main e2e test for compressing 9 models with Puzzletron: https://github.com/NVIDIA/Model-Optimizer/blob/feature/puzzletron/tests/gpu/torch/puzzletron/test_puzzletron.py 2-gpu nightly tests: - https://github.com/NVIDIA/Model-Optimizer/actions/runs/24468209205/job/71501061203 - https://github.com/NVIDIA/Model-Optimizer/actions/runs/24470214159/job/71508152952 ### Before your PR is "*Ready for review*" - Is this change backward compatible?: ✅ - If you copied code from any other sources or added a new PIP dependency, did you follow guidance in `CONTRIBUTING.md`: ✅ - Did you write any new necessary tests?: ✅ - Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?: ✅ <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added Puzzletron: end-to-end heterogeneous pruning & NAS workflow with AnyModel support, example pipelines, deployment and evaluation utilities, and tools for converting/pruning and exporting compressed checkpoints. * **Documentation** * Comprehensive Puzzletron tutorials, model-specific guides, evaluator instructions, example configs, and changelog entry. * **Chores** * CI/workflow updates (extras installation, longer GPU test timeout), pre-commit hook exclusion updated, and CODEOWNERS entries added. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com> Signed-off-by: Liana Mikaelyan <45925959+LianaMikael@users.noreply.github.com> Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com> Signed-off-by: jrausch <jrausch@nvidia.com> Signed-off-by: root <root@pool0-00848.cm.cluster> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> Co-authored-by: Liana Mikaelyan <lmikaelyan@nvidia.com> Co-authored-by: Liana Mikaelyan <45925959+LianaMikael@users.noreply.github.com> Co-authored-by: J Rausch <38429553+j-rausch@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
What does this PR do?
Delete not used decilm code
Summary by CodeRabbit