Dkorzekwa/decilm hf code cleanup 2 by danielkorzekwa · Pull Request #1073 · NVIDIA/Model-Optimizer

danielkorzekwa · 2026-03-19T17:30:16Z

What does this PR do?

Delete unused DeciLM code.

Summary by CodeRabbit

Refactor
- Removed DeciLM-specific components including decoder layers, attention implementations, and specialized cache utilities, streamlining the codebase
- Updated replacement library to use generic model configurations instead of DeciLM-specific types, improving compatibility with diverse architectures
- Cleaned up internal utilities for attention masking, flash attention compatibility, and rotary position embeddings

coderabbitai · 2026-03-19T17:30:28Z

Walkthrough

The PR removes extensive DeciLM-specific model components (attention mechanisms, RoPE embeddings, Mamba mixer, decoder layers) across multiple utility modules and refactors the replacement library to adopt a generic AnyModel loading approach instead of DeciLM-specific layer replacement discovery and caching.

Changes

| Cohort / File(s) | Summary |
|---|---|
| DeciLM Core Model Components: `configuration_decilm.py`, `megatron_lm__mamba_mixer.py`, `modeling_decilm.py` | Removed fake import hook for `HybridChunkedCache`, deleted entire `MambaMixerMegatron` implementation (527 lines), and stripped `modeling_decilm.py` of HF/transformers-derived decoder stack including `DeciLMAttention`, `DeciLMFlashAttention2`, rotary embedding classes, decoder layers, causal mask utilities, and vanilla MLP; retained only `DeciLMRMSNorm`, `DeciLMGatedMLP`, `DeciLMMoe`, and `LMHead`. |
| Transformers Utility Modules: `transformers_4_44_2__modeling_attn_mask_utils.py`, `transformers_4_44_2__modeling_flash_attention_utils_backward_compat.py`, `transformers_4_44_2__pytorch_utils.py`, `transformers_4_51_3__modeling_llama4_attention.py` | Deleted four utility modules: mask construction helpers (498 lines), FlashAttention backward-compatibility utilities (363 lines), layernorm registry (32 lines), and Llama4 text attention with RoPE support (289 lines). |
| Inference Cache & Embeddings: `variable_cache.py`, `vllm_yarn_utils.py` | Removed `VariableCache` class for per-layer KV-cache management (213 lines) and YaRN scaling rotary embedding module with position embedding logic (210 lines). |
| Replacement Library Refactoring: `replacement_library.py`, `replacement_utils.py` | Refactored replacement library from DeciLM-specific checkpoint loading with layer caching (`load_checkpoint()` method, dtype/layer accessors, block getters) to generic `load_and_shard_model`-based AnyModel approach; renamed internal method `_get_arbitrary_block_checkpoint_paths()` to `_get_arbitrary_non_block_checkpoint_paths()`. Updated function signatures in `replacement_utils.py` to use `PretrainedConfig` instead of DeciLM-specific `DeciLMConfig`. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 5.88% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'Dkorzekwa/decilm hf code cleanup 2' partially relates to the changeset by referencing a cleanup effort, but it is vague and generic, lacking specificity about what was actually removed or changed. | Revise the title to be more specific about the primary change, such as 'Remove unused DeciLM HuggingFace utilities and replacement library code' or similar descriptive phrasing that clearly indicates the main cleanup scope. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Security Anti-Patterns | ✅ Passed | The PR does not introduce critical security anti-patterns specified in SECURITY.md. No unsafe torch.load/numpy.load/trust_remote_code/eval/exec patterns are added. |

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

modelopt/torch/puzzletron/replacement_library/replacement_library.py (1)
159-163: ⚠️ Potential issue | 🟡 Minor

Missing return value when no weight paths are found.

_get_arbitrary_checkpoint_dir implicitly returns None if all layer_replacement["weight_paths"] are empty. The caller get_arbitrary_checkpoint_dir at line 156 would cache None, and subsequent uses (e.g., line 83, 98, 119) would fail when treating it as a Path. Consider raising an explicit error or returning a sentinel.
🐛 Proposed fix to raise explicit error
     def _get_arbitrary_checkpoint_dir(self) -> Path:
         for layer_replacement in self.replacement_library:
             weight_paths = layer_replacement["weight_paths"]
             if len(weight_paths) > 0:
                 return weights_path_to_checkpoint_dir(weight_paths[0])
+        raise ValueError("No checkpoint directory found: all layer replacements have empty weight_paths")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/replacement_library/replacement_library.py` around
lines 159 - 163, The helper _get_arbitrary_checkpoint_dir currently falls
through and returns None when every layer_replacement["weight_paths"] is empty,
which causes get_arbitrary_checkpoint_dir to cache None and later code to treat
it as a Path; update _get_arbitrary_checkpoint_dir to explicitly raise a
descriptive exception (e.g., RuntimeError or ValueError) when no weight_paths
are found or alternatively return a clear sentinel Path, so callers like
get_arbitrary_checkpoint_dir and code that uses its result cannot accidentally
receive None; locate the function _get_arbitrary_checkpoint_dir in
replacement_library and add the explicit error path after iterating
layer_replacement["weight_paths"], referencing weights_path_to_checkpoint_dir
for valid returns.
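
The fall-through described in this comment can be reproduced in isolation. The following is a minimal sketch, not the repository's code: `find_checkpoint_dir` is a hypothetical stand-in for `_get_arbitrary_checkpoint_dir`, and `Path(...).parent` is an assumed substitute for `weights_path_to_checkpoint_dir`, whose real behavior is not shown in this PR.

```python
from pathlib import Path


def find_checkpoint_dir(replacement_library: list[dict]) -> Path:
    """Return the checkpoint dir of the first replacement that has weights.

    Hypothetical stand-in for _get_arbitrary_checkpoint_dir. Path.parent is an
    assumption that replaces the real weights_path_to_checkpoint_dir helper.
    """
    for layer_replacement in replacement_library:
        weight_paths = layer_replacement["weight_paths"]
        if len(weight_paths) > 0:
            return Path(weight_paths[0]).parent
    # Without this raise the function would fall through and return None,
    # which a caching caller would then store and hand out as if it were a Path.
    raise ValueError(
        "No checkpoint directory found: all layer replacements have empty weight_paths"
    )


# Illustrative data only; the file name is made up for the example.
library = [
    {"weight_paths": []},
    {"weight_paths": ["ckpt/subblocks/block_3.safetensors"]},
]
checkpoint_dir = find_checkpoint_dir(library)
```

Raising at the point of discovery keeps the error close to its cause; a cached `None` would otherwise surface later as an unrelated-looking `TypeError` or `AttributeError` at the first path operation.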
modelopt/torch/puzzletron/replacement_library/replacement_utils.py (1)
88-108: ⚠️ Potential issue | 🟡 Minor

Add a guard to check if block_configs exists before accessing it.

Line 94 directly accesses teacher_model_config.block_configs[block_idx] without verifying the attribute exists. While load_model_config() (which creates the config passed to this function) uses hasattr(config, "block_configs") before processing, it does not guarantee the attribute is present on all configs. The codebase shows this is a real concern — comments note that vision-language models can have nested configs without block_configs at the top level, and multiple files use defensive hasattr checks (e.g., utils/parsing.py, checkpoint_utils_hf.py). This will raise AttributeError if a config without block_configs is passed. Add a guard like other parts of the codebase do, or document the specific config type required and type-annotate accordingly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/replacement_library/replacement_utils.py` around
lines 88 - 108, The function is_replacement_identical_to_teacher accesses
teacher_model_config.block_configs directly and can raise AttributeError for
configs without that attribute; add a guard at the start of the branch that
verifies hasattr(teacher_model_config, "block_configs") (or use getattr with a
default) before indexing into teacher_model_config.block_configs[block_idx], and
if missing return False (or otherwise short-circuit) so the subsequent
comparisons involving BlockConfig (replacement_block_config, parallel_blocks,
parallel_blocks[0].attention/ffn) only run when block_configs exists and is
indexable.
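
One way to express the suggested guard is sketched below. This is an illustration under stated assumptions, not the actual `is_replacement_identical_to_teacher` implementation: the helper names `teacher_block_config` and `matches_teacher` are hypothetical, plain strings stand in for `BlockConfig` objects, and `SimpleNamespace` stands in for a `PretrainedConfig`.

```python
from types import SimpleNamespace


def teacher_block_config(teacher_model_config, block_idx: int):
    """Return the teacher's block config at block_idx, or None if unavailable."""
    # getattr with a default avoids AttributeError on configs that keep
    # block_configs in a nested config, or do not define it at all.
    block_configs = getattr(teacher_model_config, "block_configs", None)
    if block_configs is None or not 0 <= block_idx < len(block_configs):
        return None
    return block_configs[block_idx]


def matches_teacher(replacement_block_config, teacher_model_config, block_idx: int) -> bool:
    """Short-circuit to False when the teacher has no usable block_configs."""
    teacher_block = teacher_block_config(teacher_model_config, block_idx)
    if teacher_block is None:
        return False
    return replacement_block_config == teacher_block


# Stand-in configs: one exposes block_configs at the top level, one only nests it.
flat_config = SimpleNamespace(block_configs=["attn+ffn", "ffn-only"])
nested_config = SimpleNamespace(text_config=flat_config)

with_blocks = matches_teacher("ffn-only", flat_config, 1)
without_blocks = matches_teacher("ffn-only", nested_config, 1)
```

Returning `False` treats a missing `block_configs` as "not identical to the teacher"; if such a config should never reach this function, raising a descriptive error instead would be the stricter choice.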

🧹 Nitpick comments (1)

modelopt/torch/puzzletron/replacement_library/replacement_library.py (1)

78-88: Update return type annotation from DeciLMConfig to PretrainedConfig.

The model_config property still declares -> DeciLMConfig but load_model_config can return any PretrainedConfig. This is inconsistent with the changes in replacement_utils.py where function signatures were updated to accept PretrainedConfig.

♻️ Proposed fix

     @property
-    def model_config(self) -> DeciLMConfig:
+    def model_config(self) -> PretrainedConfig:
         if self._model_config is None:
             trust_remote_code = self.descriptor.requires_trust_remote_code()

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/replacement_library/replacement_library.py` around
lines 78 - 88, The model_config property currently types its return as
DeciLMConfig but load_model_config can return any PretrainedConfig; update the
annotation on the property (model_config) from -> DeciLMConfig to ->
PretrainedConfig and also adjust any related attribute typing
(self._model_config) and imports so they use PretrainedConfig instead of
DeciLMConfig; ensure references in this class (e.g., model_config_overrides,
get_arbitrary_checkpoint_dir, descriptor.requires_trust_remote_code,
load_model_config) remain unchanged except for the type switch to
PretrainedConfig.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: de36f048-aec3-4535-88ef-e97851883a16

📥 Commits

Reviewing files that changed from the base of the PR and between 110316a and d6ccd8f.

📒 Files selected for processing (13)

modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__mamba_mixer.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__modeling_attn_mask_utils.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__modeling_flash_attention_utils_backward_compat.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__modeling_outputs.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__pytorch_utils.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_51_3__cache_utils.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_51_3__modeling_llama4_attention.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/variable_cache.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/vllm_yarn_utils.py
modelopt/torch/puzzletron/replacement_library/replacement_library.py
modelopt/torch/puzzletron/replacement_library/replacement_utils.py

💤 Files with no reviewable changes (8)

modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__pytorch_utils.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/vllm_yarn_utils.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/variable_cache.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__mamba_mixer.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_51_3__modeling_llama4_attention.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__modeling_flash_attention_utils_backward_compat.py
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__modeling_attn_mask_utils.py

codecov · 2026-03-23T15:48:35Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.13%. Comparing base (110316a) to head (d6ccd8f).
⚠️ Report is 1 commits behind head on feature/puzzletron.

Additional details and impacted files

@@                  Coverage Diff                   @@
##           feature/puzzletron    #1073      +/-   ##
======================================================
+ Coverage               72.12%   72.13%   +0.01%     
======================================================
  Files                     209      209              
  Lines                   23628    23628              
======================================================
+ Hits                    17042    17045       +3     
+ Misses                   6586     6583       -3

### What does this PR do?

Implement the Puzzletron compression algorithm based on the Puzzle paper (https://arxiv.org/abs/2411.19146).

<details>
<summary>The list of reviewed and merged MRs that resulted in the feature/puzzletron branch</summary>

Merging dkorzekwa/any_model to feature/puzzletron:

- [Add anymodel directories to feature/puzzletron by danielkorzekwa · Pull Request #974 · NVIDIA/Model-Optimizer](#974) - merged
- [Draft: anymodel activation scoring by danielkorzekwa · Pull Request #989 · NVIDIA/Model-Optimizer](#989) - merged
- [Draft: Merge anymodel pruning by danielkorzekwa · Pull Request #990 · NVIDIA/Model-Optimizer](#990) - merged
- [Draft: Merging anymodel:build_library_and_stats by danielkorzekwa · Pull Request #993 · NVIDIA/Model-Optimizer](#993) - merged
- [Dkorzekwa/any model calc one block scores by danielkorzekwa · Pull Request #994 · NVIDIA/Model-Optimizer](#994) - merged
- [Draft: merge any_model: mip_and_realize_models by danielkorzekwa · Pull Request #995 · NVIDIA/Model-Optimizer](#995) - merged
- [Dkorzekwa/any model other models by danielkorzekwa · Pull Request #1007 · NVIDIA/Model-Optimizer](#1007) - merged
- PR to 1007: #1039 - merged
- [Dkorzekwa/anymodel gptoss by danielkorzekwa · Pull Request #1020 · NVIDIA/Model-Optimizer](#1020) - merged
- [Merge any_model tutorial by danielkorzekwa · Pull Request #1035 · NVIDIA/Model-Optimizer](#1035) - merged
- [Merge mbridge distillation for any_model by danielkorzekwa · Pull Request #1036 · NVIDIA/Model-Optimizer](#1036) - merged
- [MR branch for the remaining difference between dkorzekwa/any_model an… by danielkorzekwa · Pull Request #1047 · NVIDIA/Model-Optimizer](#1047) - merged
- [Dkorzekwa/decilm hf code cleanup by danielkorzekwa · Pull Request #1071 · NVIDIA/Model-Optimizer](#1071) - merged
- [Dkorzekwa/decilm hf code cleanup 2 by danielkorzekwa · Pull Request #1073 · NVIDIA/Model-Optimizer](#1073) - merged
- [Dkorzekwa/anymodel subblock stats by danielkorzekwa · Pull Request #1085 · NVIDIA/Model-Optimizer](#1085) - merged
- [Dkorzekwa/anymodel subblock stats nodecilm by danielkorzekwa · Pull Request #1102 · NVIDIA/Model-Optimizer](#1102) - merged
- [Dkorzekwa/decilm cleanup post subblockstats by danielkorzekwa · Pull Request #1103 · NVIDIA/Model-Optimizer](#1103) - merged
- [code clean up by danielkorzekwa · Pull Request #1110 · NVIDIA/Model-Optimizer](#1110) - merged

Merging into main:

- [Activation hooks redesign (reuse hooks component across both minitron and puzzletron) by danielkorzekwa · Pull Request #1022 · NVIDIA/Model-Optimizer](#1022) - merged
- [Dkorzekwa/puzzletron use importance hooks from prune by danielkorzekwa · Pull Request #1115 · NVIDIA/Model-Optimizer](#1115) - merged

</details>

### Usage

Puzzletron tutorial: https://github.com/NVIDIA/Model-Optimizer/tree/feature/puzzletron/examples/puzzletron

### Testing

The main e2e test for compressing 9 models with Puzzletron: https://github.com/NVIDIA/Model-Optimizer/blob/feature/puzzletron/tests/gpu/torch/puzzletron/test_puzzletron.py

2-gpu nightly tests:

- https://github.com/NVIDIA/Model-Optimizer/actions/runs/24468209205/job/71501061203
- https://github.com/NVIDIA/Model-Optimizer/actions/runs/24470214159/job/71508152952

### Before your PR is "*Ready for review*"

- Is this change backward compatible?: ✅
- If you copied code from any other sources or added a new PIP dependency, did you follow guidance in `CONTRIBUTING.md`: ✅
- Did you write any new necessary tests?: ✅
- Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?: ✅

## Summary by CodeRabbit

* **New Features**
  * Added Puzzletron: end-to-end heterogeneous pruning & NAS workflow with AnyModel support, example pipelines, deployment and evaluation utilities, and tools for converting/pruning and exporting compressed checkpoints.
* **Documentation**
  * Comprehensive Puzzletron tutorials, model-specific guides, evaluator instructions, example configs, and changelog entry.
* **Chores**
  * CI/workflow updates (extras installation, longer GPU test timeout), pre-commit hook exclusion updated, and CODEOWNERS entries added.

---------

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
Signed-off-by: Liana Mikaelyan <45925959+LianaMikael@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
Signed-off-by: jrausch <jrausch@nvidia.com>
Signed-off-by: root <root@pool0-00848.cm.cluster>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
Co-authored-by: Liana Mikaelyan <45925959+LianaMikael@users.noreply.github.com>
Co-authored-by: J Rausch <38429553+j-rausch@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

danielkorzekwa added 30 commits March 4, 2026 11:33

Add anymodel directories to feature/puzzletron

e82164f

- Add converter, model_descriptor, puzzformer, and llama model support - Selective merge of anymodel functionality Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Make any_model conversion working.

2099df3

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Update child_init.py with anymodel version

eb5cf8a

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

fix attention pruning

c9de41c

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Add trust_remote_code to load_model_config (default to false)

3c1bc1f

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Make activation scoring working

8357136

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Comment all tested models aside of llama_3_1_8b_instruct

6cc2194

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Delete not needed decilm test

ee4e1e3

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Fix broken tests

449b523

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Update puzzletron_nas_pluging to any_model version

fb27bba

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Correct test resources used by tests.

b350f82

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Disable puzzletron tests (will be enabled after all any_model logic i…

fafe5a3

…s merged) Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Merge branch 'dkorzekwa/anymodel_core' into dkorzekwa/anymodel_activa…

e988248

…tion_scoring

Comment out not implemented models.

c717852

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

format python docs

030f126

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Merge branch 'dkorzekwa/anymodel_core' into dkorzekwa/anymodel_activa…

8dcdfbf

…tion_scoring

Use trust_remote_code in force_cache_dynamic_modules()

70df0df

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Merge branch 'dkorzekwa/anymodel_core' into dkorzekwa/anymodel_activa…

bb56662

…tion_scoring

Fix anymodel pruning

ecd953e

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Fix buid docs issue.

ee8f538

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Merge branch 'dkorzekwa/anymodel_core' into dkorzekwa/anymodel_activa…

c9b76a1

…tion_scoring

Merge branch 'dkorzekwa/anymodel_activation_scoring' into dkorzekwa/a…

6e3af61

…nymodel_pruning

Merging build_library_and_stats

0ad6d92

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Merging anymodel: calc_one_block_scores

995eb1a

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Mering any_model: calc_one_block_scores

34081c9

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

merge any_model: mip_and_realize_models

ed5c00f

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Add all anymodel models but gptoss

993b5ec

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Make nemotron-nano-12b-v2 to work (set trust_remote_code=true)

6e9f03b

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

merge anymodel for nemotron-3-nano-30b-a3b-base-bf16

e8b7a7d

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Clarify readme and avoid reusing the same reference in llama_converter.

47414d5

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

danielkorzekwa added 15 commits March 18, 2026 11:57

Delete megatron_lm_tokenizer

fb48618

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Delete nemo export/import for decilm version of puzzletron

5297a1c

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Delete dead code.

cbba0b0

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Delete DeciLMForCausalLM

e0fb3c1

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Remove unused save_checkpoint_as_symlinks()

dbaab53

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

code clean up

9c943fd

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

remove megatron_tokenizer

098d7c1

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Delete copy_deci_lm_hf_code

5d0efa1

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Delete DeciLMPreTrainModel and DeciLMModel

ead68bb

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Delete not used code from replacement_library.py

2d91afc

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Delete not used decilm code

492cbaf

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Delete not used decilm code

1834c76

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

remove dead replacement_library code

f096d11

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Delete not used transformers code

dc52a81

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

Delete unused decilm code

b9178a3

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

danielkorzekwa requested a review from a team as a code owner March 19, 2026 17:30

Import clean up.

9c496bb

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

kevalmorabia97 approved these changes Mar 19, 2026

Base automatically changed from dkorzekwa/decilm_hf_code_cleanup to feature/puzzletron March 23, 2026 13:58

danielkorzekwa requested review from a team as code owners March 23, 2026 13:58

danielkorzekwa requested a review from shengliangxu March 23, 2026 13:58

Merge branch 'feature/puzzletron' into dkorzekwa/decilm_hf_code_clean…

d6ccd8f

…up_2 Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

coderabbitai Bot reviewed Mar 23, 2026

danielkorzekwa merged commit 4190275 into feature/puzzletron Mar 23, 2026
28 checks passed

danielkorzekwa deleted the dkorzekwa/decilm_hf_code_cleanup_2 branch March 23, 2026 17:24

danielkorzekwa mentioned this pull request Mar 25, 2026

Merge puzzletron compression algorithm #1121

Merged

Dkorzekwa/decilm hf code cleanup 2 #1073
danielkorzekwa merged 116 commits into feature/puzzletron from dkorzekwa/decilm_hf_code_cleanup_2

2 participants
