Dkorzekwa/decilm cleanup post subblockstats #1103
Merged
danielkorzekwa
merged 125 commits into
feature/puzzletron
from
dkorzekwa/decilm_cleanup_post_subblockstats
Mar 24, 2026
e82164f
Add anymodel directories to feature/puzzletron
danielkorzekwa 2099df3
Make any_model conversion working.
danielkorzekwa eb5cf8a
Update child_init.py with anymodel version
danielkorzekwa c9de41c
fix attention pruning
danielkorzekwa 3c1bc1f
Add trust_remote_code to load_model_config (default to false)
danielkorzekwa 8357136
Make activation scoring working
danielkorzekwa 6cc2194
Comment all tested models aside of llama_3_1_8b_instruct
danielkorzekwa ee4e1e3
Delete not needed decilm test
danielkorzekwa 449b523
Fix broken tests
danielkorzekwa fb27bba
Update puzzletron_nas_pluging to any_model version
danielkorzekwa b350f82
Correct test resources used by tests.
danielkorzekwa fafe5a3
Disable puzzletron tests (will be enabled after all any_model logic i…
danielkorzekwa e988248
Merge branch 'dkorzekwa/anymodel_core' into dkorzekwa/anymodel_activa…
danielkorzekwa c717852
Comment out not implemented models.
danielkorzekwa 030f126
format python docs
danielkorzekwa 8dcdfbf
Merge branch 'dkorzekwa/anymodel_core' into dkorzekwa/anymodel_activa…
danielkorzekwa 70df0df
Use trust_remote_code in force_cache_dynamic_modules()
danielkorzekwa bb56662
Merge branch 'dkorzekwa/anymodel_core' into dkorzekwa/anymodel_activa…
danielkorzekwa ecd953e
Fix anymodel pruning
danielkorzekwa ee8f538
Fix build docs issue.
danielkorzekwa c9b76a1
Merge branch 'dkorzekwa/anymodel_core' into dkorzekwa/anymodel_activa…
danielkorzekwa 6e3af61
Merge branch 'dkorzekwa/anymodel_activation_scoring' into dkorzekwa/a…
danielkorzekwa 0ad6d92
Merging build_library_and_stats
danielkorzekwa 995eb1a
Merging anymodel: calc_one_block_scores
danielkorzekwa 34081c9
Merging any_model: calc_one_block_scores
danielkorzekwa ed5c00f
merge any_model: mip_and_realize_models
danielkorzekwa 993b5ec
Add all anymodel models but gptoss
danielkorzekwa 6e9f03b
Make nemotron-nano-12b-v2 to work (set trust_remote_code=true)
danielkorzekwa e8b7a7d
merge anymodel for nemotron-3-nano-30b-a3b-base-bf16
danielkorzekwa 47414d5
Clarify readme and avoid reusing the same reference in llama_converter.
danielkorzekwa a8305d8
Fix tied-embedding handling before writing the safetensors index.
danielkorzekwa 68421a5
Fix NaN ranking, which currently selects NaNs as “best” experts by default.
danielkorzekwa d6b8028
Code clean up.
danielkorzekwa ecd2341
Code clean up.
danielkorzekwa f9d845d
code clean up
danielkorzekwa d171b01
Merge branch 'dkorzekwa/anymodel_core' into dkorzekwa/anymodel_activa…
danielkorzekwa 722da90
Merge branch 'dkorzekwa/anymodel_activation_scoring' into dkorzekwa/a…
danielkorzekwa 934ab2f
code clean up
danielkorzekwa 0f14ec3
Merge branch 'dkorzekwa/anymodel_pruning' into dkorzekwa/anymodel_bui…
danielkorzekwa dcb9e02
remove not needed comment
danielkorzekwa 0c9ea5d
Merge branch 'dkorzekwa/anymodel_build_library_and_stats' into dkorze…
danielkorzekwa 5b310e2
Merge branch 'dkorzekwa/any_model_calc_one_block_scores' into dkorzek…
danielkorzekwa 4f82b1c
Merge branch 'dkorzekwa/mip_and_realize_models' into dkorzekwa/any_mo…
danielkorzekwa 176a435
Fix a broken test_puzzletron test on 2 gpus.
danielkorzekwa 02e2c9b
Merge branch 'dkorzekwa/anymodel_activation_scoring' into dkorzekwa/a…
danielkorzekwa 92c4419
Merge branch 'dkorzekwa/anymodel_pruning' into dkorzekwa/anymodel_bui…
danielkorzekwa aa1eb3e
Merge branch 'dkorzekwa/anymodel_build_library_and_stats' into dkorze…
danielkorzekwa 2b84a96
Merge branch 'dkorzekwa/any_model_calc_one_block_scores' into dkorzek…
danielkorzekwa fb838c0
Merge branch 'dkorzekwa/mip_and_realize_models' into dkorzekwa/any_mo…
danielkorzekwa 13378ff
Add gpt-oss model
danielkorzekwa 47ca0e3
Add comments about a broken test
danielkorzekwa 96112f7
Fix a broken gptoss test
danielkorzekwa cb6b182
Add mamba to puzzletron dependencies.
danielkorzekwa 670bb34
Update mamba-ssm and causal-conv1d dependencies (remove pinpoint versi…
danielkorzekwa 0e1b591
Install mamba-ssm and causal-conv1d in testenv:cuda13-gpu-puzzletron
danielkorzekwa ca845ec
Fix installing dependencies in testenv:cuda13-gpu-puzzletron
danielkorzekwa be825bc
Fix anymodel for qwen3 8B in 2 gpus
danielkorzekwa 7fd1afa
Fix pipeline parallelism issue for qwen3-vl-30b-a3b-instruct-qwen3_vl-…
danielkorzekwa 7d7b609
Fix multi-gpu issue for nemotron-nano-12b-v2
danielkorzekwa 249af9d
Fix no_op in any_model
danielkorzekwa b80583c
Merge branch 'feature/puzzletron' into dkorzekwa/any_model_other_models
danielkorzekwa 88b1b13
Merge any_model tutorial
danielkorzekwa c0da9c0
Merge mbridge distillation for any_model
danielkorzekwa 1dd742e
Fix nemotron_h_model_descriptor.
danielkorzekwa 4a6ebbe
Fix tox -e build-docs
danielkorzekwa 585f0ed
pin mamba/causal-conv1d versions to fix failing assertion for test_pu…
danielkorzekwa 7fb5d9a
Fix for installing mamba-ssm
danielkorzekwa 75d3d69
Fix broken test for nemotron-3-nano-30b-a3b-base-bf16
danielkorzekwa 0e5722d
code clean up
danielkorzekwa 2dd9735
Make test_puzzletron test deterministic
danielkorzekwa 3561de5
Comment out all models but nemotron-3-nano-30b-a3b-base-bf16 to check…
danielkorzekwa 27866de
Implement Qwen3VLRemoveExpertsIndependentHook
danielkorzekwa f5fbbcf
MR branch for the remaining difference between dkorzekwa/any_model an…
danielkorzekwa a012fe6
Remove not needed nvidia licence header
danielkorzekwa 52922a4
# Initialize weights to ensure all parameters are properly initialized
danielkorzekwa c234fb4
Fix non-deterministic test_puzzletron test
danielkorzekwa 53dcd10
Fix for unsetting CUDA_VISIBLE_DEVICES
danielkorzekwa 69d9648
increase numeric tolerance for test_puzzletron.py
danielkorzekwa 4a692dc
Disable lm_loss assertion for nemotron-3-nano-30b-a3b-base-bf16 (not …
danielkorzekwa e795f0c
Removing incorrect licence file. gpt_oss_pruned_to_mxfp4.py was not a…
danielkorzekwa 631306c
Fix hardcoded trust_remote_code
danielkorzekwa dc77be2
Merge branch 'dkorzekwa/any_model_other_models' into dkorzekwa/anymod…
danielkorzekwa b76e0ef
Merge branch 'dkorzekwa/anymodel_gptoss' into dkorzekwa/anymodel_tuto…
danielkorzekwa 109b185
Merge branch 'dkorzekwa/anymodel_tutorial' into dkorzekwa/anymodel_mb…
danielkorzekwa b0972e4
Merge branch 'dkorzekwa/anymodel_mbridgedist' into dkorzekwa/remainin…
danielkorzekwa 5cadc65
Merge branch 'feature/puzzletron' into dkorzekwa/anymodel_gptoss
danielkorzekwa 151081c
Delete not needed yaml files for test_puzzletron.
danielkorzekwa 36daa6d
Delete not needed mypy exclusion for removed hf_configs files.
danielkorzekwa 960b8ce
Merge branch 'dkorzekwa/anymodel_gptoss' into dkorzekwa/anymodel_tuto…
danielkorzekwa 854d96b
Merge branch 'dkorzekwa/anymodel_tutorial' into dkorzekwa/anymodel_mb…
danielkorzekwa cf06997
Merge branch 'dkorzekwa/anymodel_mbridgedist' into dkorzekwa/remainin…
danielkorzekwa b47f846
Merge branch 'feature/puzzletron' into dkorzekwa/anymodel_tutorial
danielkorzekwa 13f5edc
Merge branch 'dkorzekwa/anymodel_tutorial' into dkorzekwa/anymodel_mb…
danielkorzekwa b4c71cc
Merge branch 'dkorzekwa/anymodel_mbridgedist' into dkorzekwa/remainin…
danielkorzekwa 67444f4
Delete not used decilm dummy blocks and create_dummy_model()
danielkorzekwa 944f6f9
Delete not used decilm converters
danielkorzekwa 7ee045a
Delete not used decilm code
danielkorzekwa cd1bf88
removing decilm not used code.
danielkorzekwa e2fa0b3
Remove dead decilm code
danielkorzekwa fb48618
Delete megatron_lm_tokenizer
danielkorzekwa 5297a1c
Delete nemo export/import for decilm version of puzzletron
danielkorzekwa cbba0b0
Delete dead code.
danielkorzekwa e0fb3c1
Delete DeciLMForCausalLM
danielkorzekwa dbaab53
Remove unused save_checkpoint_as_symlinks()
danielkorzekwa 9c943fd
code clean up
danielkorzekwa 098d7c1
remove megatron_tokenizer
danielkorzekwa 5d0efa1
Delete copy_deci_lm_hf_code
danielkorzekwa ead68bb
Delete DeciLMPreTrainModel and DeciLMModel
danielkorzekwa 2d91afc
Delete not used code from replacement_library.py
danielkorzekwa 492cbaf
Delete not used decilm code
danielkorzekwa 1834c76
Delete not used decilm code
danielkorzekwa f096d11
remove dead replacement_library code
danielkorzekwa dc52a81
Delete not used transformers code
danielkorzekwa b9178a3
Delete unused decilm code
danielkorzekwa 9c496bb
Import clean up.
danielkorzekwa 467247a
Support moe in sweep.py
danielkorzekwa 034e77d
Code clean up.
danielkorzekwa 4bbdeaf
Add assertions for memory subblock stats
danielkorzekwa 837e14f
code clean up
danielkorzekwa c0a0cb0
Remove DeciLMMoe
danielkorzekwa 855f4a6
Update comments
danielkorzekwa 4458fb9
Code clean up
danielkorzekwa eae81a2
Remove dead code: DeciLMRMSNorm and DeciLMGatedMLP
danielkorzekwa ad369cc
Remove unused DeciLMConfig
danielkorzekwa 526f184
Merge branch 'feature/puzzletron' into dkorzekwa/decilm_cleanup_post_…
danielkorzekwa
204 changes: 0 additions & 204 deletions
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py
This file was deleted.
Can we merge with `block_config.py` and move to the `any_model` folder?
Added to the TODO list to avoid cluttering this MR (a large number of files would be changed).