feat: add config_cli.py and refactor configs + config pre-commit #1024
📝 Walkthrough
Adds a config minimization/validation CLI (tools/config_cli.py) and pre-commit hooks to enforce minimized YAML recipes. Broadly rewrites multiple LLM/VLM recipe YAMLs to rely on shared defaults, remove redundant fields, normalize formatting, and simplify scheduler/optimizer/logging sections.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor Dev
participant CLI as config_cli.py
participant FS as Filesystem
participant YAML as YAML Loader
Dev->>CLI: minimize --base BASE.yaml --child CHILD.yaml [-i]
CLI->>YAML: load(BASE.yaml)
CLI->>YAML: load(CHILD.yaml)
CLI->>CLI: resolve defaults (expand)
CLI->>CLI: compute minimized(CHILD - BASE)
alt in-place (-i)
CLI->>FS: write CHILD.yaml (defaults->relative, ordered)
else stdout
CLI-->>Dev: print minimized YAML
end
sequenceDiagram
autonumber
actor Git as pre-commit
participant Hook as minimize-check-*(bash)
participant CLI as config_cli.py
participant FS as Repo
Git->>Hook: on commit
loop for each (BASE, RECIPE)
Hook->>CLI: minimize-check --base BASE --child RECIPE
CLI->>FS: read files
CLI->>CLI: compute minimized form
alt differs
CLI-->>Hook: exit 1 (non-zero)
Hook-->>Git: fail hook
else identical
CLI-->>Hook: exit 0
end
end
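The two diagrams above can be read as "keep only what the child changes relative to the base, and fail the hook if the committed file is not already in that form". A minimal Python sketch of that idea follows. It is not the code in tools/config_cli.py: the function names `minimize` and `minimize_check` are borrowed from the subcommand names in the diagrams, the configs are plain dicts rather than YAML files, and the sketch ignores the `defaults` resolution and key ordering that the diagrams mention.

```python
from typing import Any, Dict

_MISSING = object()  # sentinel: distinguishes "key absent" from "key set to None"


def minimize(base: Dict[str, Any], child: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the entries of `child` that are absent from or differ from `base`."""
    out: Dict[str, Any] = {}
    for key, value in child.items():
        base_value = base.get(key, _MISSING)
        if isinstance(value, dict) and isinstance(base_value, dict):
            # Recurse into nested sections and keep them only if something differs.
            nested = minimize(base_value, value)
            if nested:
                out[key] = nested
        elif base_value is _MISSING or value != base_value:
            out[key] = value
    return out


def minimize_check(base: Dict[str, Any], child: Dict[str, Any]) -> int:
    """Return 0 if `child` is already minimal with respect to `base`, else 1."""
    return 0 if minimize(base, child) == child else 1


# Hypothetical configs, for illustration only.
base = {"policy": {"lr": 1e-5, "dtensor": {"enabled": True}}, "logger": {"wandb": False}}
child = {"policy": {"lr": 1e-5, "dtensor": {"enabled": False}}, "logger": {"wandb": False}}

minimized = minimize(base, child)
print(minimized)                        # {'policy': {'dtensor': {'enabled': False}}}
print(minimize_check(base, child))      # 1: child repeats values the base already sets
print(minimize_check(base, minimized))  # 0: nothing redundant left
```

Returning an exit-code-like integer mirrors the "exit 1 / exit 0" branches in the second diagram, so a hook wrapper can fail the commit on any non-zero result.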
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
tested on b171629
Signed-off-by: Terry Kong <terryk@nvidia.com>
compare command
Signed-off-by: Terry Kong <terryk@nvidia.com>
config changes
Signed-off-by: Terry Kong <terryk@nvidia.com>
Revert "config changes"
This reverts commit 25b87e2.
Signed-off-by: Terry Kong <terryk@nvidia.com>
cleanup
Signed-off-by: Terry Kong <terryk@nvidia.com>
vlm example
Signed-off-by: Terry Kong <terryk@nvidia.com>
minimize configs
Signed-off-by: Terry Kong <terryk@nvidia.com>
Revert "minimize configs"
This reverts commit 1375480.
Signed-off-by: Terry Kong <terryk@nvidia.com>
minimize configs
Signed-off-by: Terry Kong <terryk@nvidia.com>
Revert "minimize configs"
This reverts commit a4cd8a4.
Signed-off-by: Terry Kong <terryk@nvidia.com>
minimize configs
Signed-off-by: Terry Kong <terryk@nvidia.com>
force sft configs to use default chat template to match last releases behavior
Signed-off-by: Terry Kong <terryk@nvidia.com>
reverting select configs to v1 to address
Signed-off-by: Terry Kong <terryk@nvidia.com>
add pre-commit and add a minimize-check func
Signed-off-by: Terry Kong <terryk@nvidia.com>
Revert "reverting select configs to v1 to address"
This reverts commit d81f806.
Signed-off-by: Terry Kong <terryk@nvidia.com>
Revert "force sft configs to use default chat template to match last releases"
This reverts commit be01df7.
Signed-off-by: Terry Kong <terryk@nvidia.com>
Revert "minimize configs"
This reverts commit e54f144.
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com> Signed-off-by: Terry Kong <terrycurtiskong@gmail.com>
@terrykong thanks for this change, do you want to add a few unit tests for the new tool?
Yea, that's a good point. I created an issue for those: #1201. I only planned on minimizing the recipes since those have tests.
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com> Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
@parthchadha added tests in 044385c
…DIA-NeMo#1024) Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Terry Kong <terrycurtiskong@gmail.com> Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com>
…DIA-NeMo#1024) Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Terry Kong <terrycurtiskong@gmail.com> Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
Convergence/Perf check
pr1024-convergence.log
Overview
Recent merge friction has often come down to config conflicts, so this PR introduces a tool to minimize configs against their base (and a pre-commit hook that enforces it), letting you commit only the minimal config a recipe needs. This introduces a new risk: changes to the base config now propagate to the recipes, especially when a recipe did not define a value and the new default changes behavior. The positive side effect is that bisecting becomes more helpful, whereas before we were shielded from regressions until the recipe config was updated explicitly.
We now need to keep in mind that if a flag like "enable_eager" is false in the base config and no other config overrides it, turning it on in the base turns it on for all recipes.
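To make the propagation risk concrete, here is a small sketch of how a minimized recipe resolves against its base. It assumes a plain recursive dict merge; the actual loader may resolve defaults differently, the `expand` helper is hypothetical, and only the flag name "enable_eager" comes from this description (its nesting under `policy` and the other keys are invented for illustration).

```python
import copy
from typing import Any, Dict


def expand(base: Dict[str, Any], child: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay `child` on top of `base` without mutating either."""
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = expand(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical base config before and after someone flips the default.
base_before = {"policy": {"enable_eager": False, "lr": 1e-5}}
base_after = {"policy": {"enable_eager": True, "lr": 1e-5}}

# recipe_a is minimized and never mentions the flag; recipe_b pins it explicitly.
recipe_a = {"policy": {"lr": 3e-6}}
recipe_b = {"policy": {"enable_eager": False}}

print(expand(base_before, recipe_a)["policy"]["enable_eager"])  # False
print(expand(base_after, recipe_a)["policy"]["enable_eager"])   # True: inherited from the new default
print(expand(base_after, recipe_b)["policy"]["enable_eager"])   # False: the explicit override wins
```

A recipe that must keep a particular value regardless of the base therefore has to state it explicitly. Note that while the base still holds the same value, such a pin is redundant with respect to the base, so a minimization pass of the kind sketched here would drop it.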
Example of how to use the tool:
Related to #927
Notable issues that arose from merging configs:
1. All dtensor recipes defaulted to v2, which uncovered some perf regressions
grpo-gemma3-27b-it-8n8g-fsdp2tp8-actckpt-long: this recipe is much worse on v2 than on v1 (cand=v1, base=v2). Due to this, I kept this recipe on v1.

Issue tracking gemma perf regression #1097
2. SFT recipes defaulted to the default chat template recipe