feat: add config_cli.py and refactor configs + config pre-commit by terrykong · Pull Request #1024 · NVIDIA-NeMo/RL

terrykong · 2025-08-29T07:47:08Z

Convergence/Perf check

Overview

Given the recent friction of merging can sometimes come down to config conflicts, this PR introduces a tool to merge configs (and pre-commit hook) so you can commit the minimal config necessary for a recipe. There is now a new risk of config changes in the base config propagating to the recipes, especially if the config was not defined and the new default changes the behavior, but this has the positive side-effect of bisecting being more helpful whereas before we were shielded from regressions until the recipe config was updated explicitly.

We need to now keep in mind that if a flag like "enable_eager" was false and no other config overrode it, turning it on would turn it on for all recipes.

Example of how to use the tool:

Utilities for working with YAML configs in this repo.

Subcommands:
  - expand: Resolve a config with OmegaConf interpolation and inheritance.
  - minimize: Given a base config and a config, remove keys in the config that
    are equal to the base, and ensure a defaults entry pointing to the base
    exists. The defaults path in the resulting config is written relative to
    the base config file.
  - minimize-check: Same args as `minimize` but only checks if minimization
    would change the file; exits non-zero if changes are needed.

Both commands support printing to stdout or in-place editing of the config file.

Example:
  # Expand a config with a root level "defaults" key to see the full config; print to stdout
  tools/config_cli.py expand examples/configs/recipes/llm/dpo-llama3.1-8b-instruct-4n8g-fsdp2tp2-quick.v2.yaml

  # Expand a config with a root level "defaults" key to see the full config; edit the config in place
  tools/config_cli.py expand examples/configs/recipes/llm/dpo-llama3.1-8b-instruct-4n8g-fsdp2tp2-quick.v2.yaml --in-place

  # Minimize a config and remove all keys that are present in the base config; print to stdout
  # tools/config_cli.py minimize <base_config> <config>
  tools/config_cli.py minimize examples/configs/dpo.yaml examples/configs/recipes/llm/dpo-llama3.1-8b-instruct-4n8g-fsdp2tp2-quick.v2.yaml

  # Minimize a config and remove all keys that are present in the base config; edit the config in place
  # tools/config_cli.py minimize <base_config> <config>
  tools/config_cli.py minimize examples/configs/dpo.yaml examples/configs/recipes/llm/dpo-llama3.1-8b-instruct-4n8g-fsdp2tp2-quick.v2.yaml --in-place

  # Minimize all llm the configs:
  for algo in grpo dpo sft; do
    base_config=examples/configs/${algo}.yaml
    if [[ ${algo} == grpo ]]; then
      base_config=examples/configs/grpo_math_1B.yaml
    fi
    for recipe in examples/configs/recipes/llm/${algo}-*.yaml; do
      tools/config_cli.py minimize $base_config $recipe --in-place
    done
  done

  # Minimize vlm configs:
  for recipe in examples/configs/recipes/vlm/vlm_grpo-*.yaml; do
    tools/config_cli.py minimize examples/configs/vlm_grpo_3B.yaml $recipe --in-place
  done

  # Compare two configs
  tools/config_cli.py compare examples/configs/grpo_math_1B.yaml examples/configs/grpo_math_8B.yaml

  # Minimize a config and compare it to not minimzing (should be the same)
  tools/config_cli.py minimize examples/configs/dpo.yaml examples/configs/recipes/llm/dpo-llama3.1-8b-instruct-4n8g-fsdp2tp2-quick.v2.yaml >examples/configs/recipes/llm/dpo-llama3.1-8b-instruct-4n8g-fsdp2tp2-quick.v2.yaml.minimized
  tools/config_cli.py compare \
    examples/configs/recipes/llm/dpo-llama3.1-8b-instruct-4n8g-fsdp2tp2-quick.v2.yaml \
    examples/configs/recipes/llm/dpo-llama3.1-8b-instruct-4n8g-fsdp2tp2-quick.v2.yaml.minimized

Related to #927

Notable issues that arose from merging configs:

1. all dtensor recipes defaulted to v2 and that uncovered some perf regressions

`grpo-gemma3-27b-it-8n8g-fsdp2tp8-actckpt-long`

this recipe is much worse on v2 than on v1 (cand=v1, base=v2). Due to this, i kept this recipe using v1

Issue tracking gemma perf regression #1097

2. SFT recipes defaulted to the default chat template recipe

Summary by CodeRabbit

New Features
- Introduced a configuration CLI with commands to expand, minimize, validate, and compare YAML configs.
- Added default chat templates to several SFT tokenizer settings.
Chores
- Added pre-commit hooks to enforce minimized configuration files with strict error handling.
Refactor
- Simplified and standardized numerous LLM/VLM recipe configs by delegating to shared defaults, trimming deprecated or redundant options, and restructuring schedulers.
Style
- Normalized YAML formatting (unquoted scalars, boolean/number formats), streamlined logger sections, and clarified dataset fields across configs.

coderabbitai · 2025-09-10T06:28:09Z

📝 Walkthrough

Walkthrough

Adds a config minimization/validation CLI (tools/config_cli.py) and pre-commit hooks to enforce minimized YAML recipes. Broadly rewrites multiple LLM/VLM recipe YAMLs to rely on shared defaults, remove redundant fields, normalize formatting, and simplify scheduler/optimizer/logging sections.

Changes

Cohort / File(s)	Summary
Tooling: Config CLI and CI hooks `./tools/config_cli.py`, `.pre-commit-config.yaml`	New CLI with expand/minimize/compare/minimize-check commands; vendored loader resolves defaults and prunes keys. Pre-commit adds local hooks to run minimize-check over specified base/recipe pairs; bash entrypoints with strict error handling.
LLM DPO recipes normalization `examples/configs/recipes/llm/dpo-llama3.1-8b-instruct-4n8g-fsdp2tp2-quick.v2.yaml`, `.../dpo-llama3.1-8b-instruct-4n8g-fsdp2tp4.yaml`, `.../dpo-llama3.1-8b-instruct-4n8g-megatron.v2.yaml`, `.../dpo-llama3.1-8b-instruct-4n8g-megatrontp2pp2-quick.yaml`, `.../dpo-llama3.1-8b-tulu3-1n8g-fsdp2tp1.yaml`, `.../dpo-llama3.2-1b-instruct-1n8g-fsdp2tp1.v2.yaml`, `.../dpo-mistral-nemo-instruct-2407-1n8g-fsdp2tp8-actckpt-long.yaml`	Add shared defaults; remove many training/validation fields; simplify dtensor/optimizer/scheduler/data/logger; unquote scalars; restructure scheduler lists.
LLM GRPO recipes normalization `examples/configs/recipes/llm/grpo-deepscaler-1.5b-{8K,16K,24K}.yaml`, `.../grpo-gemma3-{1b-it-1n8g-fsdp2tp1,27b-it-8n8g-fsdp2tp8-actckpt-long}.yaml`, `.../grpo-gspo-deepscaler-1.5b-8K.yaml`, `.../grpo-llama3.1-8b-instruct-{1n8g-megatron-fp8,2n8g-fsdp2tp1-noncolocated,4n8g-fsdp2tp1-long.v3}.yaml`, `.../grpo-llama3.2-1b-instruct-{1n8g-fsdp2tp1.v3,1n8g-megatron}.yaml`, `.../grpo-math-qwen3-30ba3b-megatron-tp4-32k.yaml`, `.../grpo-moonlight-16ba3b-4n8g-megatron.yaml`, `.../grpo-qwen2.5-{32b-32n8g-fsdp2tp8sp-actckpt{.v3,.long.v3},7b-instruct-4n8g-{fsdp2tp4sp.v3,megatron}}.yaml`, `.../grpo-qwen3-30ba3b-8n8g-megatron.yaml`, `.../grpo-qwen2.5-math-1.5b-instruct-1n8g-fsdp2tp1.v3.yaml`	Shift to defaults-driven configs; remove rollout/validation/baselines/async and many optimizer/scheduler/generation/vLLM fields; add/set dtensor `_v2` where applicable; normalize numeric/boolean formatting; restructure schedulers; prune logger/cluster/env blocks.
LLM SFT recipes normalization `examples/configs/recipes/llm/sft-llama3.1-70b-8n8g-tp4pp2-long-megatron.yaml`, `.../sft-llama3.1-8b-1n8g-{fsdp2tp1-dynamicbatch,fsdp2tp1-long,fsdp2tp2sp}.yaml`, `.../sft-llama3.1-8b-1n8g-megatron{,-seqpack}.yaml`, `.../sft-llama3.2-1b-1n8g-fsdp2tp1.v3.yaml`, `.../sft-qwen2.5-32b-4n8g-fsdp2tp8sp-actckpt.v3.yaml`	Add tokenizer chat_template; remove epochs/val flags and dtensor/dynamic/sequence packing details; simplify optimizer (lr/eps) and data/logging; unquote values; prune gpu monitoring and num_nodes.
VLM GRPO recipes normalization `examples/configs/recipes/vlm/vlm_grpo-qwen2.5-vl-3b-instruct-clevr-1n2g-dtensor2tp1.v1.yaml`, `.../vlm_grpo-smolvlm2-2.2b-instruct-clevr-1n2g-dtensor2tp1.v1.yaml`	Point to shared VLM defaults; collapse policy to essentials; simplify checkpointing; trim training/generation/data/env/logger/cluster blocks with minimal retained fields.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Dev
  participant CLI as config_cli.py
  participant FS as Filesystem
  participant YAML as YAML Loader

  Dev->>CLI: minimize --base BASE.yaml --child CHILD.yaml [-i]
  CLI->>YAML: load(BASE.yaml)
  CLI->>YAML: load(CHILD.yaml)
  CLI->>CLI: resolve defaults (expand)
  CLI->>CLI: compute minimized(CHILD - BASE)
  alt in-place (-i)
    CLI->>FS: write CHILD.yaml (defaults->relative, ordered)
  else stdout
    CLI-->>Dev: print minimized YAML
  end

sequenceDiagram
  autonumber
  actor Git as pre-commit
  participant Hook as minimize-check-*(bash)
  participant CLI as config_cli.py
  participant FS as Repo

  Git->>Hook: on commit
  loop for each (BASE, RECIPE)
    Hook->>CLI: minimize-check --base BASE --child RECIPE
    CLI->>FS: read files
    CLI->>CLI: compute minimized form
    alt differs
      CLI-->>Hook: exit 1 (non-zero)
      Hook-->>Git: fail hook
    else identical
      CLI-->>Hook: exit 0
    end
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

CI:L1

Suggested reviewers

yuki-97

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Test Results For Major Changes	❓ Inconclusive	This PR is a major change (new CLI plus large-scale config refactors) that can affect numerics and performance. The description notes testing on a specific commit and documents DTensor v2 performance regressions for certain recipes, with mitigation (staying on v1) and mentions that screenshots are included. However, the PR description as provided here does not contain explicit before/after metrics, convergence evidence, or the exact context/config/hardware used, so I cannot verify that performance and numerics were adequately validated across impacted recipes. Given the absence of concrete numbers and artifacts in the text available, the check outcome is inconclusive.	Please augment the PR description with a concise Testing and Performance section: include before/after numbers (e.g., tokens/s, step time, reward/accuracy) for the affected recipes, the exact configs and commands used, hardware/software context (GPU model, nodes, CUDA/driver), and brief convergence evidence (loss/reward curves or summary stats) showing no regressions post-refactor. For the new CLI, add minimal test artifacts (e.g., sample inputs/outputs or unit test summary) demonstrating expand/minimize/minimize-check/compare behavior. Once these details are added, this check can be reassessed and should pass.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title clearly states the primary feature addition (config_cli.py) and the accompanying updates to configuration files and pre-commit hooks, succinctly capturing the main purpose of the PR in a single sentence without unnecessary detail. It uses a conventional commit style prefix and avoids file lists or vague terms. Scanning the title gives a reviewer immediate insight into the key changes.
Docstring Coverage	✅ Passed	No functions found in the changes. Docstring coverage check skipped.

✨ Finishing touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch tk/config-fold

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

terrykong · 2025-09-24T16:36:38Z

tested on b171629

Signed-off-by: Terry Kong <terryk@nvidia.com> compare command Signed-off-by: Terry Kong <terryk@nvidia.com> config changes Signed-off-by: Terry Kong <terryk@nvidia.com> Revert "config changes" This reverts commit 25b87e2. Signed-off-by: Terry Kong <terryk@nvidia.com> cleanup Signed-off-by: Terry Kong <terryk@nvidia.com> vlm example Signed-off-by: Terry Kong <terryk@nvidia.com> minimize configs Signed-off-by: Terry Kong <terryk@nvidia.com> Revert "minimize configs" This reverts commit 1375480. Signed-off-by: Terry Kong <terryk@nvidia.com> minimize configs Signed-off-by: Terry Kong <terryk@nvidia.com> Revert "minimize configs" This reverts commit a4cd8a4. Signed-off-by: Terry Kong <terryk@nvidia.com> minimize configs Signed-off-by: Terry Kong <terryk@nvidia.com> force sft configs to use default chat template to match last releases behavior Signed-off-by: Terry Kong <terryk@nvidia.com> reverting select configs to v1 to address Signed-off-by: Terry Kong <terryk@nvidia.com> add pre-commit and add a minimize-check func Signed-off-by: Terry Kong <terryk@nvidia.com> Revert "reverting select configs to v1 to address" This reverts commit d81f806. Signed-off-by: Terry Kong <terryk@nvidia.com> Revert "force sft configs to use default chat template to match last releases" This reverts commit be01df7. Signed-off-by: Terry Kong <terryk@nvidia.com> Revert "minimize configs" This reverts commit e54f144. Signed-off-by: Terry Kong <terryk@nvidia.com>

Signed-off-by: Terry Kong <terryk@nvidia.com>

Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com> Signed-off-by: Terry Kong <terrycurtiskong@gmail.com>

.pre-commit-config.yaml

parthchadha · 2025-09-24T21:50:56Z

@terrykong thanks for this change, do you want to add a few unit tests for the new tool?

terrykong · 2025-09-24T21:51:08Z

Some of our configs in example/configs also depend on each other. E.g. grpo_math_8B.yaml uses the defaults from grpo_math_1B.yaml. Do we also want to change these so we only keep the minimal config diffs?

Yea, that's a good point. I created an issue for those: #1201. I only planned on minimizing the recipes since those have tests

Co-authored-by: Terry Kong <terrycurtiskong@gmail.com> Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>

Signed-off-by: Terry Kong <terryk@nvidia.com>

terrykong · 2025-09-25T06:51:52Z

@parthchadha added tests in 044385c

…DIA-NeMo#1024) Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Terry Kong <terrycurtiskong@gmail.com> Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com>

…DIA-NeMo#1024) Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Terry Kong <terrycurtiskong@gmail.com> Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>

terrykong force-pushed the tk/config-fold branch 4 times, most recently from 4534008 to 1c0290c Compare September 6, 2025 06:41

github-actions bot added the CI Relating to CI label Sep 6, 2025

terrykong force-pushed the tk/config-fold branch 2 times, most recently from 260f022 to 1cd4fd9 Compare September 10, 2025 06:30

github-actions bot removed the CI Relating to CI label Sep 10, 2025

terrykong changed the title ~~feat: add config_tools.py and refactor configs~~ feat: add config_cli.py and refactor configs Sep 16, 2025

yuki-97 mentioned this pull request Sep 23, 2025

chore: remove some default value #1179

Closed

terrykong force-pushed the tk/config-fold branch 2 times, most recently from 04cbee1 to 968afbd Compare September 24, 2025 16:39

terrykong added 7 commits September 24, 2025 18:10

commit

3b06882

Signed-off-by: Terry Kong <terryk@nvidia.com>

fix docstring to reflect it is a script

18b0e49

Signed-off-by: Terry Kong <terryk@nvidia.com>

minimize configs

029c68f

Signed-off-by: Terry Kong <terryk@nvidia.com>

sft default chat template

6d52f03

Signed-off-by: Terry Kong <terryk@nvidia.com>

v2 select configs

79d221b

Signed-off-by: Terry Kong <terryk@nvidia.com>

copyright

a844f62

Signed-off-by: Terry Kong <terryk@nvidia.com>

terrykong force-pushed the tk/config-fold branch from 62be5ea to a844f62 Compare September 24, 2025 18:10

terrykong requested a review from yfw September 24, 2025 18:11

terrykong assigned parthchadha and unassigned parthchadha Sep 24, 2025

terrykong requested a review from parthchadha September 24, 2025 18:11

terrykong marked this pull request as ready for review September 24, 2025 18:11

terrykong requested review from a team as code owners September 24, 2025 18:11

Update tools/config_cli.py

a236134

Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com> Signed-off-by: Terry Kong <terrycurtiskong@gmail.com>

terrykong commented Sep 24, 2025

View reviewed changes

.pre-commit-config.yaml Outdated Show resolved Hide resolved

terrykong mentioned this pull request Sep 24, 2025

Chore: More config cleanup #1201

Open

Update .pre-commit-config.yaml

8b59d23

Co-authored-by: Terry Kong <terrycurtiskong@gmail.com> Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>

yfw previously approved these changes Sep 24, 2025

View reviewed changes

unit tests

044385c

Signed-off-by: Terry Kong <terryk@nvidia.com>

terrykong dismissed yfw’s stale review via 044385c September 25, 2025 06:50

terrykong requested a review from a team as a code owner September 25, 2025 06:50

copyright

20ee149

Signed-off-by: Terry Kong <terryk@nvidia.com>

terrykong added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Sep 25, 2025

terrykong temporarily deployed to nemo-ci September 25, 2025 06:52 — with GitHub Actions Inactive

terrykong enabled auto-merge (squash) September 25, 2025 06:52

terrykong temporarily deployed to nemo-ci September 25, 2025 06:52 — with GitHub Actions Inactive

terrykong mentioned this pull request Sep 25, 2025

feat: add on policy distillation algorithm #1006

Merged

4 tasks

terrykong temporarily deployed to nemo-ci September 25, 2025 08:51 — with GitHub Actions Inactive

parthchadha approved these changes Sep 25, 2025

View reviewed changes

terrykong merged commit 32faafa into main Sep 25, 2025
42 checks passed

terrykong deleted the tk/config-fold branch September 25, 2025 16:33

coderabbitai bot mentioned this pull request Oct 16, 2025

cp: test: Update on-policy distillation release tests (1363) into r0.4.0 #1376

Merged

coderabbitai bot mentioned this pull request Nov 1, 2025

cp: use pydantic for yaml test validation (#1382) into r0.4.0 #1459

Merged

4 tasks

This was referenced Dec 19, 2025

test: Perf recipe for v0.5 #1661

Closed

test: Perf recipe for v0.5 #1667

Merged

coderabbitai bot mentioned this pull request Jan 16, 2026

ci: Add nightly and release tests for gb200 #1788

Merged

4 tasks

coderabbitai bot mentioned this pull request Feb 20, 2026

fix: Fix adv estimator configs #1994

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add config_cli.py and refactor configs + config pre-commit#1024

feat: add config_cli.py and refactor configs + config pre-commit#1024
terrykong merged 15 commits intomainfrom
tk/config-fold

terrykong commented Aug 29, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Sep 10, 2025 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested labels

Suggested reviewers

Uh oh!

terrykong commented Sep 24, 2025

Uh oh!

Uh oh!

parthchadha commented Sep 24, 2025

Uh oh!

terrykong commented Sep 24, 2025

Uh oh!

terrykong commented Sep 25, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

terrykong commented Aug 29, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Convergence/Perf check

Overview

Notable issues that arose from merging configs:

1. all dtensor recipes defaulted to v2 and that uncovered some perf regressions

grpo-gemma3-27b-it-8n8g-fsdp2tp8-actckpt-long

2. SFT recipes defaulted to the default chat template recipe

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Sep 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested labels

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

terrykong commented Sep 24, 2025

Uh oh!

Uh oh!

parthchadha commented Sep 24, 2025

Uh oh!

terrykong commented Sep 24, 2025

Uh oh!

terrykong commented Sep 25, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

terrykong commented Aug 29, 2025 •

edited by coderabbitai bot

Loading

`grpo-gemma3-27b-it-8n8g-fsdp2tp8-actckpt-long`

coderabbitai bot commented Sep 10, 2025 •

edited

Loading