
cp: feat: default packed_sequence to True in all finetune recipes (2284) into r0.3.0#2332

Closed
ko3n1g wants to merge 1 commit into r0.3.0 from cherry-pick-2284-r0.3.0

Conversation


@ko3n1g ko3n1g commented Feb 11, 2026

beep boop [🤖]: Hi @yaoyu-33 👋,

we've cherry-picked #2284 into r0.3.0 for you! 🚀

Please review and approve this cherry-pick at your convenience!

Summary by CodeRabbit

Release Notes

  • Configuration Changes
    • Sequence packing is now enabled by default during finetuning for most model architectures (Gemma, Qwen, Llama, GPT-OSS, Nemotron, Moonlight, and OLMoE), improving training efficiency and memory utilization.
    • GLM 4.5 keeps sequence packing disabled by default; users can override this setting as needed.
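For context, sequence packing concatenates several short samples into one fixed-length sequence so less compute is wasted on padding tokens. A minimal greedy sketch of the idea (illustrative only, not the Megatron-Bridge implementation):

```python
def pack_sequences(samples, max_len):
    """Greedily concatenate token lists into packs of at most max_len tokens."""
    packs, current = [], []
    for tokens in samples:
        # Start a new pack when the next sample would overflow the current one.
        if current and len(current) + len(tokens) > max_len:
            packs.append(current)
            current = []
        current = current + tokens
    if current:
        packs.append(current)
    return packs

# Three short samples fit into one 10-token pack instead of three padded rows.
packs = pack_sequences([[1, 2, 3], [4, 5], [6, 7, 8]], max_len=10)
```

With padding to a fixed length of 10, the same three samples would occupy three mostly-empty rows; packing them reduces that to a single dense row.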

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>

copy-pr-bot bot commented Feb 11, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


ko3n1g commented Feb 11, 2026

/ok to test dca08ae


coderabbitai bot commented Feb 11, 2026

📝 Walkthrough

This PR updates the default value of the packed_sequence parameter from False to True across multiple model recipe finetune helper functions and the default_squad_config utility. The GLM4.5 recipe adds an assertion to enforce packed_sequence as False despite the signature change. One test file adds a mock attribute.

Changes

  • Gemma and Gemma3 Recipes (src/megatron/bridge/recipes/gemma/gemma2.py, src/megatron/bridge/recipes/gemma/gemma3.py): Updated the packed_sequence parameter default from False to True in the finetune helper functions.
  • GLM4.5 Recipe (src/megatron/bridge/recipes/glm/glm45.py): Added packed_sequence: bool to GLM45FinetuneKwargs, changed the default in _glm45_finetune_common to True, and added a runtime assertion (assert not packed_sequence) to enforce False behavior. The finetune config helpers explicitly set packed_sequence to False.
  • GPT-OSS Recipe (src/megatron/bridge/recipes/gpt_oss/gpt_oss.py): Updated the packed_sequence parameter default from False to True in _gpt_oss_finetune_common.
  • Llama Recipe (src/megatron/bridge/recipes/llama/llama3.py): Updated the packed_sequence parameter default from False to True in _llama3_finetune_common.
  • Moonlight Recipe (src/megatron/bridge/recipes/moonlight/moonlight_16b.py): Updated the packed_sequence parameter default from False to True in _moonlight_finetune_common.
  • Nemotron Recipes (src/megatron/bridge/recipes/nemotronh/nemotron_3_nano.py, src/megatron/bridge/recipes/nemotronh/nemotronh.py): Updated the packed_sequence parameter default from False to True in the finetune helper functions.
  • OLMoE Recipe (src/megatron/bridge/recipes/olmoe/olmoe_7b.py): Updated the packed_sequence parameter default from False to True in _olmoe_finetune_common.
  • Qwen Recipes (src/megatron/bridge/recipes/qwen/qwen2.py, src/megatron/bridge/recipes/qwen/qwen3.py, src/megatron/bridge/recipes/qwen/qwen3_moe.py, src/megatron/bridge/recipes/qwen/qwen3_next.py): Updated the packed_sequence parameter default from False to True in the finetune helper functions. Qwen3-Next also explicitly sets packed_sequence: False in its recommended config, with a documentation note.
  • Finetune Utilities (src/megatron/bridge/recipes/utils/finetune_utils.py): Updated the default_squad_config function signature to default the packed_sequence parameter to True instead of False.
  • Test Updates (tests/unit_tests/recipes/test_qwen_recipes.py): Added a context_parallel_size = 1 attribute to the mock model configuration used by the Qwen test suite.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Suggested labels

r0.3.0

Suggested reviewers

  • yaoyu-33
  • cuichenx
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)
  • Test Results For Major Changes: ⚠️ Warning. The PR changes the default packed_sequence from False to True across 13+ recipe files, affecting training behavior, but lacks test results, convergence validation, or performance metrics. Resolution: add test results, convergence validation, and performance metrics to the PR documentation demonstrating no regressions across the affected models.

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly describes the main change: defaulting packed_sequence to True across all finetune recipes. It is specific, concise, and directly reflects the primary purpose of the changeset.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 90.00%, above the required threshold of 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
src/megatron/bridge/recipes/glm/glm45.py (1)

417-417: Default packed_sequence=True immediately triggers the assertion on line 446.

The signature defaults to True, but line 446 asserts not packed_sequence. Any direct call to _glm45_finetune_common() without explicitly passing packed_sequence=False will raise AssertionError. While the public configs both override this to False, the contradictory default is confusing and fragile.

Consider defaulting to False here (unlike the other recipes) since GLM 4.5 doesn't support packing, and preferably use raise ValueError(...) instead of assert for input validation that should not be stripped by -O.

Suggested fix
-    packed_sequence: bool = True,
+    packed_sequence: bool = False,

And on line 446:

-    assert not packed_sequence, "Packed sequence is not supported for GLM 4.5 finetuning"
+    if packed_sequence:
+        raise ValueError("Packed sequence is not supported for GLM 4.5 finetuning")

Also applies to: 446-446
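The reviewer's preference for raise over assert is worth underlining: CPython strips assert statements when run with python -O, so assert-based input validation can silently vanish in optimized runs. A hedged sketch of the suggested guard (the standalone function name is hypothetical; in the PR this check lives inside _glm45_finetune_common):

```python
def validate_packed_sequence(packed_sequence: bool) -> None:
    # A plain raise survives `python -O`, unlike `assert not packed_sequence`,
    # which is compiled away when __debug__ is False.
    if packed_sequence:
        raise ValueError("Packed sequence is not supported for GLM 4.5 finetuning")

validate_packed_sequence(False)  # no-op for the supported configuration
```

Combined with a False default in the GLM 4.5 signature, this keeps direct calls to the helper from failing out of the box.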



@ko3n1g ko3n1g marked this pull request as draft February 11, 2026 18:22
@cuichenx cuichenx marked this pull request as draft February 17, 2026 18:55
@cuichenx
Contributor

need thorough testing before merging


ko3n1g commented Feb 26, 2026

merged via the finetune PR

@ko3n1g ko3n1g closed this Feb 26, 2026
