
Conversation

@soodoshll commented Aug 20, 2025

What does this PR do?

Address #883.

  • Qwen3 32B with a 128k context is runnable using TP8+CP4+activation checkpointing (actckpt) on 4x8 H100 nodes. Example run: https://wandb.ai/nvidia/nemo-rl/runs/5qhgqcqf?nw=nwuserqidongs
  • Rollout is extremely slow in this case, so I use a rather low batch size in this recipe.
  • I use YaRN to extend the context length to 128k in vLLM. Not 100% sure whether Megatron is handling this part correctly.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

@wangshangsam

Are there any theoretical differences between this and doing TP8+CP4+actckpt on DTensor, given that DTensor couldn't work in this case?

@soodoshll

Are there any theoretical differences between this and doing TP8+CP4+actckpt on DTensor, given that DTensor couldn't work in this case?

Needs profiling to see why. BTW, DTensor can run with seqlen=64k.

@wangshangsam left a comment


Actually, by existing convention, the hardware setup should come before the runtime config, and ideally the max supported context length should come with the model & task, e.g., grpo-math-qwen3-32b-128k-4n8g-megatrontp8cp4.yaml.

Could you also rename the other grpo-math-qwen3-30ba3b-megatron-tp4-32k.yaml into grpo-math-qwen3-30ba3b-32k-4n8g-megatrontp4ep8.yaml? We forgot about this aspect when we reviewed #918 last week (cc @pjin-nvidia)

@wangshangsam

Are there any theoretical differences between this and doing TP8+CP4+actckpt on DTensor, given that DTensor couldn't work in this case?

need a profiling to see why. btw, dtensor can run with seqlen=64k

Let's get this PR merged first. The key objective of the issue this PR addresses is to unblock the Nemotron folks, so if the MCore path already works, let's leave it at that. We can dig into the memory profile for very long seq lens when addressing #885.

@wangshangsam

BTW, since you are adding a new recipe, you need to add a bash script that tests this new recipe in the CI too (there's a unit test that checks whether you have done that). Refer to #926, tests/test_suites/llm, and tests/test_suites/nightly for examples.

@soodoshll

Actually, by existing convention, the hardware setup should come before the runtime config, and ideally the max supported context length should come with the model & task, e.g., grpo-math-qwen3-32b-128k-4n8g-megatrontp8cp4.yaml.

Could you also rename the other grpo-math-qwen3-30ba3b-megatron-tp4-32k.yaml into grpo-math-qwen3-30ba3b-32k-4n8g-megatrontp4ep8.yaml? We forgot about this aspect when we reviewed #918 last week (cc @pjin-nvidia)

Fixed. But overall the recipe naming still lacks consistency; it'd be better if we had a naming convention.

@wangshangsam

Actually, by existing convention, the hardware setup should come before the runtime config, and ideally the max supported context length should come with the model & task, e.g., grpo-math-qwen3-32b-128k-4n8g-megatrontp8cp4.yaml.
Could you also rename the other grpo-math-qwen3-30ba3b-megatron-tp4-32k.yaml into grpo-math-qwen3-30ba3b-32k-4n8g-megatrontp4ep8.yaml? We forgot about this aspect when we reviewed #918 last week (cc @pjin-nvidia)

Fixed. But overall the recipe naming still lacks consistency; it'd be better if we had a naming convention.

Naming convention: https://github.com/NVIDIA-NeMo/RL/tree/main/tests/test_suites#naming
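To illustrate that convention, here is a small hypothetical helper (not part of the repo; the pattern is inferred from the two recipe names discussed in this thread, so treat the field order as an assumption and defer to the linked doc):

```python
# Hypothetical helper illustrating the recipe naming convention discussed
# above. Not NeMo-RL code; the pattern is inferred from the two example
# recipe names in this thread.
def make_recipe_name(algo, task, model, ctx, nodes, gpus, backend, parallelism):
    """Build '{algo}-{task}-{model}-{ctx}-{N}n{G}g-{backend}{parallelism}.yaml'."""
    # parallelism is an ordered list of (dim, size) pairs, e.g. [("tp", 8), ("cp", 4)]
    par = "".join(f"{dim}{size}" for dim, size in parallelism)
    return f"{algo}-{task}-{model}-{ctx}-{nodes}n{gpus}g-{backend}{par}.yaml"

print(make_recipe_name("grpo", "math", "qwen3-32b", "128k", 4, 8,
                       "megatron", [("tp", 8), ("cp", 4)]))
# grpo-math-qwen3-32b-128k-4n8g-megatrontp8cp4.yaml
```

The same helper reproduces the renamed MoE recipe with `[("tp", 4), ("ep", 8)]`.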

@github-actions github-actions bot added documentation Improvements or additions to documentation CI Relating to CI labels Aug 28, 2025
@github-actions github-actions bot removed documentation Improvements or additions to documentation CI Relating to CI labels Aug 28, 2025
Signed-off-by: Qidong Su <[email protected]>
@soodoshll soodoshll requested a review from wangshangsam August 28, 2025 21:06
@wangshangsam previously approved these changes Aug 28, 2025
@wangshangsam wangshangsam added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Aug 28, 2025
Signed-off-by: Qidong Su <[email protected]>
@wangshangsam wangshangsam added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Aug 30, 2025
@wangshangsam wangshangsam added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Sep 2, 2025
@wangshangsam previously approved these changes Sep 2, 2025
# Only run metrics if the target step is reached
if [[ $(jq 'to_entries | .[] | select(.key == "train/loss") | .value | keys | map(tonumber) | max' $JSON_METRICS) -ge $MAX_STEPS ]]; then
uv run tests/check_metrics.py $JSON_METRICS \
'mean(data["train/token_mult_prob_error"]) < 1.1' \


wouldn't this fail according to the wandb link you shared?

[image: W&B plot of train/token_mult_prob_error]

I think with longer generations we'll probably run into outliers that skew the mean. This run covers so few steps that it's probably hard to write something robust. Maybe:

Suggested change:
- 'mean(data["train/token_mult_prob_error"]) < 1.1' \
+ 'min(data["train/token_mult_prob_error"]) < 1.1' \

and then add a comment above explaining why min is used for this particular test

I made an issue to track this: #1039
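The rationale behind the suggestion above can be shown with a toy example (values made up for illustration): a single outlier step pushes the mean over the 1.1 threshold even when most steps are healthy, while the min stays robust.

```python
# Toy illustration of why `min` is more robust than `mean` for this check
# on short runs: one outlier step skews the mean past the 1.1 threshold.
# The values below are invented for illustration, not taken from the run.
from statistics import mean

token_mult_prob_error = [1.01, 1.02, 1.03, 5.0]  # one outlier step

print(mean(token_mult_prob_error) < 1.1)  # False: mean is ~2.02, skewed by the outlier
print(min(token_mult_prob_error) < 1.1)   # True: at least one healthy step exists
```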

Comment on lines +36 to +39
rope_scaling:
type: "yarn"
factor: 4.0
original_max_position_embeddings: 32768


Which models need this? Is it possible to handle this in code?

In the past when we had stuff like this, the consensus was to handle it in code since we knew which model types needed it, e.g., fdb565c.

Regardless of whether this is handled in code or YAML, it should probably have an entry in model-quirks.md so we have documentation.



It is used by Qwen3 with long context lengths: https://huggingface.co/Qwen/Qwen3-32B#processing-long-texts.

Since it's an optional configuration that can be changed by the user, I tend to put it explicitly in YAML.

Will update model-quirks.md to reflect this.
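For context, the `factor: 4.0` in the YAML snippet above follows directly from the ratio of the target context to the model's native window, which is how YaRN's scaling factor is chosen per the Qwen3 long-text docs. A quick arithmetic check (illustration only, not NeMo-RL code):

```python
# Sanity check of the YaRN factor used in the rope_scaling config above:
# factor = target context length / native max position embeddings.
# Arithmetic illustration only; not NeMo-RL code.
original_max_position_embeddings = 32768   # Qwen3-32B native window
target_context_length = 128 * 1024        # 128k tokens

factor = target_context_length / original_max_position_embeddings
print(factor)  # 4.0, matching the YAML above
```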

Signed-off-by: Qidong Su <[email protected]>
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Sep 3, 2025
@soodoshll

This PR is not merged yet because it depends on YaRN support in the Megatron backend being ready.
Related MR in Megatron-LM: #3854

@wangshangsam wangshangsam linked an issue Sep 4, 2025 that may be closed by this pull request
self.dp_size = worker_sharding_annotations.get_axis_size("data_parallel")
self.megatron_bridge = AutoBridge.from_hf_pretrained(
-    hf_model_name, trust_remote_code=True
+    hf_model_name, trust_remote_code=True, **self.cfg.get("model_kwargs", {})


I think the mcore path cannot parse/handle rope_scaling.type="yarn". Even if you pass these arguments, this might succeed, but your model arch is still plain RoPE, which seems confusing.



Maybe we should error out in the mcore path if we see rope_scaling.type == "yarn" until the necessary support is added.

@yaoyu-33 to confirm whether the Qwen and Llama bridges in mbridge can parse and handle this field. I believe this is only supported in the DeepSeek model type.
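The fail-fast guard suggested in this review might look like the following sketch. The function name and `cfg` layout are hypothetical (the dict mirrors the YAML snippet in this PR); this is not the repo's actual validation code.

```python
# Hypothetical sketch of the guard suggested above: fail fast when the
# Megatron path sees a YaRN rope_scaling config it cannot yet honor,
# instead of silently falling back to plain RoPE. Names are illustrative.
def check_rope_scaling_supported(cfg: dict, backend: str) -> None:
    rope_scaling = cfg.get("rope_scaling") or {}
    if backend == "megatron" and rope_scaling.get("type") == "yarn":
        raise ValueError(
            "rope_scaling.type='yarn' is not yet supported on the Megatron "
            "path; the model would silently run with plain RoPE."
        )

cfg = {"rope_scaling": {"type": "yarn", "factor": 4.0,
                        "original_max_position_embeddings": 32768}}
check_rope_scaling_supported(cfg, backend="vllm")  # OK: vLLM handles YaRN
```

With `backend="megatron"` the same config raises, surfacing the unsupported combination early rather than at training time.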


Labels

CI:L1 Run doctests, unit tests, and functional tests
documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enable GRPO Qwen 3 32B with 128k context length

4 participants