feat: Enable simulated user for multi-turn GRPO by jialei777 · Pull Request #606 · NVIDIA-NeMo/RL

jialei777 · 2025-07-03T18:59:49Z

What does this PR do ?

Add an simple example on multi-turn GRPO using ADK.

Issues

List issues that this PR closes (syntax):

Usage

uv run python examples/run_grpo_unique_numbers_w_adk.py

Training reward:

Validation acc:

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

...

terrykong · 2025-07-16T05:31:57Z

Hi @jialei777 . This is a really cool example! This flew under the radar, but did you want to contribute this back into NeMo RL?

Your example is very succinct and it would probably be a great starting example for other folks interested in multi-turn. Let us know if you're interested

jialei777 · 2025-07-16T19:01:42Z

@terrykong definitely interested in contributing. The current issue is the convocation format is incorrect during "policy_generation.generate()". I guess you guys are more familiar with that? Would you help with that? What is the best way to discuss?

terrykong · 2025-07-16T19:28:19Z

Awesome

@jialei777 I created an issue where we can discuss so we don't crowd your PR. Feel free to share what issues you seem to be running into

jialei777 · 2025-07-16T19:34:48Z

@terrykong is that possible to have a meeting to discuss in detail? May be easier.

README_Jialei_runs.md

jialei777 · 2025-07-22T22:08:44Z

hey @terrykong, I think the PR is mostly ready. But unfortunately i did not sign-off my initial commits, does that really matter for squash merge?

terrykong

Thanks @jialei777 for the contribution. Could you share an example run command and the train/reward curves for this in the PR description so others can try your example?

re: DCO

unfortunately the DCO check can't run on the main branch, do you mind resolving on your branch?

The DCO job gives these instructinos:

Ensure you have a local copy of your branch by checking out the pull request locally via command line.
In your local branch, run: git rebase HEAD~9 --signoff
Force push your changes to overwrite the branch: git push --force-with-lease origin jialei/simulated-user

Could you also add some tests to the code you added? You made some of the scripts executable, but those checks would be great to have in unit tests. Also for the environment, can you check out

RL/tests/unit/environments/test_math_environment.py

Line 173 in a607b41

def test_math_env_step_basic(math_env, basic_test_data):

and write similar tests

nemo_rl/experience/rollouts.py

terrykong · 2025-07-23T18:08:13Z

nemo_rl/experience/rollouts.py

-            ).input_ids[0]
-
+            # Tokenize the raw content from the environment into chat format if needed
+            env_role = env_output.observations[i]["role"].lower()


@SahilJain314 can you review this logic?

also, I think whatever change happens here should be copied in run_multi_turn_rollout_async so they have feature parity

already updated in run_sample_multi_turn_rollout, which is used in run_multi_turn_rollout_async. Please let me know if the logic is correct.

terrykong · 2025-07-23T18:13:57Z

pyproject.toml

    "matplotlib",
    "plotly",
-    "mlflow",
+    "google-adk",


Out of curiosity, how much larger does the google-adk make our images (when you include all the dependencies)?

Note that any change in pyproject.toml requires you to run uv lock to update the dependency lock file (that lock file is like a requirements.txt that pins the last known working state). A quick way to check is just building the docker image before and after and checking total image || layer size to see the delta.

At a glance, the direct dependencies of google adk are numerous so I think it would be better to add it as an "extra" under optional-dependencies like https://github.com/NVIDIA-NeMo/RL/blob/main/pyproject.toml#L51

and then adding a PY_EXECUTABLE for it and reference it in the environment:

RL/nemo_rl/distributed/virtual_cluster.py

Line 49 in a607b41

VLLM = "uv run --locked --extra vllm"

The benefit of slimming down our core dependencies when possible makes it easier/faster for certain CI jobs to run.

do i also need to checkin the uv lock file?

jialei777 · 2025-07-23T19:48:57Z

@terrykong thank you for your review. I will try to address those issue shortly.

I ran into a new issue after sync with main branch. Seems now "num_valid_samples" is bigger than batch_size (512). is that as expected, or it is a bug included recently? Here is a comparison of jobs before (red) and after (green) merge main

Signed-off-by: Jialei Chen <jialeic@google.com>

Signed-off-by: Parth Chadha <pchadha@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

) Signed-off-by: Xuehan Xiong <xxman@google.com> Signed-off-by: Xuehan <xxman@google.com> Signed-off-by: Xuehan Xiong xxman@google.com Signed-off-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: Parth Chadha <pchadha@nvidia.com> Signed-off-by: KiddoZhu <zhaochengz@nvidia.com> Signed-off-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com> Signed-off-by: Luis Vega <vegaluisjose@users.noreply.github.com> Signed-off-by: Terry Kong <terrycurtiskong@gmail.com> Signed-off-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com> Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com> Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Anna Shors <ashors@nvidia.com> Signed-off-by: Dheeraj Peri <peri.dheeraj@gmail.com> Signed-off-by: Sahil Jain <sahilj@nvidia.com> Signed-off-by: Yuki Huang <scsse_hqh2016@126.com> Signed-off-by: Shun Kiyono <shun.kiyono@sbintuitions.co.jp> Signed-off-by: Wei Du <wedu@nvidia.com> Signed-off-by: Charlie Truong <chtruong@nvidia.com> Signed-off-by: Atsunori Fujita <afujita@nvidia.com> Co-authored-by: yuki <48991475+yuki-666@users.noreply.github.com> Co-authored-by: Parth Chadha <pchadha@nvidia.com> Co-authored-by: Zhaocheng Zhu <zhaochengz@nvidia.com> Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com> Co-authored-by: Luis Vega <vegaluisjose@users.noreply.github.com> Co-authored-by: Terry Kong <terrycurtiskong@gmail.com> Co-authored-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com> Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com> Co-authored-by: Anna Shors <ashors@nvidia.com> Co-authored-by: Dheeraj Peri <peri.dheeraj@gmail.com> Co-authored-by: Yuki Huang <scsse_hqh2016@126.com> Co-authored-by: Shun Kiyono <shunk52@gmail.com> Co-authored-by: Wei Du <wedu@nvidia.com> Co-authored-by: Charlie Truong <chtruong@nvidia.com> Co-authored-by: atfujita <40932835+AtsunoriFujita@users.noreply.github.com> Signed-off-by: Jialei Chen <jialeic@google.com>

Signed-off-by: Parth Chadha <pchadha@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> Signed-off-by: Jimmy Zhang <133159885+jiemingz@users.noreply.github.com> Co-authored-by: Parth Chadha <pchadha@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

…o#505) Signed-off-by: abukharin-nv <abukharin@nvidia.com> Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com> Co-authored-by: Parth Chadha <pchadha@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

Signed-off-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

Signed-off-by: Jonas yang <joyang@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

Signed-off-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

…A-NeMo#589) Signed-off-by: Guyue Huang <guyueh@nvidia.com> Co-authored-by: Parth Chadha <pchadha@nvidia.com> Co-authored-by: yuki <48991475+yuki-666@users.noreply.github.com> Signed-off-by: Jialei Chen <jialeic@google.com>

…o#622) Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Sahil Jain <sahilj@nvidia.com> Co-authored-by: Sahil Jain <sahilj@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

…ce (NVIDIA-NeMo#634) Signed-off-by: Zhiyu Li <zhiyul@nvidia.com> Signed-off-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: yuki <48991475+yuki-666@users.noreply.github.com> Signed-off-by: Jialei Chen <jialeic@google.com>

…-NeMo#660) Signed-off-by: jubick1337 <mattyson.so@gmail.com> Signed-off-by: Jialei Chen <jialeic@google.com>

Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Terry Kong <terrycurtiskong@gmail.com> Co-authored-by: jgerh <163925524+jgerh@users.noreply.github.com> Signed-off-by: Jialei Chen <jialeic@google.com>

…Mo#638) Signed-off-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

Signed-off-by: Xuehan <xxman@google.com> Signed-off-by: Jialei Chen <jialeic@google.com>

…VIDIA-NeMo#681) Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Anna Shors <ashors@nvidia.com> Co-authored-by: Parth Chadha <pchadha@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

…VIDIA-NeMo#693) Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com> Signed-off-by: Jialei Chen <jialeic@google.com>

…DIA-NeMo#689) Signed-off-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com> Signed-off-by: Terry Kong <terryk@nvidia.com> Co-authored-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com> Signed-off-by: Jialei Chen <jialeic@google.com>

…tensor (NVIDIA-NeMo#704) Signed-off-by: Sahil Jain <sahilj@nvidia.com> Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Jonas yang <joyang@nvidia.com> Signed-off-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: Terry Kong <terrycurtiskong@gmail.com> Co-authored-by: Jonas Yang CN <joyang@nvidia.com> Co-authored-by: yuki <48991475+yuki-666@users.noreply.github.com> Co-authored-by: Ahmad Kiswani <kiswani.ahmad@gmail.com> Co-authored-by: Terry Kong <terryk@nvidia.com> Co-authored-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

Signed-off-by: ruit <ruit@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

Signed-off-by: Jialei Chen <jialeic@google.com>

…Mo#695) Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

Signed-off-by: Jialei Chen <jialeic@google.com>

jialei777 · 2025-07-23T22:49:08Z

Closing this PR and creating #732 since I messed up history with rebasing...

jialei777 closed this Jul 3, 2025

terrykong reopened this Jul 16, 2025

terrykong mentioned this pull request Jul 16, 2025

Issues encountered in custom multi-turn example #682

Open

yuanxue reviewed Jul 16, 2025

View reviewed changes

README_Jialei_runs.md Outdated Show resolved Hide resolved

jialei777 force-pushed the jialei/simulated-user branch from 7e0a66e to 317eb60 Compare July 17, 2025 00:22

snowmanwwg added the external label Jul 20, 2025

jialei777 mentioned this pull request Jul 21, 2025

Support ADK for RL training #641

Closed

jialei777 changed the title ~~Enable simulated user for multi-turn GRPO~~ feat: Enable simulated user for multi-turn GRPO Jul 22, 2025

jialei777 force-pushed the jialei/simulated-user branch from e83cfbd to 1c96d74 Compare July 23, 2025 04:19

jialei777 marked this pull request as ready for review July 23, 2025 04:50

terrykong reviewed Jul 23, 2025

View reviewed changes

terrykong requested a review from SahilJain314 July 23, 2025 18:17

xxman-google and others added 12 commits July 23, 2025 21:02

ini

a6aacdb

Signed-off-by: Jialei Chen <jialeic@google.com>

fix: load HF model only on rank 0 (NVIDIA-NeMo#544)

eba5e76

Signed-off-by: Parth Chadha <pchadha@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

fix: enable expandable segments for hopper+ (NVIDIA-NeMo#594)

9325915

Signed-off-by: Parth Chadha <pchadha@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

fix: skip HelpSteer3 unit test if downloading failed (NVIDIA-NeMo#612)

e694d38

Signed-off-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

feat: optimize get logprobs when cp enabled. (NVIDIA-NeMo#528)

4e9bd60

Signed-off-by: Jonas yang <joyang@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

enable mcore rope fusion (NVIDIA-NeMo#608)

6d45949

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

fix: fix non-colocated with vllm tp>1 (NVIDIA-NeMo#601)

d1bd4a5

Signed-off-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

feat: add flash-attn==2.7.4.post1 to backend dependencies (NVIDIA-NeM…

01cbc28

…o#622) Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Sahil Jain <sahilj@nvidia.com> Co-authored-by: Sahil Jain <sahilj@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

ZhiyuLi-Nvidia and others added 20 commits July 23, 2025 21:02

fix: adjust temperature scaling logic based on engine version (NVIDIA…

d0424cf

…-NeMo#660) Signed-off-by: jubick1337 <mattyson.so@gmail.com> Signed-off-by: Jialei Chen <jialeic@google.com>

feat: optimize refit by preparing refit info ahead of time (NVIDIA-Ne…

00d74b7

…Mo#638) Signed-off-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

docs: update converter path in README. (NVIDIA-NeMo#672)

6b97100

Signed-off-by: Xuehan <xxman@google.com> Signed-off-by: Jialei Chen <jialeic@google.com>

fix: make mcore lr scheduler configuration consistent with dtensor (N…

c8115f9

…VIDIA-NeMo#681) Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Anna Shors <ashors@nvidia.com> Co-authored-by: Parth Chadha <pchadha@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

fix: fix mcore LR increment (NVIDIA-NeMo#685)

88a9429

Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

fix: upgrade datasets to fix squad download (NVIDIA-NeMo#692)

a0df2ef

Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

fix: Megatron config updates to avoid OOM (NVIDIA-NeMo#687)

22a984a

Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

fix: fix lr scheduler for config that was missed in NVIDIA-NeMo#681 (N…

00e33a9

…VIDIA-NeMo#693) Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

fix: Fix gemma models broken by HF update (NVIDIA-NeMo#676)

c4b5151

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com> Signed-off-by: Jialei Chen <jialeic@google.com>

chore: add CP+SP (sequence parallel) assertion in DTensor worker (NVI…

6a22c8a

…DIA-NeMo#689) Signed-off-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

feat: MLFlow Integration for experiment tracking (NVIDIA-NeMo#697)

75a5a6d

Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com> Signed-off-by: Terry Kong <terryk@nvidia.com> Co-authored-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>

fix: Fix activation checkpointing for mcore path (NVIDIA-NeMo#703)

022647b

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com> Signed-off-by: Jialei Chen <jialeic@google.com>

fix: SyntaxWarning: invalid escape sequence '\s' (NVIDIA-NeMo#705)

71ed6e7

Signed-off-by: ruit <ruit@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

cleanup and add docstring

6c1898a

Signed-off-by: Jialei Chen <jialeic@google.com>

remove image

af6fa8d

Signed-off-by: Jialei Chen <jialeic@google.com>

ci: add a job that checks if submodules are fast forwarded (NVIDIA-Ne…

084235d

…Mo#695) Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

add uv.lock file

1b972ce

Signed-off-by: Jialei Chen <jialeic@google.com>

jialei777 force-pushed the jialei/simulated-user branch from 3af1a14 to 1b972ce Compare July 23, 2025 21:04

github-actions bot added documentation Improvements or additions to documentation CI Relating to CI labels Jul 23, 2025

update uv lock

6d9460b

Signed-off-by: Jialei Chen <jialeic@google.com>

github-actions bot removed the documentation Improvements or additions to documentation label Jul 23, 2025

Merge branch 'main' into jialei/simulated-user

e6890cc

github-actions bot removed the CI Relating to CI label Jul 23, 2025

jialei777 mentioned this pull request Jul 23, 2025

feat: Enable simulated user for multi-turn GRPO [new] #732

Closed

4 tasks

jialei777 closed this Jul 23, 2025

Conversation

jialei777 commented Jul 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do ?

Issues

Usage

Before your PR is "Ready for review"

Additional Information

Uh oh!

terrykong commented Jul 16, 2025

Uh oh!

jialei777 commented Jul 16, 2025

Uh oh!

terrykong commented Jul 16, 2025

Uh oh!

jialei777 commented Jul 16, 2025

Uh oh!

Uh oh!

jialei777 commented Jul 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

terrykong left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

terrykong Jul 23, 2025

Choose a reason for hiding this comment

Uh oh!

jialei777 Jul 23, 2025

Choose a reason for hiding this comment

Uh oh!

terrykong Jul 23, 2025

Choose a reason for hiding this comment

Uh oh!

jialei777 Jul 23, 2025

Choose a reason for hiding this comment

Uh oh!

jialei777 commented Jul 23, 2025

Uh oh!

jialei777 commented Jul 23, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants

jialei777 commented Jul 3, 2025 •

edited

Loading

jialei777 commented Jul 22, 2025 •

edited

Loading