feat: code execution & tool use by KiddoZhu · Pull Request #322 · NVIDIA-NeMo/RL

KiddoZhu · 2025-05-06T21:23:47Z

Sorry, I messed up the commit history while clearing secrets from some commits (#204). Here is a new, clean branch.

@SahilJain314 How shall we decide gen_length if generated logprobs can be 0?

Supported features

On-the-fly multiple tool calls. Generation continues until it hits an EOS token or a user-specified stop string.
Batch code execution based on ray.remote.
Stateful code executor. Within the same generation, functions and variables defined in one code snippet remain available to later snippets.
A simple sandbox for code execution. Not fully secure against malicious code, but better than nothing.
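The batching feature above fans snippets out to workers and gathers the results in order. With Ray this shape is roughly `ray.get([run_snippet.remote(c) for c in snippets])` with `run_snippet` decorated by `@ray.remote`. The sketch below shows the same fan-out/gather pattern using the standard library's `ThreadPoolExecutor` as a stand-in, so it runs without a Ray cluster. `run_snippet` and `run_batch` are hypothetical names, not the PR's API, and threads provide no isolation between snippets.

```python
import functools
import io
from concurrent.futures import ThreadPoolExecutor


def run_snippet(code: str) -> str:
    """Run one snippet in a fresh namespace; return its printed output or the error."""
    buffer = io.StringIO()
    # Route print() into a per-snippet buffer. Unlike contextlib.redirect_stdout,
    # this does not touch the process-wide sys.stdout, so it is safe across threads.
    namespace = {"print": functools.partial(print, file=buffer)}
    try:
        exec(code, namespace)
    except Exception as exc:
        # Report errors as text, the way a tool result would be returned to the model.
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue()


def run_batch(snippets: list[str], max_workers: int = 4) -> list[str]:
    """Fan snippets out to a worker pool; pool.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_snippet, snippets))
```

Returning errors as strings rather than raising keeps one failing sample from aborting the whole batch.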

Examples (test_vllm_tools.py)

<code>x = 3; y = 4</code>
This is some regular text.
<code>x + y</code>
<result>7</result>

<code>retrieve('Jen-Hsun Huang')</code>

<result>
['Nvidia was established in 1993 by Jen-Hsun Huang, Curtis Priem, and Chris '
 'Malachowsky. In 2000 Nvidia took intellectual possession of 3dfx, one of the '
 'biggest GPU producers in 1990s.']
</result>
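A minimal sketch of a stateful executor consistent with the examples above, assuming that snippets are delimited by `<code>...</code>`, that the value of a trailing expression (REPL style) is what goes into `<result>...</result>`, and that tools like `retrieve` are injected as plain functions. `StatefulExecutor`, `run`, and `step` are hypothetical names, not the PR's implementation; the PR's output looks pretty-printed, while this sketch uses plain `repr`.

```python
import ast
import re

# Assumed tag format, taken from the examples above.
CODE_PATTERN = re.compile(r"<code>(.*?)</code>", re.DOTALL)


class StatefulExecutor:
    """Hypothetical executor: all snippets in one generation share one namespace."""

    def __init__(self, tools=None):
        # Tools (e.g. a retriever) are exposed to snippets as ordinary functions.
        self.namespace = dict(tools or {})

    def run(self, code: str) -> str:
        """Execute a snippet; return repr() of a trailing expression, else ""."""
        try:
            tree = ast.parse(code)
            tail = None
            if tree.body and isinstance(tree.body[-1], ast.Expr):
                # Split off the last expression so its value can be reported.
                tail = ast.Expression(body=tree.body.pop().value)
            exec(compile(tree, "<snippet>", "exec"), self.namespace)
            if tail is None:
                return ""
            value = eval(compile(tail, "<snippet>", "eval"), self.namespace)
            return "" if value is None else repr(value)
        except Exception as exc:
            return f"{type(exc).__name__}: {exc}"

    def step(self, text: str) -> str:
        """If text ends with a </code> block, run it and append a <result> block."""
        if not text.rstrip().endswith("</code>"):
            return text
        matches = CODE_PATTERN.findall(text)
        if not matches:
            return text
        result = self.run(matches[-1])
        return text + f"<result>{result}</result>" if result else text
```

Usage mirrors the first example: `x = 3; y = 4` produces no result, and a later `x + y` sees both variables because the namespace persists across calls to `step`.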

A simple sandbox

Filesystem: The code will be executed in a temporary directory. It can read or write any file in this directory. Access beyond this temporary directory is denied.
Modules: Block modules like os, sys, multiprocessing, subprocess, etc. that could modify the filesystem or keep programs resident in memory.
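A minimal sketch of the two restrictions above, assuming they are enforced by swapping `__import__` and `open` in the builtins that `exec` sees. `BLOCKED_MODULES`, `guarded_import`, `make_guarded_open`, and `run_sandboxed` are hypothetical names, not the PR's code. Like the PR's sandbox, this is not secure against malicious code: for example, `io.open` or `pathlib` still bypass the guarded `open`.

```python
import builtins
import os
import tempfile

# Hypothetical block list modeled on the modules named above.
BLOCKED_MODULES = {"os", "sys", "multiprocessing", "subprocess", "shutil"}


def guarded_import(name, *args, **kwargs):
    # Check the top-level package so "os.path" and "from subprocess import run" fail too.
    if name.split(".")[0] in BLOCKED_MODULES:
        raise ImportError(f"module '{name}' is blocked in the sandbox")
    return builtins.__import__(name, *args, **kwargs)


def make_guarded_open(root: str):
    root = os.path.realpath(root)

    def guarded_open(file, *args, **kwargs):
        # Relative paths resolve inside root; absolute paths are kept as-is, then checked.
        path = os.path.realpath(os.path.join(root, os.fspath(file)))
        if os.path.commonpath([root, path]) != root:
            raise PermissionError(f"access outside the sandbox is denied: {file}")
        return builtins.open(path, *args, **kwargs)

    return guarded_open


def run_sandboxed(code: str) -> dict:
    """Execute code with restricted imports and file access confined to a temp dir."""
    with tempfile.TemporaryDirectory() as workdir:
        safe_builtins = dict(vars(builtins))
        safe_builtins["__import__"] = guarded_import
        safe_builtins["open"] = make_guarded_open(workdir)
        namespace = {"__builtins__": safe_builtins}
        exec(code, namespace)
        return namespace
```

Resolving paths with `realpath` before the containment check means `../` tricks and symlinked temp roots are normalized before comparison.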

Tokenizer issue

There may be edge cases where token boundaries don't line up exactly with the end of </code>. For example, the GPT-4o tokenizer may produce the following split (## marks token boundaries):

... </##code##>x

If we trim the token >x back to >, it changes the log probs used in RL. My current solution is to leave every generated token untouched and append the result directly afterwards:

... </code>x<result> ...
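A minimal sketch of the workaround above, treating tokens as plain strings for readability; that is an assumption, since a real pipeline works with token IDs and tokenizes the result text separately. `splice_result` is a hypothetical helper, not a function from the PR.

```python
def splice_result(generated: list[str], result: str, stop: str = "</code>") -> tuple[list[str], str]:
    """Append a tool result after the generated tokens without editing any of them."""
    text = "".join(generated)
    if stop not in text:
        return generated, text
    # The stop string can end mid-token (e.g. a final token ">x"). We deliberately do
    # NOT trim that token back to ">", because that would alter a sampled token and
    # invalidate the log prob recorded for it during RL.
    spliced = generated + [f"<result>{result}</result>"]
    return spliced, "".join(spliced)
```

The cost of this choice is the stray `x` between `</code>` and `<result>` in the decoded text; the benefit is that the generated prefix is byte-for-byte what the policy sampled.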

KiddoZhu · 2025-05-08T04:21:07Z

Just added environments for code execution and tool use. Do we need to keep the previous generate_with_code_and_tools implementation? My feeling is that a policy.generate-like interface is more convenient when users just want to do inference.

Signed-off-by: KiddoZhu <zhaochengz@nvidia.com>

KiddoZhu · 2025-05-16T00:40:54Z

@SahilJain314 any comment or suggestion?

nemo_rl/tools/generation.py

nemo_rl/tools/interfaces.py

Signed-off-by: KiddoZhu <zhaochengz@nvidia.com>

KiddoZhu · 2025-06-09T22:06:44Z

@SahilJain314 I've rolled back both unnecessary implementations you mentioned. It's now ready to merge except for a failing mypy check. Do you know how to fix it?

KiddoZhu · 2025-06-10T23:54:01Z

Just rolled back the logic of generation length in HFPolicy. I opened #499 for future discussions on this issue.

nemo_rl/environments/code_environment.py

nemo_rl/environments/tools/retriever.py

nemo_rl/experience/rollouts.py

KiddoZhu · 2025-07-19T00:42:15Z

Hi @SahilJain314, I noticed the HF path has been removed from test_rollout.py. How should I adapt the HF path in tool use? Do we want to remove it as well?

SahilJain314 · 2025-07-23T16:06:22Z

I think it's safe to remove HF generation as a supported rollout path if that's what you're asking

KiddoZhu · 2025-07-23T20:52:39Z

Thanks! I removed them and all relevant tests pass. It should be ready to merge.

SahilJain314 · 2025-07-26T09:57:03Z

I'm happy to merge this as it stands, but I'd love to see an example (and convergence plots) with this soon (in follow up) to show users how to use it.

@KiddoZhu please also address the DCO.

Signed-off-by: KiddoZhu <zhaochengz@nvidia.com>

KiddoZhu · 2025-07-28T23:38:23Z

DCO resolved.

> I'm happy to merge this as it stands, but I'd love to see an example (and convergence plots) with this soon (in follow up) to show users how to use it.

Yes, I will add an example and a doc file once I use code execution in a more realistic setup. It'll be in a separate thread.

terrykong · 2025-07-28T23:51:38Z

@KiddoZhu a couple of other small things:

- Can you mark the tests that need gated models (like Llama) with pytest.mark.hf_gated? See this PR for an example: https://github.com/NVIDIA-NeMo/RL/pull/755/files
- For things decorated with ray.remote, can you add:

@ray.remote  # pragma: no cover

Signed-off-by: Terry Kong <terryk@nvidia.com>

KiddoZhu · 2025-07-29T23:54:59Z

Thanks Terry! I will do that for future commits.

Signed-off-by: KiddoZhu <zhaochengz@nvidia.com> Signed-off-by: Terry Kong <terryk@nvidia.com> Co-authored-by: Terry Kong <terryk@nvidia.com>

Signed-off-by: KiddoZhu <zhaochengz@nvidia.com> Signed-off-by: Terry Kong <terryk@nvidia.com> Co-authored-by: Terry Kong <terryk@nvidia.com> Signed-off-by: tpoisonooo <khj.application@aliyun.com>

Signed-off-by: KiddoZhu <zhaochengz@nvidia.com> Signed-off-by: Terry Kong <terryk@nvidia.com> Co-authored-by: Terry Kong <terryk@nvidia.com>

Signed-off-by: KiddoZhu <zhaochengz@nvidia.com> Signed-off-by: Terry Kong <terryk@nvidia.com> Co-authored-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Qidong Su <qidongs@nvidia.com>

shamanez · 2025-10-20T06:01:19Z

Is there any coding or tool-use example, especially with a sandbox?

KiddoZhu requested review from SahilJain314 and terrykong May 8, 2025 04:10

KiddoZhu requested a review from parthchadha May 8, 2025 20:37

KiddoZhu changed the title from "Code execution & tool use" to "feat: code execution & tool use" May 8, 2025

KiddoZhu added 4 commits May 8, 2025 15:50

code execution + tool use + basic blockers for filesystems & modules

4bc5bfd

Signed-off-by: KiddoZhu <zhaochengz@nvidia.com>

revert from main branch

09e7a80

Signed-off-by: KiddoZhu <zhaochengz@nvidia.com>

rewrite code & tool use as environments

e882334

Signed-off-by: KiddoZhu <zhaochengz@nvidia.com>

fix lint check

487cd94

Signed-off-by: KiddoZhu <zhaochengz@nvidia.com>

KiddoZhu force-pushed the zhaochengz/tool-calling branch from a03ab6f to 487cd94 May 8, 2025 22:51

snowmanwwg added the r0.3.0 label (Release r0.3.0) May 15, 2025

SahilJain314 reviewed May 20, 2025

nemo_rl/tools/generation.py (outdated, resolved)

nemo_rl/tools/interfaces.py (resolved)

KiddoZhu added 2 commits June 9, 2025 11:38

Merge remote-tracking branch 'origin/main' into zhaochengz/tool-calling

7dfe468

clean up old impleementation & test passed

9563db2

Signed-off-by: KiddoZhu <zhaochengz@nvidia.com>

KiddoZhu force-pushed the zhaochengz/tool-calling branch from 80ad777 to 7ab7350 June 9, 2025 21:47

terrykong reviewed Jun 16, 2025

nemo_rl/environments/code_environment.py (outdated, resolved)

terrykong reviewed Jun 16, 2025

nemo_rl/environments/tools/retriever.py (resolved)

terrykong reviewed Jun 16, 2025

nemo_rl/experience/rollouts.py (outdated, resolved)

terrykong removed the r0.3.0 label (Release r0.3.0) Jul 14, 2025

SahilJain314 added the CI:L0 label (Run doctests and unit tests) Jul 26, 2025

SahilJain314 temporarily deployed to nemo-ci with GitHub Actions July 26, 2025 01:08 (inactive)

SahilJain314 previously approved these changes Jul 26, 2025

KiddoZhu dismissed SahilJain314’s stale review via 59cd464 July 28, 2025 22:11

github-actions bot added the documentation (Improvements or additions to documentation) and CI (Relating to CI) labels Jul 28, 2025

KiddoZhu force-pushed the zhaochengz/tool-calling branch from 59cd464 to c45dd66 July 28, 2025 22:21

github-actions bot removed the documentation (Improvements or additions to documentation) and CI (Relating to CI) labels Jul 28, 2025

KiddoZhu added 2 commits July 28, 2025 16:12

resolve merge conflicts

26a680a

Signed-off-by: KiddoZhu <zhaochengz@nvidia.com>

remove hf path

5bed075

Signed-off-by: KiddoZhu <zhaochengz@nvidia.com>

KiddoZhu force-pushed the zhaochengz/tool-calling branch from c45dd66 to 5bed075 July 28, 2025 23:16

SahilJain314 previously approved these changes Jul 28, 2025

parthchadha previously approved these changes Jul 29, 2025

terry nits

013fb29

Signed-off-by: Terry Kong <terryk@nvidia.com>

terrykong dismissed stale reviews from parthchadha and SahilJain314 via 013fb29 July 29, 2025 21:13

terrykong enabled auto-merge July 29, 2025 21:13

terrykong approved these changes Jul 29, 2025

terrykong added this pull request to the merge queue Jul 29, 2025

Merged via the queue into main with commit d256fb5 Jul 30, 2025
15 checks passed

terrykong deleted the zhaochengz/tool-calling branch July 30, 2025 00:00

FannYYW pushed a commit to xxman-google/NeMo-RL that referenced this pull request Aug 5, 2025

feat: code execution & tool use (NVIDIA-NeMo#322)

c8dc822

Signed-off-by: KiddoZhu <zhaochengz@nvidia.com> Signed-off-by: Terry Kong <terryk@nvidia.com> Co-authored-by: Terry Kong <terryk@nvidia.com>

FannYYW pushed a commit to xxman-google/NeMo-RL that referenced this pull request Aug 5, 2025

feat: code execution & tool use (NVIDIA-NeMo#322)

fc4cc29

Signed-off-by: KiddoZhu <zhaochengz@nvidia.com> Signed-off-by: Terry Kong <terryk@nvidia.com> Co-authored-by: Terry Kong <terryk@nvidia.com>

KiddoZhu mentioned this pull request Aug 6, 2025

Examples for code environment and/or tool use. #858 (open)

euronymous-aithal assigned KiddoZhu Aug 8, 2025
