
feat: code execution & tool use #322

Merged: terrykong merged 9 commits into main from zhaochengz/tool-calling on Jul 30, 2025


@KiddoZhu (Contributor) commented May 6, 2025

Sorry, I messed up the commit history while clearing secrets from some commits (#204). Here is a new, clean branch.

@SahilJain314 How should we determine gen_length if generated logprobs can be 0?
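To illustrate the concern, here is a minimal Python sketch. It assumes, hypothetically, that gen_length is inferred by counting nonzero entries in a zero-padded log-prob sequence. A token sampled with probability 1 has a log prob of exactly 0.0 and then looks the same as padding. The helper names are illustrative, not NeMo-RL APIs.

```python
from typing import List


def gen_length_from_logprobs(padded_logprobs: List[float]) -> int:
    # Fragile heuristic (assumed for illustration): count nonzero log probs.
    return sum(1 for lp in padded_logprobs if lp != 0.0)


def gen_length_from_tokens(generated_ids: List[int]) -> int:
    # Alternative: count the tokens that were actually sampled, before padding.
    return len(generated_ids)


generated_ids = [101, 7, 42]    # hypothetical token IDs
logprobs = [-0.7, 0.0, -1.2]    # token 7 was sampled with probability 1
padded = logprobs + [0.0, 0.0]  # hypothetical zero right-padding

print(gen_length_from_logprobs(padded))       # 2 (undercounts)
print(gen_length_from_tokens(generated_ids))  # 3
```

One option under these assumptions is to carry the generated length explicitly rather than recovering it from log probs.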


Supported features

  • Multiple on-the-fly tool calls: generation continues until it hits an EOS token or a user-specified stop string.
  • Batched code execution based on ray.remote.
  • Stateful code executor: within the same generation, functions and variables defined in one code snippet remain available to the next.
  • A simple sandbox for code execution. It is not fully secure against malicious code, but it is better than nothing.
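The stateful-executor behavior can be sketched as a small REPL-like class. This is a minimal illustration under assumptions, not the PR's implementation: the name StatefulExecutor is hypothetical, there is one instance per generation, and a snippet's value is the value of its final bare expression, as in an interactive shell. The PR batches execution with ray.remote; this sketch runs locally without Ray.

```python
import ast
from typing import Any, Dict, Optional


class StatefulExecutor:
    """Hypothetical REPL-style executor: one instance per generation.

    Names defined by one snippet stay in self.namespace and are visible
    to every later snippet executed by the same instance.
    """

    def __init__(self) -> None:
        self.namespace: Dict[str, Any] = {}

    def run(self, code: str) -> Optional[Any]:
        tree = ast.parse(code, mode="exec")
        last_expr = None
        # If the snippet ends with a bare expression, evaluate it separately
        # so its value can be reported back, like in an interactive shell.
        if tree.body and isinstance(tree.body[-1], ast.Expr):
            last_expr = ast.Expression(tree.body.pop().value)
        exec(compile(tree, "<snippet>", "exec"), self.namespace)
        if last_expr is not None:
            return eval(compile(last_expr, "<snippet>", "eval"), self.namespace)
        return None


ex = StatefulExecutor()
ex.run("x = 3; y = 4")  # returns None: no trailing expression
ex.run("x + y")         # returns 7: x and y persist from the previous snippet
```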

Examples (test_vllm_tools.py)

<code>x = 3; y = 4</code>
This is some regular text.
<code>x + y</code>
<result>7</result>
<code>retrieve('Jen-Hsun Huang')</code>

<result>
['Nvidia was established in 1993 by Jen-Hsun Huang, Curtis Priem, and Chris '
 'Malachowsky. In 2000 Nvidia took intellectual possession of 3dfx, one of the '
 'biggest GPU producers in 1990s.']
</result>
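The generate, execute, and resume cycle behind these examples can be sketched as a loop. The generate and run_code callables are hypothetical stand-ins (in practice a vLLM-backed generator and a stateful executor), and generate is assumed to return its continuation including the stop string when one is hit. Mirroring the first snippet above, which has no <result>, the sketch skips empty results; that is an assumption about the PR's behavior.

```python
import re
from typing import Callable, List

STOP = "</code>"
CODE_BLOCK = re.compile(r"<code>(.*?)</code>", re.DOTALL)


def rollout_with_tools(
    prompt: str,
    generate: Callable[[str, List[str]], str],
    run_code: Callable[[str], str],
    max_tool_calls: int = 8,
) -> str:
    """Hypothetical loop: generate until </code> or EOS, run the code, resume."""
    text = prompt
    for _ in range(max_tool_calls):
        continuation = generate(text, [STOP])
        text += continuation
        if STOP not in continuation:
            return text  # EOS (or another stop string): no tool call to serve
        blocks = CODE_BLOCK.findall(text)
        if not blocks:
            return text  # stray closing tag without an opening <code>
        result = run_code(blocks[-1])  # execute the most recent snippet
        if result:
            text += f"<result>{result}</result>"
    return text  # tool-call budget exhausted


# Usage with a scripted generator standing in for the model.
scripted = iter([
    "<code>x = 3; y = 4</code>",
    " This is some regular text.\n<code>x + y</code>",
    " Done.",
])
outputs = {"x = 3; y = 4": "", "x + y": "7"}
final = rollout_with_tools("", lambda text, stop: next(scripted), lambda code: outputs[code])
print(final)
```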

A simple sandbox

  • Filesystem: the code is executed in a temporary directory. It can read and write any file in that directory; access outside it is denied.
  • Modules: modules such as os, sys, multiprocessing, and subprocess, which could modify the filesystem or keep programs resident in memory, are blocked.
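The two sandbox rules can be approximated in plain Python as follows. This is a hedged sketch, not the PR's sandbox: the blocklist contents (including shutil), the helper names, and the choice to guard only the built-in open are assumptions. Like the original, it is not secure against malicious code; for example, io.open or pathlib with an absolute path would bypass the open guard.

```python
import builtins
import os
import tempfile
from typing import Any, Dict

# Hypothetical blocklist; the PR lists os, sys, multiprocessing, subprocess, "etc."
BLOCKED_MODULES = {"os", "sys", "multiprocessing", "subprocess", "shutil"}


def _guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
    # Reject blocked top-level packages (e.g. both "os" and "os.path").
    if name.split(".")[0] in BLOCKED_MODULES:
        raise ImportError(f"module '{name}' is blocked in the sandbox")
    return builtins.__import__(name, globals, locals, fromlist, level)


def _make_guarded_open(root: str):
    def guarded_open(file, *args, **kwargs):
        # Resolve the path against the sandbox root (following ".." and
        # symlinks), then refuse anything that lands outside the root.
        path = os.path.realpath(os.path.join(root, file))
        if os.path.commonpath([path, root]) != root:
            raise PermissionError(f"access outside the sandbox: {file!r}")
        return builtins.open(path, *args, **kwargs)
    return guarded_open


def run_sandboxed(code: str) -> Dict[str, Any]:
    safe_builtins = dict(vars(builtins))  # copy, so the real builtins are untouched
    safe_builtins["__import__"] = _guarded_import
    old_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.realpath(tmp)
        safe_builtins["open"] = _make_guarded_open(root)
        namespace: Dict[str, Any] = {"__builtins__": safe_builtins}
        os.chdir(root)
        try:
            exec(code, namespace)
        finally:
            os.chdir(old_cwd)  # leave the temp dir before it is deleted
    return namespace
```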

Tokenizer issue

There may be edge cases where token boundaries don't fall exactly at the end of </code>. For example, the GPT-4o tokenizer generates the following (## marks token boundaries):

... </##code##>x

If we trimmed >x to >, it would change the log probs used in RL. My current solution is to leave every generated token untouched and append the result directly afterwards:

... </code>x<result> ...
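The difference between rewriting the boundary token and appending after it can be shown in a short sketch. Token strings stand in for real token IDs, and the tokenization below is hypothetical, modeled on the </##code##>x example above.

```python
from typing import List

STOP = "</code>"


def stop_hit(tokens: List[str]) -> bool:
    # The stop string can span several tokens, so check the decoded text.
    return STOP in "".join(tokens)


def append_result(tokens: List[str], result: str) -> str:
    # Keep every sampled token verbatim, including a trailing ">x", so the
    # log probs recorded for them stay valid; the result goes after them.
    return "".join(tokens) + f"<result>{result}</result>"


tokens = ["<code>", "x", " +", " y", "</", "code", ">x"]  # hypothetical split
print(STOP in tokens)              # False: no single token equals "</code>"
print(stop_hit(tokens))            # True
print(append_result(tokens, "7"))  # <code>x + y</code>x<result>7</result>
```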

@KiddoZhu KiddoZhu requested review from SahilJain314 and terrykong May 8, 2025 04:10
@KiddoZhu (Contributor Author) commented May 8, 2025

Just added environments for code execution and tool use. Do we need to keep the previous generate_with_code_and_tools implementation? My feeling is that a policy.generate-like interface is more convenient when users just want to do inference.

@KiddoZhu KiddoZhu requested a review from parthchadha May 8, 2025 20:37
@KiddoZhu KiddoZhu changed the title Code execution & tool use feat: code execution & tool use May 8, 2025
KiddoZhu added 4 commits May 8, 2025 15:50
Signed-off-by: KiddoZhu <zhaochengz@nvidia.com>
@KiddoZhu KiddoZhu force-pushed the zhaochengz/tool-calling branch from a03ab6f to 487cd94 Compare May 8, 2025 22:51
@snowmanwwg snowmanwwg added the r0.3.0 Release r0.3.0 label May 15, 2025
@KiddoZhu (Contributor Author)

@SahilJain314 any comment or suggestion?

@KiddoZhu KiddoZhu force-pushed the zhaochengz/tool-calling branch from 80ad777 to 7ab7350 Compare June 9, 2025 21:47
@KiddoZhu (Contributor Author) commented Jun 9, 2025

@SahilJain314 I've rolled back both unnecessary implementations you mentioned. It's now ready to merge except for a failing mypy check. Do you know how to fix it?

@KiddoZhu (Contributor Author)

I just rolled back the generation-length logic in HFPolicy and opened #499 for further discussion of this issue.

@terrykong terrykong removed the r0.3.0 Release r0.3.0 label Jul 14, 2025
@KiddoZhu (Contributor Author)

Hi @SahilJain314, I noticed the HF path has been removed from test_rollout.py. How should I adapt the HF path for tool use? Do we want to remove it as well?

@SahilJain314 (Contributor)

I think it's safe to remove HF generation as a supported rollout path, if that's what you're asking.

@KiddoZhu (Contributor Author)

Thanks! I removed them, and all related tests pass. It should be ready to merge.

@SahilJain314 SahilJain314 added the CI:L0 Run doctests and unit tests label Jul 26, 2025
@SahilJain314 (Contributor) commented Jul 26, 2025

I'm happy to merge this as it stands, but I'd love to see an example (and convergence plots) with this soon (in follow up) to show users how to use it.

@KiddoZhu please also address the DCO.

SahilJain314 previously approved these changes Jul 26, 2025
@github-actions github-actions bot added documentation Improvements or additions to documentation CI Relating to CI labels Jul 28, 2025
@KiddoZhu KiddoZhu force-pushed the zhaochengz/tool-calling branch from 59cd464 to c45dd66 Compare July 28, 2025 22:21
@github-actions github-actions bot removed documentation Improvements or additions to documentation CI Relating to CI labels Jul 28, 2025
KiddoZhu added 2 commits July 28, 2025 16:12
Signed-off-by: KiddoZhu <zhaochengz@nvidia.com>
@KiddoZhu KiddoZhu force-pushed the zhaochengz/tool-calling branch from c45dd66 to 5bed075 Compare July 28, 2025 23:16
@KiddoZhu (Contributor Author)

DCO resolved.

> I'm happy to merge this as it stands, but I'd love to see an example (and convergence plots) with this soon (in follow up) to show users how to use it.

Yes, I will add an example and a doc file once I use code execution in a more realistic setup. It'll be in a separate thread.

SahilJain314 previously approved these changes Jul 28, 2025
@terrykong (Collaborator)

@KiddoZhu a couple of other small things:

@ray.remote  # pragma: no cover

parthchadha previously approved these changes Jul 29, 2025
Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong dismissed stale reviews from parthchadha and SahilJain314 via 013fb29 July 29, 2025 21:13
@terrykong terrykong enabled auto-merge July 29, 2025 21:13
@terrykong terrykong added this pull request to the merge queue Jul 29, 2025
@KiddoZhu (Contributor Author)

Thanks Terry! I will do that for future commits.

Merged via the queue into main with commit d256fb5 Jul 30, 2025
15 checks passed
@terrykong terrykong deleted the zhaochengz/tool-calling branch July 30, 2025 00:00
xxman-google pushed a commit to xxman-google/NeMo-RL that referenced this pull request Jul 30, 2025
Signed-off-by: KiddoZhu <zhaochengz@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Terry Kong <terryk@nvidia.com>
tpoisonooo pushed a commit to tpoisonooo/RL that referenced this pull request Aug 4, 2025
Signed-off-by: KiddoZhu <zhaochengz@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: tpoisonooo <khj.application@aliyun.com>
FannYYW pushed a commit to xxman-google/NeMo-RL that referenced this pull request Aug 5, 2025
Signed-off-by: KiddoZhu <zhaochengz@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Terry Kong <terryk@nvidia.com>
FannYYW pushed a commit to xxman-google/NeMo-RL that referenced this pull request Aug 5, 2025
Signed-off-by: KiddoZhu <zhaochengz@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Terry Kong <terryk@nvidia.com>
soodoshll pushed a commit to soodoshll/RL that referenced this pull request Aug 13, 2025
Signed-off-by: KiddoZhu <zhaochengz@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Qidong Su <qidongs@nvidia.com>
@shamanez

Is there any coding or tool-use example, especially one with a sandbox?


Labels

CI:L0 Run doctests and unit tests


7 participants