
Code execution & tool use #204

Closed
KiddoZhu wants to merge 141 commits into main from tool-calling

Conversation

@KiddoZhu
Contributor

KiddoZhu commented Apr 16, 2025

Add a basic version of code execution & tool use (#55, #56)

Implemented by a custom LogitProcessor in the vLLM backend. I implement tools as pre-defined variables in the code environment, so there is no need to distinguish code execution from tool use; this also makes maintenance easier. The only concern is that code execution must be enabled whenever we want to enable tool use, though we can avoid mentioning in the prompt that arbitrary code execution is possible beyond the tools.
 

Supported features

  • On-the-fly multiple tool calls.
  • Compatible with batch decoding.
  • Stateful code executor: within the same generation, functions and variables persist across code snippets.
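The tools-as-variables design and the stateful executor could be sketched roughly as follows (a minimal illustration under my own assumptions, not the actual implementation; the `retrieve` tool and the eval/exec split are placeholders):

```python
import contextlib
import io


class CodeExecutor:
    """Minimal sketch of a stateful code executor.

    Tools are injected as pre-defined variables, so the model calls them
    like ordinary functions and no separate tool-calling syntax is needed;
    definitions persist across snippets within one generation.
    """

    def __init__(self, tools):
        # e.g. tools = {"retrieve": retrieve}
        self.namespace = dict(tools)

    def run(self, code):
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            try:
                # expressions like `x + y` are evaluated and echoed back
                result = eval(code, self.namespace)
                if result is not None:
                    print(repr(result))
            except SyntaxError:
                # statements like `x = 3; y = 4` are executed for effect
                exec(code, self.namespace)
        return buf.getvalue().strip()
```

On the example transcript that follows, `run("x = 3; y = 4")` returns an empty string, while the later `run("x + y")` returns `'7'` because the earlier assignments persisted in the namespace.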

Examples (test_vllm_tools.py)

<code>x = 3; y = 4</code>
This is some regular text.
<code>x + y</code>
<result>7</result>
<code>retrieve('Jen-Hsun Huang')</code>

<result>
['Nvidia was established in 1993 by Jen-Hsun Huang, Curtis Priem, and Chris '
 'Malachowsky. In 2000 Nvidia took intellectual possession of 3dfx, one of the '
 'biggest GPU producers in 1990s.']
</result>

Tokenizer issue

Although I designed CodeLogitProcessor to be as tokenizer-agnostic as possible, there may be edge cases where tokens don't split exactly at the end of </code>. For example, the tokenizer of GPT-4o will generate the following

... </##code##>x

If we tweaked the >x token into >, it would change the log probs used in RL. My current solution is to leave the generated tokens untouched and append the results directly afterwards

... </code>x<result> ...
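The append-after approach could be sketched like this (a hypothetical helper; the `executor` object with a `.run(code)` method is an assumption, not part of this PR's API):

```python
def append_result(generated_text, executor):
    """Execute the last <code>...</code> block and append its result.

    The generated tokens are never modified: even if the tokenizer merged
    the closing tag with following text (e.g. `</code>x`), the <result>
    tag is simply appended after whatever was generated, so RL log probs
    stay untouched.
    """
    start = generated_text.rfind("<code>")
    end = generated_text.rfind("</code>")
    if start == -1 or end <= start:
        return generated_text  # no complete code block to execute
    code = generated_text[start + len("<code>") : end]
    output = executor.run(code)
    return generated_text + "<result>" + output + "</result>"
```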

@KiddoZhu
Contributor Author

Now we have an implementation based on stop strings, making it independent of the vLLM and HF backends.

The function generate_with_code_and_tools(policy, input_batch, tokenizer) can be used as a drop-in replacement for policy.generate(input_batch). It supports

  • Multiple code executions & tool uses. Generation continues until hitting an EOS token or a user-specified stop string.
  • Batch code execution based on ray.remote. (@SahilJain314 Could you please check whether we need to take special care of the Ray workers here?)

@SahilJain314 The way we calculated generated_lengths in HFPolicyWorker is buggy when generation is stopped by custom stop strings (e.g. </code> in code execution): it counts one extra EOS token after the stop string, since the EOS token is also used for padding. I changed it to count the nonzero entries of generation_logprobs, assuming that any generated logprob differs from the padding value 0. Please let me know if you foresee any issue with this implementation.
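The nonzero-logprob counting amounts to the following (a plain-Python stand-in for the tensor op, with the padding assumption made explicit):

```python
def generated_length(generation_logprobs):
    """Count generated tokens by counting nonzero logprob entries.

    Assumption: padded positions carry logprob exactly 0.0, and a
    genuinely generated token never does -- which only holds if no token
    is ever sampled with probability 1.
    """
    return sum(1 for lp in generation_logprobs if lp != 0.0)
```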

I'll merge the main branch and clean up the logit processor implementation.

KiddoZhu requested a review from parthchadha April 29, 2025 21:26
@KiddoZhu
Contributor Author

Done. Ready for review.

.gitignore Outdated
dist/
*.egg-info/
*.vscode/
uv.lock
Contributor

@terrykong afaik this shouldn't be gitignored, right?

Collaborator

yea, we should not ignore the lock

else:
    gen_length = len(generated_part)

gen_length = (generated_logprob != 0).sum().item()
Contributor

This assumption doesn't hold. Generated logprobs are actually, surprisingly, sometimes exactly 0.

Contributor Author

Then how can we safely determine the length? The old implementation mistakes a padded EOS token for a generated EOS token, which is a serious problem for custom stop strings.

Contributor

Can we replace this with the existing run_multi_turn_generation function, and use environments to handle the stop tokens as we do right now?

Contributor Author

Sounds good to me. Will code execution always be invoked in a multi-turn chat format? How about changing run_multi_turn_rollout to a plain prompt format, and then providing an additional interface on top of it to support chat messages?

Contributor

The default multi-turn chat actually runs without chat templating (the template is applied to the string/tokens directly). To apply a chat template, the environment would have to do it. (Open to suggestions on this approach.)

@SahilJain314
Contributor

I'm a little hesitant to push this out without filesystem isolation

@KiddoZhu
Contributor Author

KiddoZhu commented May 2, 2025

I'm a little hesitant to push this out without filesystem isolation

Good point! What solution do you have in mind? I can think of chroot or Docker, but they are too heavyweight. If we assume the code is merely untrusted, not malicious, a simple solution is to 1) run the code in a temporary directory, 2) override open() to deny access outside the temporary directory, and 3) override builtins.__import__ to deny modules like os, sys, and subprocess.

@KiddoZhu
Contributor Author

KiddoZhu commented May 6, 2025

Sorry, I messed up the commit history. This branch will no longer be touched but is kept for retrieving history. I will open a new branch and pull request.

KiddoZhu deleted the tool-calling branch May 6, 2025 21:34
