Conversation
- flatten hyperparams for tb no longer errors for lists (was an issue for schedulers)
- the submission script now overlaps the head on the first worker (no longer needs an extra node just for the head)
- fixes the CI to handle weird permissions issues
- added sphinx build and doctest to CI
- added functional tests to CI
- nuked an old example
- added docs for functional tests
- --no-container-mount-home
- fix a unit test that expected cuda to skip
- allow running unit tests on a slurm head node with no gpu
- add a hermetic script to run functional tests

Signed-off-by: Terry Kong <terryk@nvidia.com>
Now we have an implementation based on stop strings, making it independent of the vllm and hf backends.

@SahilJain314 I'll merge the main branch and clean up the logit processor implementation.
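For reference, the stop-string idea can be sketched in a backend-agnostic way roughly like this (the function and constant names are illustrative, not the repo's API):

```python
# Hypothetical sketch: truncate decoded text at the earliest stop string,
# keeping the stop string itself. Works on plain strings, so it does not
# depend on vllm- or hf-specific generation internals.

STOP_STRINGS = ["</code>", "<|endoftext|>"]  # assumed markers for this sketch

def truncate_at_stop_string(text: str, stop_strings=STOP_STRINGS) -> str:
    cut = len(text)
    for s in stop_strings:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx + len(s))
    return text[:cut]
```

Anything a backend generates past the first stop string is simply dropped before the text is handed back to the environment.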
Done. Ready for review.
.gitignore

```
dist/
*.egg-info/
*.vscode/
uv.lock
```
@terrykong afaik this shouldn't be gitignored, right?
yea, we should not ignore the lock
```python
else:
    gen_length = len(generated_part)
```

```python
gen_length = (generated_logprob != 0).sum().item()
```
This assumption doesn't hold: generated logprobs are, surprisingly, sometimes exactly 0.
Then how can we safely decide the length? The old implementation mistakes a padded eos token for a generated eos token, which is a serious problem for custom stop strings.
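One option is to carry an explicit generation mask from rollout time instead of inferring length from logprobs. A minimal sketch of the idea (the mask plumbing is an assumption, not existing code in this repo):

```python
# Illustrative sketch: count generated tokens from an explicit generation
# mask built when the sequence is produced (1 = generated, 0 = padding),
# rather than assuming generated logprobs are never exactly 0.0.

def generated_length(gen_mask):
    """gen_mask holds 1 for generated tokens, 0 for padding."""
    return sum(gen_mask)

def logprob_length(logprobs):
    """The old heuristic: undercounts when a real token has logprob 0.0."""
    return sum(1 for lp in logprobs if lp != 0)

logprobs = [-0.5, 0.0, -1.2, 0.0, 0.0]  # second token genuinely has logprob 0.0
gen_mask = [1, 1, 1, 0, 0]              # padding marked explicitly at rollout time
```

With the mask, a genuinely generated token whose logprob happens to be 0.0 is still counted, and padded eos tokens are excluded by construction.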
Can we replace this with the existing run_multi_turn_generation function, and use environments to handle the stop tokens as we do right now?
Sounds good to me. Will code execution always be invoked in a multi-turn chat format? How about changing run_multi_turn_rollout to a plain prompt format, and then providing an additional interface on top of that to support chat messages?
The default multi-turn chat actually runs without chat templating (the template is applied to the strings/tokens directly). To chat-template it, the environment would have to do it. (Open to suggestions on this approach.)
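A hypothetical sketch of what environment-side templating could look like: the rollout loop stays string/token based, and the environment formats its observation as a chat turn before handing it back. `format_turn`, `CodeEnvironment.step`, and the ChatML-style markers are assumptions for illustration, not the repo's actual API.

```python
def format_turn(role: str, content: str) -> str:
    # A minimal ChatML-style template applied directly in string space.
    return f"<|im_start|>{role}\n{content}<|im_end|>\n"

class CodeEnvironment:
    def step(self, tool_output: str) -> str:
        # The environment, not the generation loop, decides how the
        # observation is templated into the running conversation string.
        return format_turn("tool", tool_output)
```

The generation loop then just concatenates whatever string the environment returns, so different environments can use different templates.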
I'm a little hesitant to push this out without filesystem isolation.
Good point! What solution do you have in mind? I can think of chroot or docker, but they are too heavy. If we believe the code is just untrusted, not malicious, a simple solution is to 1) run the code in a temporary directory, 2) override
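A minimal sketch of the temporary-directory idea: run the untrusted (but assumed non-malicious) snippet in its own subprocess with a throwaway working directory and a timeout. This protects against accidents, not against a determined attacker.

```python
import subprocess
import sys
import tempfile

def run_untrusted(code: str, timeout: float = 5.0) -> str:
    """Execute a Python snippet in a fresh subprocess inside a temp dir."""
    with tempfile.TemporaryDirectory() as workdir:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            cwd=workdir,           # stray files land in the throwaway dir
            capture_output=True,
            text=True,
            timeout=timeout,       # runaway loops are killed
        )
        return proc.stdout + proc.stderr
```

The temp dir is deleted when the context manager exits, so files the snippet writes relative to its cwd don't leak into the training run's filesystem.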
Sorry, I messed up the commit history. This branch will no longer be touched but is kept for retrieving history. I will open a new branch and pull request.
Add a basic version of code execution & tool use (#55, #56)
Implemented via a custom LogitProcessor in the vllm backend. I implement tools as pre-defined variables in the code environment, so there is no need to distinguish code execution from tool use; this also makes maintenance easier. The only concern is that code execution must be enabled whenever we want to enable tool use, though we can avoid revealing the existence of arbitrary code execution beyond the tools in the prompt.
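The "tools as pre-defined variables" idea can be sketched as follows: the executed snippet sees tool functions as ordinary names in its globals, so code execution and tool use share one code path. Here `search` is a stand-in tool and `execute` an illustrative helper, not the PR's actual functions.

```python
import contextlib
import io

def search(query: str) -> str:
    # Stand-in tool; a real tool would call an external service.
    return f"results for {query!r}"

def execute(code: str) -> str:
    """Run model-written code with tools injected as plain variables."""
    env = {"search": search}          # every tool is just a name in globals
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, env)               # untrusted in general; see the isolation discussion
    return buf.getvalue()
```

Because tools are ordinary variables, adding a tool is just adding an entry to `env`; no separate tool-call parsing is needed.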
Supported features
Examples (test_vllm_tools.py)
Tokenizer issue
Although I designed `CodeLogitProcessor` to be as tokenizer-agnostic as possible, there may be edge cases where tokens don't split exactly at the end of `</code>`. For example, the tokenizer of GPT-4o will generate the following. If we tweak `>x` to be `>`, it will change the log prob in RL. My current solution is to not touch any generated token and to directly append the results afterwards.
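The append-only idea can be sketched like this: the execution result is tokenized separately and concatenated after the model's own tokens, so the sampled tokens (and hence their logprobs) are left untouched. `append_result` and the toy tokenizer are illustrative, not the repo's code.

```python
def append_result(generated_ids, result_text, encode):
    """Keep generated token ids as sampled; append the tool result's ids.

    `encode` stands in for something like
    tokenizer.encode(result_text, add_special_tokens=False).
    """
    return list(generated_ids) + encode(result_text)

def toy_encode(s):
    # Toy per-character "tokenizer" so the sketch is self-contained.
    return [ord(c) for c in s]
```

Since no generated id is rewritten, there is no `>x` vs `>` mismatch to distort RL logprobs; only the appended result tokens are new, and those can be masked out of the loss.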