|
Just added environments for code execution and tool use. Do we need to keep the previous |
Signed-off-by: KiddoZhu <zhaochengz@nvidia.com>
Force-pushed from a03ab6f to 487cd94.
|
@SahilJain314 any comment or suggestion? |
Force-pushed from 80ad777 to 7ab7350.
|
@SahilJain314 I've rolled back both unnecessary implementations you mentioned. Now it's ready to merge except for a mypy check. Do you know how to solve it? |
|
Just rolled back the logic of generation length in HFPolicy. I opened #499 for future discussions on this issue. |
|
Hi @SahilJain314 I noticed HF path has been removed in |
|
I think it's safe to remove HF generation as a supported rollout path if that's what you're asking |
|
Thanks! I removed them and all related tests pass. Should be ready to merge. |
|
I'm happy to merge this as it stands, but I'd love to see an example (and convergence plots) with this soon (in a follow-up) to show users how to use it. @KiddoZhu please also address the DCO.
Force-pushed from 59cd464 to c45dd66.
Force-pushed from c45dd66 to 5bed075.
|
DCO resolved.
Yes, I will add an example and a doc file once I use code execution in a more realistic setup. It'll be in a separate thread. |
|
@KiddoZhu a couple of other small things:
|
Signed-off-by: Terry Kong <terryk@nvidia.com>
|
Thanks Terry! I will do that for future commits. |
Signed-off-by: KiddoZhu <zhaochengz@nvidia.com> Signed-off-by: Terry Kong <terryk@nvidia.com> Co-authored-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: KiddoZhu <zhaochengz@nvidia.com> Signed-off-by: Terry Kong <terryk@nvidia.com> Co-authored-by: Terry Kong <terryk@nvidia.com> Signed-off-by: tpoisonooo <khj.application@aliyun.com>
Signed-off-by: KiddoZhu <zhaochengz@nvidia.com> Signed-off-by: Terry Kong <terryk@nvidia.com> Co-authored-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Qidong Su <qidongs@nvidia.com>
|
Any coding/tool use example, especially with a sandbox? |
Sorry, I messed up the commit history when clearing secrets in some commits (#204). Here is a new clean branch.
@SahilJain314 How shall we decide gen_length if generated logprobs can be 0?
Supported features
ray.remote.
Examples (test_vllm_tools.py)
A simple sandbox
os, sys, multiprocessing, subprocess, etc. that may modify the filesystem or leave programs resident in memory.
Tokenizer issue
There may be edge cases where tokens don't split exactly at the end of </code>. For example, the tokenizer of GPT-4o will generate the following
If we tweak >x to be >, it will change the log prob in RL. My current solution is to not touch any generated token and directly append results afterwards.
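A minimal sketch of the "don't touch generated tokens, append results afterwards" idea. It uses a toy vocabulary rather than a real tokenizer, so `VOCAB`, `find_stop`, and `append_result` are hypothetical names for illustration and not code from this PR. The provenance mask is also an assumption, added only to show which tokens came from the policy.

```python
from typing import List, Optional, Tuple

STOP = "</code>"

# Toy vocabulary (hypothetical). Token 3 fuses the closing ">" with trailing
# text, so the stop string does not end on a token boundary.
VOCAB = {
    0: "<code>",
    1: "1 + 1",
    2: "</code",
    3: ">\n\n",
    4: "<result>",
    5: "2",
    6: "</result>",
}


def decode(ids: List[int]) -> str:
    """Concatenate the token strings for a list of token ids."""
    return "".join(VOCAB[i] for i in ids)


def find_stop(generated_ids: List[int], stop: str = STOP) -> Optional[int]:
    """Return how many tokens are needed before the decoded text contains
    `stop`, or None if the stop string never appears."""
    text = ""
    for n, tok in enumerate(generated_ids, start=1):
        text += VOCAB[tok]
        if stop in text:
            return n
    return None


def append_result(
    generated_ids: List[int], result_ids: List[int]
) -> Tuple[List[int], List[int]]:
    """Keep whole generated tokens unchanged and append the result tokens.

    The last kept token may overshoot the stop string (here ">\\n\\n"); it is
    not re-tokenized, so the log probs recorded for it stay valid.
    The mask marks provenance: 1 = generated by the policy, 0 = appended.
    """
    n = find_stop(generated_ids)
    if n is None:
        # No code block was closed, so nothing is appended.
        return list(generated_ids), [1] * len(generated_ids)
    kept = list(generated_ids[:n])
    ids = kept + list(result_ids)
    mask = [1] * len(kept) + [0] * len(result_ids)
    return ids, mask


if __name__ == "__main__":
    ids, mask = append_result([0, 1, 2, 3], [4, 5, 6])
    print(repr(decode(ids)))
    print(mask)
```

The trade-off is that the appended result can follow a few extra characters after `</code>` instead of sitting flush against it, in exchange for never altering a token whose log prob was already recorded.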