[sglang] Fix tool format and response position ids padding in AsyncSGLangRollout #1475
New wandb log: https://wandb.ai/swordfaith/gsm8k_async_rl/runs/ta7jhvgq?nw=nwuserswordfaith. The NaN issue seems to be fixed.
Shall we merge this, @zhaochenyang20?
zhaochenyang20
left a comment
Great job
Thank you very much, but it still doesn't seem to work and doesn't solve the model training crash. I pulled the latest verl code, made no modifications, and ran the official GSM8K with tool example, and the model training still collapses. Is there a way to fix this? My wandb log is below.
…LangRollout (volcengine#1475)

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish.

Resolved the tool formatting issue: previously, arguments were stored as strings, so repeated calls to `json.dumps` iteratively added `\\`.

Fixed the `response_position_ids` mismatch between `generate_sequences` and `generate_sequences_with_tools`: in the earlier implementation, `generate_sequences_with_tools` used zero padding for positions where `attention mask == 0`, which resulted in NaN values during the training phase.

### Specific Changes

> List the specific changes.

- Introduced a new schema, `OpenAIFunctionCallSchema`, to store converted tool calls.
- Updated the `AsyncSGLangRollout` tool to skip non-dict type arguments instead of handling any string at the arguments position.
- Aligned `response_position_ids` in `generate_sequences_with_tools` with the behavior of `generate_sequences`.
- Enhanced tool descriptions to prevent misleading parse errors, as returning 0.0 caused the model to incorrectly modify answers.

### API

> Demonstrate how the API changes if any.

- Revise the `execute` interface of the tool to directly accept `dict[str, Any]` instead of a JSON string.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this
```

### Test

> For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
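The tool formatting issue described under "What does this PR do?" can be reproduced in isolation. The sketch below is not the PR's code: `normalize_arguments`, `serialize_tool_call`, and the tool name `example_tool` are hypothetical names chosen for illustration. It shows how re-serializing string arguments accumulates backslashes, and how keeping arguments as a dict and serializing once avoids that.

```python
import json
from typing import Any, Optional

arguments = {"answer": "42"}

# Arguments kept as a string: every further json.dumps call escapes the
# quotes (and any earlier backslashes) again.
as_string = json.dumps(arguments)
dumped_once = json.dumps(as_string)
dumped_twice = json.dumps(dumped_once)

# Count backslashes after each pass.
print(as_string.count("\\"), dumped_once.count("\\"), dumped_twice.count("\\"))
# prints: 0 4 14


def normalize_arguments(raw: Any) -> Optional[dict[str, Any]]:
    """Hypothetical helper: accept only dict arguments, skip anything else."""
    return raw if isinstance(raw, dict) else None


def serialize_tool_call(name: str, arguments: dict[str, Any]) -> str:
    """Hypothetical helper: serialize exactly once, at the output boundary."""
    return json.dumps({"name": name, "arguments": arguments})


# A dict serialized once contains no escape characters.
print(serialize_tool_call("example_tool", arguments))
```

Passing a dict through the pipeline and calling `json.dumps` only when a string is actually required is one way to read the revised `execute` interface, which accepts `dict[str, Any]` instead of a JSON string.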





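The `response_position_ids` mismatch can be illustrated with plain lists. This is only a sketch under a stated assumption: the PR text does not spell out what `generate_sequences` does, so the code assumes the aligned behavior is to keep counting upward from the last prompt position across the whole response, padded slots included. Both function names are hypothetical and neither is taken from the verl code base.

```python
def zero_padded_position_ids(last_prompt_pos: int, attention_mask: list[int]) -> list[int]:
    """Earlier behavior as described in the PR: padded slots get position 0."""
    ids = []
    pos = last_prompt_pos
    for mask_value in attention_mask:
        if mask_value == 1:
            pos += 1
            ids.append(pos)
        else:
            ids.append(0)  # zero padding where attention mask == 0
    return ids


def continued_position_ids(last_prompt_pos: int, response_length: int) -> list[int]:
    """Assumed aligned behavior: positions keep increasing over every slot."""
    return [last_prompt_pos + offset for offset in range(1, response_length + 1)]


# A response of length 5 whose last two slots are padding.
response_attention_mask = [1, 1, 1, 0, 0]
last_prompt_pos = 7

print(zero_padded_position_ids(last_prompt_pos, response_attention_mask))
print(continued_position_ids(last_prompt_pos, len(response_attention_mask)))
```

The two schemes agree on every attended token and differ only in the padded slots, which is where the PR description locates the source of the NaN values.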