[sglang] Fix tool format and response position ids padding in AsyncSGLangRollout #1475

SwordFaith · 2025-05-10T17:00:08Z

Checklist Before Starting

Search for similar PR(s).

What does this PR do?

Add one-line overview of what this PR aims to achieve or accomplish.

Resolved the tool formatting issue: Previously, arguments were stored as strings, causing iterative addition of \\ due to multiple calls to json.dumps.

Fixed the response_position_ids mismatch between generate_sequences and generate_sequences_with_tools: In the earlier implementation, generate_sequences_with_tools used zero padding for positions where attention mask == 0, which resulted in NaN values during the training phase.

Specific Changes

List the specific changes.

Introduced a new schema, OpenAIFunctionCallSchema, to store converted tool calls.
Updated the AsyncSGLangRollout tool to skip non-dict type arguments instead of handling any string at the arguments position.
Aligned response_position_ids in generate_sequences_with_tools with the behavior of generate_sequences.
Enhanced tool descriptions to prevent misleading parse errors, as returning 0.0 caused the model to incorrectly modify answers.

API

Demonstrate how the API changes if any.

Revise the execute interface of the tool to directly accept dict[str, Any] instead of a JSON string.

Usage Example

Provide usage example(s) for easier usage.

# Add code snippet or script demonstrating how to use this

Test

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc.

Additional Info.

Issue Number: Fixes issue # or discussion # if any.
Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none]
Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none]

Checklist Before Submitting

Read the Contribute Guide.
Apply pre-commit checks.
Add [BREAKING] to the PR title if it breaks any API.
Update the documentation about your changes in the docs.
Add CI test(s) if neccessary.

SwordFaith · 2025-05-11T02:52:15Z

new wandb log: https://wandb.ai/swordfaith/gsm8k_async_rl/runs/ta7jhvgq?nw=nwuserswordfaith

seems nan issue is fixed.

vermouth1992 · 2025-05-11T07:17:26Z

Shall we merge this @zhaochenyang20

zhaochenyang20

Great job

supermancmk · 2025-05-19T10:04:50Z

Thank you very much, but it still doesn't seem to work and doesn't solve the problem of the model training crash.
I pulled the latest version of verl's code without any modifications and ran the official gsm8k with tool example and the model still training crashes and appears Nan.
Any solutions please. Here is my wandb log
@SwordFaith @vermouth1992 @zhaochenyang20

非常感谢，但好像还是不起作用，没有解决模型训练崩溃的问题。我拉取了verl最新版本的代码，没有进行任何修改，跑官方gsm8k with tool的例子，模型仍然会训崩，请问有什么解决办法。下面是我wandb日志

…LangRollout (volcengine#1475) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. Resolved the tool formatting issue: Previously, arguments were stored as strings, causing iterative addition of `\\` due to multiple calls to `json.dumps`. Fixed the `response_position_ids` mismatch between `generate_sequences` and `generate_sequences_with_tools`: In the earlier implementation, `generate_sequences_with_tools` used zero padding for positions where `attention mask == 0`, which resulted in NaN values during the training phase. ### Specific Changes > List the specific changes. - Introduced a new schema, `OpenAIFunctionCallSchema`, to store converted tool calls. - Updated the `AsyncSGLangRollout` tool to skip non-dict type arguments instead of handling any string at the arguments position. - Aligned `response_position_ids` in `generate_sequences_with_tools` with the behavior of `generate_sequences`. - Enhanced tool descriptions to prevent misleading parse errors, as returning 0.0 caused the model to incorrectly modify answers. ### API > Demonstrate how the API changes if any. - Revise the `execute` interface of the tool to directly accept `dict[str, Any]` instead of a JSON string. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary.

SwordFaith added 2 commits May 11, 2025 00:39

Add sglang rollout volcengine#1385 changes to sglang_async

3d481c6

Fix tool format and response position ids issue

576da5b

SwordFaith mentioned this pull request May 11, 2025

[sglang] feat: Add SGLang async multi-turn rollout with tool support #1037

Merged

3 tasks

vermouth1992 requested a review from zhaochenyang20 May 11, 2025 07:17

zhaochenyang20 approved these changes May 11, 2025

View reviewed changes

zhaochenyang20 merged commit bc9062d into volcengine:main May 11, 2025
28 checks passed

SwordFaith mentioned this pull request May 12, 2025

Multi-turn rollout Update #1 zhaochenyang20/Awesome-ML-SYS-Tutorial#113

Open

53 tasks

SwordFaith deleted the fix/sglang_async_multiturn_nan branch May 19, 2025 14:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[sglang] Fix tool format and response position ids padding in AsyncSGLangRollout #1475

[sglang] Fix tool format and response position ids padding in AsyncSGLangRollout #1475

Uh oh!

SwordFaith commented May 10, 2025

Uh oh!

SwordFaith commented May 11, 2025

Uh oh!

vermouth1992 commented May 11, 2025

Uh oh!

zhaochenyang20 left a comment

Uh oh!

Uh oh!

supermancmk commented May 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

[sglang] Fix tool format and response position ids padding in AsyncSGLangRollout #1475

[sglang] Fix tool format and response position ids padding in AsyncSGLangRollout #1475

Uh oh!

Conversation

SwordFaith commented May 10, 2025

Checklist Before Starting

What does this PR do?

Specific Changes

API

Usage Example

Test

Additional Info.

Checklist Before Submitting

Uh oh!

SwordFaith commented May 11, 2025

Uh oh!

vermouth1992 commented May 11, 2025

Uh oh!

zhaochenyang20 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

supermancmk commented May 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants