
[sglang] feat: add multimodal input to multiturn async rollout #2014

Merged
zhaochenyang20 merged 65 commits into verl-project:main from nanjiangwill:feat/add-multimodal-multiturn-sglang on Jun 22, 2025

@nanjiangwill
Contributor

Checklist Before Starting

  • Searched for similar PR(s).

What does this PR do?

This PR adds image input to the sglang async rollout. Previously, the sglang async rollout supported only text. It also includes a placeholder for video data, which will be wired in as an input once the SGLang engine supports it.

High-Level Design

Since the sglang engine already handles image input, the only change needed is to handle tokenization properly.

Specific Changes

Replace self.tokenizer.apply_chat_template() with self.processing_class.apply_chat_template(). processing_class can be either a tokenizer or a processor.
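
A minimal sketch of why this swap works, assuming both tokenizer and processor objects expose an apply_chat_template() method: the rollout can hold either one in a single processing_class attribute and call it the same way. StubTokenizer, StubProcessor, and Rollout are hypothetical stand-ins written for illustration, not verl or Hugging Face classes, and the rendered prompt format is invented.

```python
from typing import Any, Dict, List, Protocol

Message = Dict[str, Any]


class ChatTemplater(Protocol):
    """Anything that can render chat messages into a prompt string."""

    def apply_chat_template(self, messages: List[Message], **kwargs: Any) -> str: ...


class StubTokenizer:
    """Hypothetical text-only stand-in for a tokenizer."""

    def apply_chat_template(self, messages: List[Message], **kwargs: Any) -> str:
        return "".join(f"<{m['role']}>{m['content']}" for m in messages)


class StubProcessor:
    """Hypothetical stand-in for a multimodal processor.

    Image parts are rendered as an invented "<image>" placeholder.
    """

    def apply_chat_template(self, messages: List[Message], **kwargs: Any) -> str:
        rendered = []
        for m in messages:
            content = m["content"]
            if isinstance(content, list):
                # Multimodal content: a list of typed parts.
                text = "".join(
                    "<image>" if part["type"] == "image" else part["text"]
                    for part in content
                )
            else:
                text = content
            rendered.append(f"<{m['role']}>{text}")
        return "".join(rendered)


class Rollout:
    """Hypothetical rollout holding either object behind one attribute."""

    def __init__(self, processing_class: ChatTemplater) -> None:
        self.processing_class = processing_class

    def render_prompt(self, messages: List[Message]) -> str:
        # The call site is identical for a tokenizer and a processor.
        return self.processing_class.apply_chat_template(messages, tokenize=False)
```

Because the call site depends only on the shared method name, text-only models keep working unchanged while image-capable models get their image parts rendered by the processor.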

Usage Example

The rollout automatically uses the processor to process images when the model's processor supports them, and falls back to the tokenizer when no processor is available.
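
The fallback described here could be sketched as below. This is an illustration under stated assumptions, not the code from this PR: select_processing_class and supports_images are hypothetical helper names, the stub classes are invented, and checking for an image_processor attribute is only a heuristic chosen for this sketch.

```python
from typing import Any, Optional


def select_processing_class(tokenizer: Any, processor: Optional[Any] = None) -> Any:
    """Prefer the processor when one is available, else use the tokenizer."""
    return processor if processor is not None else tokenizer


def supports_images(processing_class: Any) -> bool:
    """Heuristic for this sketch only: an object carrying a non-None
    `image_processor` attribute is treated as image-capable."""
    return getattr(processing_class, "image_processor", None) is not None


class StubTokenizer:
    """Hypothetical text-only tokenizer stand-in."""


class StubProcessor:
    """Hypothetical multimodal processor stand-in."""

    image_processor = object()
```

With these helpers, select_processing_class(tok, None) returns the tokenizer, and select_processing_class(tok, proc) returns the processor, so a text-only model never needs a processor to be loaded.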

Checklist Before Submitting

  • Read the Contribute Guide.
  • Apply pre-commit checks.
  • Add [BREAKING] to the PR title description if it breaks any API.
  • Update the documentation about your changes in the docs.
  • New CI unit test(s) are added to cover the code path.
  • Rely on existing unit tests on CI that cover the code path.

@nanjiangwill force-pushed the feat/add-multimodal-multiturn-sglang branch from 2b560a9 to e0d1bf2 on June 18, 2025 17:02
@zhaochenyang20
Collaborator

@nanjiangwill Is it ready for us to test it in public?

@nanjiangwill
Contributor Author

Yes, it is ready for public testing.

@nanjiangwill requested a review from chenhaiq as a code owner on June 19, 2025 17:28
@zhaochenyang20 changed the title from "[BREAKING][sglang] feat: add multimodal input to multiturn async rollout" to "[sglang] feat: add multimodal input to multiturn async rollout" on Jun 21, 2025
Collaborator

@zhaochenyang20 left a comment

Amazon/Amazing job 😂

Collaborator

@SwordFaith left a comment

Great job! Nan

@zhaochenyang20 merged commit 644aaa7 into verl-project:main on Jun 22, 2025
43 of 44 checks passed
yellowbee686 pushed a commit to yellowbee686/verl that referenced this pull request Jun 23, 2025
…project#2014)

### Checklist Before Starting

- [X] Searched for similar PR(s).

### What does this PR do?
This PR adds image input to the sglang async rollout. Previously, the
sglang async rollout supported only text. It also includes a placeholder
for video data, which will be wired in as an input once the SGLang
engine supports it.

### High-Level Design

Since the sglang engine already handles image input, the only change
needed is to handle tokenization properly.

### Specific Changes

Replace `self.tokenizer.apply_chat_template()` with
`self.processing_class.apply_chat_template()`. `processing_class` can be
either a `tokenizer` or a `processor`.


### Usage Example
The rollout automatically uses the processor to process images when the
model's processor supports them, and falls back to the tokenizer when no
processor is available.

### Checklist Before Submitting

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [X] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [X] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [X] New CI unit test(s) are added to cover the code path.
- [X] Rely on existing unit tests on CI that cover the code path.

---------

Co-authored-by: xieck13 <xieck13@gmail.com>
Sirius-L1 pushed a commit to Sirius-L1/verl that referenced this pull request Jun 24, 2025
Tyizhanshen pushed a commit to HyperdriveHustle/verl that referenced this pull request Jul 1, 2025
oseyosey pushed a commit to oseyosey/verl that referenced this pull request Jul 28, 2025
whatadayG pushed a commit to whatadayG/verl that referenced this pull request Sep 5, 2025
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request Dec 20, 2025
oseyosey pushed a commit to oseyosey/verl that referenced this pull request Jan 20, 2026
vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request Jan 22, 2026

4 participants