fix sequence parallelism conflict in kimiVL #1899
vermouth1992 merged 7 commits into verl-project:main from
Conversation
Signed-off-by: ShareLer <ShareLe@163.com>
eric-haibin-lin
left a comment
thanks! would u mind adding a unit test in tests/models/test_transformers_ulysses.py that reproduces this error?
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Fix a sequence parallelism conflict in the KimiVL patch.

Background: a recent VLM-related PR (verl-project#1739) changed the sequence parallelism logic for VLMs: `inputs_embeds` is now split after the model's embedding layer, instead of splitting `input_ids` and `position_ids` before the forward pass. However, the SP logic I implemented in the KimiVL PR (verl-project#1639) still followed the old approach, and split image tokens at the boundary between image_token and text_token to avoid the "image features and image tokens do not match" error. Because the two PRs were developed in parallel, merging both introduced a logical conflict.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

- Delete the patch for `_merge_with_image_features`, which assigned image tokens to the corresponding SP rank.
- Adjust the handling of `position_ids` in `_ulysses_flash_attn_forward`.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this
```

### Test



### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.

--------- Signed-off-by: ShareLer <ShareLe@163.com>
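To make the described change concrete: under the new logic, the sequence is sliced after the embedding layer, once image and text embeddings are already merged, so no per-rank bookkeeping of image-token boundaries is needed. The following is a minimal illustrative sketch of that slicing pattern, not verl's actual implementation; the function names are hypothetical, and a plain Python list stands in for a `[seq_len, hidden]` tensor.

```python
def sp_slice_bounds(seq_len: int, sp_size: int, rank: int) -> tuple[int, int]:
    """Contiguous [start, end) chunk of the sequence owned by one SP rank."""
    assert seq_len % sp_size == 0, "pad the sequence to a multiple of sp_size"
    chunk = seq_len // sp_size
    return rank * chunk, (rank + 1) * chunk


def split_after_embedding(inputs_embeds, sp_size: int, rank: int):
    # inputs_embeds: per-token embedding rows (stand-in for a tensor of shape
    # [seq_len, hidden]). Image features were already merged in by the
    # embedding step, so a plain slice along the sequence dimension is safe --
    # this is why the old _merge_with_image_features patch is no longer needed.
    start, end = sp_slice_bounds(len(inputs_embeds), sp_size, rank)
    return inputs_embeds[start:end]
```

Concatenating the per-rank slices in rank order recovers the full sequence, which is the invariant the all-gather after attention relies on.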
It might not be possible to add a unit test for KimiVL, because the unit tests in tests/models/test_transformers_ulysses.py rely on importing the model's config from transformers (i.e. 'KimiVLConfig') to create a mock model. It is necessary to add ulysses-related unit tests for VLMs; perhaps I can first try to add a ulysses-related unit test for QwenVL.
@ShareLer The unit test for qwen2vl ulysses already exists: https://github.com/volcengine/verl/blob/85fef90d518577b44abca3015249c6cde52b0be9/.github/workflows/e2e_ppo_trainer.yml#L200-L208 I think the current state is sufficient.
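For reference, the usual shape of such an ulysses parity test is: run a forward pass once without SP, once with per-rank slices gathered back along the sequence dimension, and assert the two outputs match. Below is a self-contained sketch of that pattern with no transformers dependency; `toy_model` and `sp_forward` are hypothetical stand-ins, and the toy model is position-wise so chunked execution needs no cross-rank communication.

```python
def toy_model(embeds):
    # Stand-in for a per-token (position-wise) forward pass.
    return [2.0 * x + 1.0 for x in embeds]


def sp_forward(embeds, sp_size):
    # Simulate SP: each "rank" runs the model on its contiguous chunk,
    # then the chunks are gathered back in rank order.
    assert len(embeds) % sp_size == 0, "pad to a multiple of sp_size"
    chunk = len(embeds) // sp_size
    shards = [toy_model(embeds[r * chunk:(r + 1) * chunk]) for r in range(sp_size)]
    return sum(shards, [])  # stand-in for all-gather along the sequence dim


def test_sp_parity():
    embeds = [float(i) for i in range(8)]
    assert sp_forward(embeds, 2) == toy_model(embeds)
    assert sp_forward(embeds, 4) == toy_model(embeds)
```

A real test replaces `toy_model` with the patched HF model and `sp_forward` with a multi-process run, but the assertion is the same: SP output equals single-rank output.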