
Conversation

Contributor

@ShareLer ShareLer commented Jun 7, 2025

Checklist Before Starting

  • Search for similar PR(s).

What does this PR do?

Fix sequence parallelism conflict in kimiVL patch.

Background:
A recent VLM-related PR (#1739) changed the sequence parallelism logic for VLMs: inputs_embeds is now split after the model's embedding layer, instead of splitting input_ids and position_ids before the forward pass.
However, the SP logic I implemented in KimiVL's PR (#1639) still followed the old approach, splitting the sequence at the point where image tokens and text tokens are combined in order to avoid the 'Image features and image tokens do not match' error.
Because these two PRs were developed in parallel, they conflicted logically once both were merged.
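
To make the conflict concrete, here is a minimal, hedged sketch of the newer approach described above: each SP rank keeps only its shard of inputs_embeds after the embedding layer (image features already merged in), so no special handling of image tokens is needed. The helper name and shapes are illustrative assumptions, not verl's actual implementation.

```python
import torch
import torch.distributed as dist

def shard_inputs_embeds(inputs_embeds: torch.Tensor, sp_group) -> torch.Tensor:
    """Illustrative only: keep this SP rank's shard of the sequence dimension.

    inputs_embeds: [batch, seq_len, hidden], produced by the embedding layer with
    image features already merged in, so the image/text boundary is irrelevant here.
    """
    sp_size = dist.get_world_size(sp_group)
    sp_rank = dist.get_rank(sp_group)
    seq_len = inputs_embeds.size(1)
    assert seq_len % sp_size == 0, "pad the sequence to a multiple of sp_size first"
    shard = seq_len // sp_size
    return inputs_embeds[:, sp_rank * shard:(sp_rank + 1) * shard, :]
```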

High-Level Design

Demonstrate the high-level design if this PR is complex.

Specific Changes

  • Delete the patch for _merge_with_image_features that assigned image tokens to the corresponding SP rank.
  • Adjust the position_ids handling in _ulysses_flash_attn_forward (see the sketch below).
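
For orientation only, a hedged sketch of what the second bullet involves: once Ulysses slices the sequence dimension, position_ids must be sliced the same way so it matches the local shard inside the attention patch. The function name and shapes below are assumptions for illustration, not verl's actual code.

```python
import torch

def slice_position_ids(position_ids: torch.Tensor, sp_rank: int, sp_size: int) -> torch.Tensor:
    """Illustrative helper: keep this SP rank's slice of position_ids.

    Works for both [batch, seq_len] and mrope-style [3, batch, seq_len] layouts,
    since the sequence is the last dimension in either case.
    """
    seq_dim = position_ids.dim() - 1
    seq_len = position_ids.size(seq_dim)
    assert seq_len % sp_size == 0, "pad the sequence to a multiple of sp_size first"
    shard = seq_len // sp_size
    return position_ids.narrow(seq_dim, sp_rank * shard, shard)
```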

API

Demonstrate how the API changes if any.

Usage Example

Provide usage example(s) for easier usage.

# Add code snippet or script demonstrating how to use this 

Test

![image](https://github.com/user-attachments/assets/82ef7a74-66f8-4bb0-a0fc-3702b215c8c0)

Additional Info.

  • Issue Number: Fixes issue # or discussion # if any.
  • Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none]
  • Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none]

Checklist Before Submitting

  • Read the Contribute Guide.
  • Apply pre-commit checks.
  • Add [BREAKING] to the PR title if it breaks any API.
  • Update the documentation about your changes in the docs.
  • New CI unit test(s) are added to cover the code path.
  • Rely on existing unit tests on CI that covers the code path.

@vermouth1992 vermouth1992 requested a review from hiyouga June 7, 2025 02:04
Collaborator

@eric-haibin-lin eric-haibin-lin left a comment

thanks! would u mind adding a unit test in tests/models/test_transformers_ulysses.py that reproduces this error?

Collaborator

@hiyouga hiyouga left a comment

LGTM. Now we should split only the input ids, not the image features.

@vermouth1992 vermouth1992 merged commit ea121f0 into volcengine:main Jun 10, 2025
34 of 35 checks passed
yellowbee686 pushed a commit to yellowbee686/verl that referenced this pull request Jun 10, 2025
@ShareLer
Contributor Author

> thanks! would u mind adding a unit test in tests/models/test_transformers_ulysses.py that reproduces this error?

It might not be possible to add a unit test for KimiVL, because the unit tests in tests/models/test_transformers_ulysses.py rely on importing the model's config from transformers (e.g. 'KimiVLConfig') to create a mock model via from_config, and transformers does not define KimiVL's model or config classes.

Ulysses-related unit tests for VLMs are still needed, though. Perhaps I can start by adding a Ulysses unit test for QwenVL.
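
For context on the from_config pattern mentioned above, a small sketch (my own illustration, not the test file's code) of why the mock-model approach only works for configs that transformers itself defines:

```python
from transformers import AutoModelForCausalLM, LlamaConfig

# Works: LlamaConfig ships with transformers, so a tiny mock model can be
# instantiated offline from a hand-written config.
config = LlamaConfig(
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=4,
    vocab_size=128,
)
model = AutoModelForCausalLM.from_config(config)

# Does not work for KimiVL: transformers defines neither a KimiVLConfig nor the
# matching model class, so there is nothing to import and nothing to build from.
# from transformers import KimiVLConfig  # ImportError
```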

@hiyouga
Collaborator

hiyouga commented Jun 10, 2025

@ShareLer The unit test for qwen2vl Ulysses already exists:

- name: Running Geo3k VLM PPO E2E training tests on 8 L20 GPUs with rmpad using function rm
  run: |
    ray stop --force
    TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \
      MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \
      MODEL_ID=Qwen/Qwen2-VL-2B-Instruct \
      ADV_ESTIMATOR=gae RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \
      SP_SIZE=2 \
      bash tests/e2e/ppo_trainer/run_function_reward.sh

I think the current state is sufficient.

whatadayG pushed a commit to whatadayG/verl that referenced this pull request Sep 5, 2025
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request Dec 20, 2025