[model] fix: refactor qwen2vl patches & support no-image input for fsdp #3496

Merged
vermouth1992 merged 1 commit into verl-project:main from hiyouga:fix_qwen2vl
Sep 18, 2025
@hiyouga
Collaborator

@hiyouga hiyouga commented Sep 16, 2025

What does this PR do?

This PR tries to fix #3491

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

Tested with latest transformers

(Three screenshots attached.)

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

⚠️ Breaking

We adopt a new format for Qwen2VL's position ids: (4, batch size, seq len)

Assuming the vision position ids (mrope) have shape (3, batch size, seq len) and the text position ids (normal rope) have shape (1, batch size, seq len), we concatenate the two along the first dimension to obtain the final position ids.

This aligns with the implementation in Transformers >= 4.54.0 🤗
https://github.com/huggingface/transformers/blob/v4.54.0/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py#L1469
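
As an illustration of the layout described above, here is a minimal sketch in PyTorch. The helper names `build_position_ids` and `split_position_ids` are hypothetical and are not part of verl or Transformers. Placing the text ids in row 0 and the mrope ids in rows 1 to 3 is an assumption about the ordering; check the linked Transformers source before relying on it.

```python
import torch


def build_position_ids(attention_mask, vision_position_ids=None):
    """Return position ids of shape (4, batch, seq_len).

    Hypothetical helper. Row 0 holds the text (normal rope) ids and rows 1..3
    hold the vision (mrope) ids; this ordering is an assumption.
    """
    # Text ids: running count of valid tokens minus one, clamped at 0 so that
    # leading padding does not produce negative positions.
    text_ids = (attention_mask.long().cumsum(dim=-1) - 1).clamp(min=0)
    text_ids = text_ids.unsqueeze(0)  # (1, batch, seq_len)

    if vision_position_ids is None:
        # Text-only fallback: all three mrope axes reuse the text ids.
        vision_position_ids = text_ids.expand(3, -1, -1)

    # (1, B, S) + (3, B, S) -> (4, B, S)
    return torch.cat((text_ids, vision_position_ids), dim=0)


def split_position_ids(position_ids):
    """Inverse of the concatenation: recover the text and vision parts."""
    return position_ids[0], position_ids[1:]


mask = torch.tensor([[1, 1, 1, 0], [1, 1, 1, 1]])
position_ids = build_position_ids(mask)
text_ids, vision_ids = split_position_ids(position_ids)
print(position_ids.shape, text_ids.shape, vision_ids.shape)
```

Code that still produces the older three-row layout would need to prepend a text row like this before passing position ids to the patched model.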

🎤 New

We have refactored the Qwen2VL and Qwen2.5VL patches, supporting no-image input for FSDP by introducing fake ViT inputs. We have also removed some redundant code for better maintainability.
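
The fake-ViT idea can be sketched as follows. This is a toy illustration, not the code from this PR: `ToyVisionTower` and `add_fake_vision_pass` are hypothetical names, and the dummy patch count is arbitrary. The assumption is that, under FSDP, a rank holding a text-only batch should still run the vision parameters so that every rank takes part in the same parameter gathers and gradient reductions. Multiplying the dummy output by zero leaves the text embeddings unchanged.

```python
import torch
from torch import nn


class ToyVisionTower(nn.Module):
    """Stand-in for a ViT: maps flattened patches to hidden vectors."""

    def __init__(self, patch_dim, hidden_size):
        super().__init__()
        self.proj = nn.Linear(patch_dim, hidden_size)

    def forward(self, pixel_values):
        return self.proj(pixel_values)


def add_fake_vision_pass(text_embeds, vision_tower, patch_dim, num_patches=4):
    """Run the vision tower on dummy patches for a text-only batch.

    The result is scaled by 0.0, so the returned embeddings equal the input,
    but the vision parameters join the autograd graph and receive zero grads.
    """
    fake_pixels = torch.zeros(
        (num_patches, patch_dim),
        dtype=text_embeds.dtype,
        device=text_embeds.device,
    )
    fake_embeds = vision_tower(fake_pixels)
    return text_embeds + 0.0 * fake_embeds.mean()


tower = ToyVisionTower(patch_dim=12, hidden_size=8)
text_embeds = torch.randn(2, 5, 8, requires_grad=True)
out = add_fake_vision_pass(text_embeds, tower, patch_dim=12)
out.sum().backward()
print(tower.proj.weight.grad.abs().sum())  # gradient exists and is all zeros
```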

🚨 Changes

We moved the Ulysses logic into the attention function, so the position ids are scattered before the language model part.
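
One way to picture scattering position ids across sequence-parallel ranks is the sketch below. It is an assumption about what "scattered" means here, namely that each Ulysses rank keeps a contiguous slice of the sequence dimension. `slice_for_sp_rank` is a hypothetical helper, not verl's API, and it uses no `torch.distributed` calls so it can run on a single process.

```python
import torch


def slice_for_sp_rank(position_ids, sp_rank, sp_size):
    """Return the contiguous sequence shard that a given rank would keep.

    position_ids: tensor of shape (4, batch, seq_len); the last dim is split.
    """
    seq_len = position_ids.size(-1)
    if seq_len % sp_size != 0:
        # A real implementation would pad first; this sketch just refuses.
        raise ValueError(f"seq_len {seq_len} is not divisible by sp_size {sp_size}")
    return position_ids.chunk(sp_size, dim=-1)[sp_rank]


position_ids = torch.arange(4 * 2 * 8).reshape(4, 2, 8)
shards = [slice_for_sp_rank(position_ids, rank, sp_size=4) for rank in range(4)]
print(shards[0].shape)  # each rank holds seq_len / sp_size positions
```

Concatenating the shards along the last dimension reproduces the original tensor, which makes for a quick sanity check.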

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the monkey patching for Qwen2-VL models by consolidating logic from qwen2_5_vl.py into qwen2_vl.py and removing the former. It also adds support for text-only inputs with FSDP by creating dummy image tensors. The refactoring is a good step towards reducing code duplication. However, I've found a few critical bugs and areas for improvement. Specifically, there's a NameError in monkey_patch.py due to an incorrect import order, and a TypeError in qwen2_vl.py from subscripting a None value. I've also noted the use of magic numbers and missing fields in return objects which could affect maintainability and functionality like cached generation. Please see my detailed comments for suggestions.

@techkang
Collaborator

Great job! I have some questions about this bug. Can you share your ideas?

  1. Do you know the root cause of this bug?
  2. It seems that we can set warp_policy to None, the same as in https://github.com/volcengine/verl/blob/5c98ed1b313be5ae3de10a3e800cbb769374e1a5/verl/workers/fsdp_workers.py#L452-L454, and the training process can continue. Can we fix the bug this way?
  3. If both methods can fix it, which one is better?

@vermouth1992
Collaborator

It seems that CI fails :(

@hiyouga hiyouga force-pushed the fix_qwen2vl branch 3 times, most recently from 0ee2f91 to 0f30b49 on September 17, 2025 09:46
@hiyouga
Collaborator Author

hiyouga commented Sep 17, 2025

@vermouth1992 This PR is ready for merge :)

@vermouth1992 vermouth1992 merged commit 0d4541f into verl-project:main Sep 18, 2025
63 of 68 checks passed
@Clementine24

Great job! This PR resolves my confusion about the mixed-data training bugs. By the way, it seems that multi-turn sglang rollout generation still uses the old approach to get position ids?

@hiyouga
Collaborator Author

hiyouga commented Sep 20, 2025

@Clementine24 Possibly I forgot it. Could you submit a PR to fix it?

@hiyouga hiyouga deleted the fix_qwen2vl branch October 6, 2025 13:05
masoudhashemi pushed a commit to masoudhashemi/verl that referenced this pull request Oct 19, 2025
…dp (verl-project#3496)

### What does this PR do?

This PR tries to fix verl-project#3491 

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Tested with [latest
transformers](https://github.com/huggingface/transformers/tree/6e50a8afb2540ac1acaa4b62cf1dd5f1170f6511)

<img width="2448" height="540" alt="image"
src="https://github.com/user-attachments/assets/06d40f40-572c-4454-8e08-115857f61f21"
/>
<img width="2796" height="1394" alt="image"
src="https://github.com/user-attachments/assets/17489b9c-e376-46e3-80d8-71106d304077"
/>
<img width="2098" height="744" alt="image"
src="https://github.com/user-attachments/assets/8c7f736d-bf09-4ba9-9cf4-0d56e367c526"
/>

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

#### ⚠️ Breaking

We adopt a new format for Qwen2VL's position ids: (4, batch size, seq
len)

Assuming a vision position ids (mrope) has a shape of (3, batch size,
seq len) and a text position ids (normal rope) has a shape of (1, batch
size, seq len), we concatenate both to obtain the final position ids.

This aligns with the implementation in the Transformers >= 4.54.0 🤗

https://github.com/huggingface/transformers/blob/v4.54.0/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py#L1469

#### 🎤 New

We have refactored the Qwen2VL and Qwen2.5VL patches, supporting
no-image input for FSDP by introducing fake ViT inputs. We have also
removed some redundant code for better maintainability.

#### 🚨 Changes

We move the ulysses logic into the attention function. So the position
ids will be scattered before the language model part.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
techkang pushed a commit to techkang/verl that referenced this pull request Oct 31, 2025
mtian8 pushed a commit to mtian8/verl that referenced this pull request Nov 1, 2025
wangboxiong320 pushed a commit to wangboxiong320/verl that referenced this pull request Nov 1, 2025
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025
NenoL2001 pushed a commit to NenoL2001/verl that referenced this pull request Nov 26, 2025
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request Dec 20, 2025
vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request Jan 22, 2026
Development

Successfully merging this pull request may close these issues.

warp_policy will make GPU hang when training mixed multi modal data

6 participants