[model] fix: refactor qwen2vl patches & support no-image input for fsdp#3496
[model] fix: refactor qwen2vl patches & support no-image input for fsdp#3496vermouth1992 merged 1 commit intoverl-project:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors the monkey patching for Qwen2-VL models by consolidating logic from qwen2_5_vl.py into qwen2_vl.py and removing the former. It also adds support for text-only inputs with FSDP by creating dummy image tensors. The refactoring is a good step towards reducing code duplication. However, I've found a few critical bugs and areas for improvement. Specifically, there's a NameError in monkey_patch.py due to an incorrect import order, and a TypeError in qwen2_vl.py from subscripting a None value. I've also noted the use of magic numbers and missing fields in return objects which could affect maintainability and functionality like cached generation. Please see my detailed comments for suggestions.
dae82dd to
5c85f62
Compare
|
Great job! I have some questions on this bug, can you share you ideas?
|
5c85f62 to
6650d6a
Compare
|
It seems that CI fails :( |
0ee2f91 to
0f30b49
Compare
0f30b49 to
3b550e2
Compare
|
@vermouth1992 This PR is ready for merge :) |
|
Great job! This pr solve my confusion about mixed data training bugs. By the way, it seems that multi-turn sglang rollout generation still use old approach to get position ids? |
|
@Clementine24 Possibly I forgot it. Could you submit a PR to fix it? |
…dp (verl-project#3496) ### What does this PR do? This PR tries to fix verl-project#3491 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Tested with [latest transformers](https://github.com/huggingface/transformers/tree/6e50a8afb2540ac1acaa4b62cf1dd5f1170f6511) <img width="2448" height="540" alt="image" src="https://github.com/user-attachments/assets/06d40f40-572c-4454-8e08-115857f61f21" /> <img width="2796" height="1394" alt="image" src="https://github.com/user-attachments/assets/17489b9c-e376-46e3-80d8-71106d304077" /> <img width="2098" height="744" alt="image" src="https://github.com/user-attachments/assets/8c7f736d-bf09-4ba9-9cf4-0d56e367c526" /> ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes ####⚠️ Breaking We adopt a new format for Qwen2VL's position ids: (4, batch size, seq len) Assuming a vision position ids (mrope) has a shape of (3, batch size, seq len) and a text position ids (normal rope) has a shape of (1, batch size, seq len), we concatenate both to obtain the final position ids. This aligns with the implementation in the Transformers >= 4.54.0 🤗 https://github.com/huggingface/transformers/blob/v4.54.0/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py#L1469 #### 🎤 New We have refactored the Qwen2VL and Qwen2.5VL patches, supporting no-image input for FSDP by introducing fake ViT inputs. We have also removed some redundant code for better maintainability. #### 🚨 Changes We move the ulysses logic into the attention function. So the position ids will be scattered before the language model part. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
…dp (verl-project#3496) ### What does this PR do? This PR tries to fix verl-project#3491 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Tested with [latest transformers](https://github.com/huggingface/transformers/tree/6e50a8afb2540ac1acaa4b62cf1dd5f1170f6511) <img width="2448" height="540" alt="image" src="https://github.com/user-attachments/assets/06d40f40-572c-4454-8e08-115857f61f21" /> <img width="2796" height="1394" alt="image" src="https://github.com/user-attachments/assets/17489b9c-e376-46e3-80d8-71106d304077" /> <img width="2098" height="744" alt="image" src="https://github.com/user-attachments/assets/8c7f736d-bf09-4ba9-9cf4-0d56e367c526" /> ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes ####⚠️ Breaking We adopt a new format for Qwen2VL's position ids: (4, batch size, seq len) Assuming a vision position ids (mrope) has a shape of (3, batch size, seq len) and a text position ids (normal rope) has a shape of (1, batch size, seq len), we concatenate both to obtain the final position ids. This aligns with the implementation in the Transformers >= 4.54.0 🤗 https://github.com/huggingface/transformers/blob/v4.54.0/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py#L1469 #### 🎤 New We have refactored the Qwen2VL and Qwen2.5VL patches, supporting no-image input for FSDP by introducing fake ViT inputs. We have also removed some redundant code for better maintainability. #### 🚨 Changes We move the ulysses logic into the attention function. So the position ids will be scattered before the language model part. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
…dp (verl-project#3496) ### What does this PR do? This PR tries to fix verl-project#3491 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Tested with [latest transformers](https://github.com/huggingface/transformers/tree/6e50a8afb2540ac1acaa4b62cf1dd5f1170f6511) <img width="2448" height="540" alt="image" src="https://github.com/user-attachments/assets/06d40f40-572c-4454-8e08-115857f61f21" /> <img width="2796" height="1394" alt="image" src="https://github.com/user-attachments/assets/17489b9c-e376-46e3-80d8-71106d304077" /> <img width="2098" height="744" alt="image" src="https://github.com/user-attachments/assets/8c7f736d-bf09-4ba9-9cf4-0d56e367c526" /> ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes ####⚠️ Breaking We adopt a new format for Qwen2VL's position ids: (4, batch size, seq len) Assuming a vision position ids (mrope) has a shape of (3, batch size, seq len) and a text position ids (normal rope) has a shape of (1, batch size, seq len), we concatenate both to obtain the final position ids. This aligns with the implementation in the Transformers >= 4.54.0 🤗 https://github.com/huggingface/transformers/blob/v4.54.0/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py#L1469 #### 🎤 New We have refactored the Qwen2VL and Qwen2.5VL patches, supporting no-image input for FSDP by introducing fake ViT inputs. We have also removed some redundant code for better maintainability. #### 🚨 Changes We move the ulysses logic into the attention function. So the position ids will be scattered before the language model part. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
…dp (verl-project#3496) ### What does this PR do? This PR tries to fix verl-project#3491 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Tested with [latest transformers](https://github.com/huggingface/transformers/tree/6e50a8afb2540ac1acaa4b62cf1dd5f1170f6511) <img width="2448" height="540" alt="image" src="https://github.com/user-attachments/assets/06d40f40-572c-4454-8e08-115857f61f21" /> <img width="2796" height="1394" alt="image" src="https://github.com/user-attachments/assets/17489b9c-e376-46e3-80d8-71106d304077" /> <img width="2098" height="744" alt="image" src="https://github.com/user-attachments/assets/8c7f736d-bf09-4ba9-9cf4-0d56e367c526" /> ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes ####⚠️ Breaking We adopt a new format for Qwen2VL's position ids: (4, batch size, seq len) Assuming a vision position ids (mrope) has a shape of (3, batch size, seq len) and a text position ids (normal rope) has a shape of (1, batch size, seq len), we concatenate both to obtain the final position ids. This aligns with the implementation in the Transformers >= 4.54.0 🤗 https://github.com/huggingface/transformers/blob/v4.54.0/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py#L1469 #### 🎤 New We have refactored the Qwen2VL and Qwen2.5VL patches, supporting no-image input for FSDP by introducing fake ViT inputs. We have also removed some redundant code for better maintainability. #### 🚨 Changes We move the ulysses logic into the attention function. So the position ids will be scattered before the language model part. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
…dp (verl-project#3496) ### What does this PR do? This PR tries to fix verl-project#3491 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Tested with [latest transformers](https://github.com/huggingface/transformers/tree/6e50a8afb2540ac1acaa4b62cf1dd5f1170f6511) <img width="2448" height="540" alt="image" src="https://github.com/user-attachments/assets/06d40f40-572c-4454-8e08-115857f61f21" /> <img width="2796" height="1394" alt="image" src="https://github.com/user-attachments/assets/17489b9c-e376-46e3-80d8-71106d304077" /> <img width="2098" height="744" alt="image" src="https://github.com/user-attachments/assets/8c7f736d-bf09-4ba9-9cf4-0d56e367c526" /> ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes ####⚠️ Breaking We adopt a new format for Qwen2VL's position ids: (4, batch size, seq len) Assuming a vision position ids (mrope) has a shape of (3, batch size, seq len) and a text position ids (normal rope) has a shape of (1, batch size, seq len), we concatenate both to obtain the final position ids. This aligns with the implementation in the Transformers >= 4.54.0 🤗 https://github.com/huggingface/transformers/blob/v4.54.0/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py#L1469 #### 🎤 New We have refactored the Qwen2VL and Qwen2.5VL patches, supporting no-image input for FSDP by introducing fake ViT inputs. We have also removed some redundant code for better maintainability. #### 🚨 Changes We move the ulysses logic into the attention function. So the position ids will be scattered before the language model part. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
…dp (verl-project#3496) ### What does this PR do? This PR tries to fix verl-project#3491 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Tested with [latest transformers](https://github.com/huggingface/transformers/tree/6e50a8afb2540ac1acaa4b62cf1dd5f1170f6511) <img width="2448" height="540" alt="image" src="https://github.com/user-attachments/assets/06d40f40-572c-4454-8e08-115857f61f21" /> <img width="2796" height="1394" alt="image" src="https://github.com/user-attachments/assets/17489b9c-e376-46e3-80d8-71106d304077" /> <img width="2098" height="744" alt="image" src="https://github.com/user-attachments/assets/8c7f736d-bf09-4ba9-9cf4-0d56e367c526" /> ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes ####⚠️ Breaking We adopt a new format for Qwen2VL's position ids: (4, batch size, seq len) Assuming a vision position ids (mrope) has a shape of (3, batch size, seq len) and a text position ids (normal rope) has a shape of (1, batch size, seq len), we concatenate both to obtain the final position ids. This aligns with the implementation in the Transformers >= 4.54.0 🤗 https://github.com/huggingface/transformers/blob/v4.54.0/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py#L1469 #### 🎤 New We have refactored the Qwen2VL and Qwen2.5VL patches, supporting no-image input for FSDP by introducing fake ViT inputs. We have also removed some redundant code for better maintainability. #### 🚨 Changes We move the ulysses logic into the attention function. So the position ids will be scattered before the language model part. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
…dp (verl-project#3496) ### What does this PR do? This PR tries to fix verl-project#3491 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Tested with [latest transformers](https://github.com/huggingface/transformers/tree/6e50a8afb2540ac1acaa4b62cf1dd5f1170f6511) <img width="2448" height="540" alt="image" src="https://github.com/user-attachments/assets/06d40f40-572c-4454-8e08-115857f61f21" /> <img width="2796" height="1394" alt="image" src="https://github.com/user-attachments/assets/17489b9c-e376-46e3-80d8-71106d304077" /> <img width="2098" height="744" alt="image" src="https://github.com/user-attachments/assets/8c7f736d-bf09-4ba9-9cf4-0d56e367c526" /> ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes ####⚠️ Breaking We adopt a new format for Qwen2VL's position ids: (4, batch size, seq len) Assuming a vision position ids (mrope) has a shape of (3, batch size, seq len) and a text position ids (normal rope) has a shape of (1, batch size, seq len), we concatenate both to obtain the final position ids. This aligns with the implementation in the Transformers >= 4.54.0 🤗 https://github.com/huggingface/transformers/blob/v4.54.0/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py#L1469 #### 🎤 New We have refactored the Qwen2VL and Qwen2.5VL patches, supporting no-image input for FSDP by introducing fake ViT inputs. We have also removed some redundant code for better maintainability. #### 🚨 Changes We move the ulysses logic into the attention function. So the position ids will be scattered before the language model part. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
…dp (verl-project#3496) ### What does this PR do? This PR tries to fix verl-project#3491 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Tested with [latest transformers](https://github.com/huggingface/transformers/tree/6e50a8afb2540ac1acaa4b62cf1dd5f1170f6511) <img width="2448" height="540" alt="image" src="https://github.com/user-attachments/assets/06d40f40-572c-4454-8e08-115857f61f21" /> <img width="2796" height="1394" alt="image" src="https://github.com/user-attachments/assets/17489b9c-e376-46e3-80d8-71106d304077" /> <img width="2098" height="744" alt="image" src="https://github.com/user-attachments/assets/8c7f736d-bf09-4ba9-9cf4-0d56e367c526" /> ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes ####⚠️ Breaking We adopt a new format for Qwen2VL's position ids: (4, batch size, seq len) Assuming a vision position ids (mrope) has a shape of (3, batch size, seq len) and a text position ids (normal rope) has a shape of (1, batch size, seq len), we concatenate both to obtain the final position ids. This aligns with the implementation in the Transformers >= 4.54.0 🤗 https://github.com/huggingface/transformers/blob/v4.54.0/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py#L1469 #### 🎤 New We have refactored the Qwen2VL and Qwen2.5VL patches, supporting no-image input for FSDP by introducing fake ViT inputs. We have also removed some redundant code for better maintainability. #### 🚨 Changes We move the ulysses logic into the attention function. So the position ids will be scattered before the language model part. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
What does this PR do?
This PR tries to fix #3491
Checklist Before Starting
[{modules}] {type}: {description}(This will be checked by the CI){modules}includefsdp,megatron,sglang,vllm,rollout,trainer,ci,training_utils,recipe,hardware,deployment,ray,worker,single_controller,misc,perf,model,algo,env,tool,ckpt,doc,data,like[megatron, fsdp, doc]{type}is infeat,fix,refactor,chore,test[BREAKING]to the beginning of the title.[BREAKING][fsdp, megatron] feat: dynamic batchingTest
Tested with latest transformers
API and Usage Example
# Add code snippet or script demonstrating how to use thisDesign & Code Changes
We adopt a new format for Qwen2VL's position ids: (4, batch size, seq len)
Assuming a vision position ids (mrope) has a shape of (3, batch size, seq len) and a text position ids (normal rope) has a shape of (1, batch size, seq len), we concatenate both to obtain the final position ids.
This aligns with the implementation in the Transformers >= 4.54.0 🤗
https://github.com/huggingface/transformers/blob/v4.54.0/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py#L1469
🎤 New
We have refactored the Qwen2VL and Qwen2.5VL patches, supporting no-image input for FSDP by introducing fake ViT inputs. We have also removed some redundant code for better maintainability.
🚨 Changes
We move the ulysses logic into the attention function. So the position ids will be scattered before the language model part.
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=alwaysci-requestchannel in theverlSlack workspace. (If not accessible, please try the Feishu group (飞书群).)