[recipe] feat: add deepeyes recipe #2398
wuxibin89 merged 41 commits into verl-project:main from Maxwell-Jia:recipe/deepeyes
Conversation
Code Review
This pull request introduces a new recipe for DeepEyes, including a new visual tool ImageZoomInTool, a custom reward function, and associated configurations and preprocessing scripts. The changes also include tracking tool usage metrics.
The review identified several critical issues in the new reward function script, including a bug in API client initialization, unhandled exceptions during API calls, and inconsistent logic for parsing model outputs. Additionally, a high-severity issue was found in the ImageZoomInTool where its implementation violates the interface of its base class, potentially leading to runtime errors. These issues should be addressed to ensure the correctness and robustness of the new recipe.
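The "unhandled exceptions during API calls" point can be addressed with a small retry wrapper around the judge call. A minimal sketch, assuming a generic zero-argument callable wrapping the real API request (the names here are hypothetical, not verl's actual reward-function API):

```python
import time

def call_judge_with_retry(client_call, max_retries=3, backoff_s=1.0, fallback_score=0.0):
    """Call an LLM-as-a-Judge API with retries.

    `client_call` is any zero-argument callable wrapping the real API
    request (hypothetical here). On repeated failure, return a neutral
    fallback score instead of crashing the reward worker.
    """
    for attempt in range(max_retries):
        try:
            return client_call()
        except Exception:
            if attempt == max_retries - 1:
                return fallback_score
            time.sleep(backoff_s * (2 ** attempt))  # exponential backoff

# Usage (hypothetical client):
# reward = call_judge_with_retry(lambda: client.chat(messages))
```

Returning a neutral fallback keeps a transient API outage from killing an entire training step; whether that is preferable to re-queuing the sample is a design choice.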
|
Could you provide a runnable script that trains a model that improves performance by using image zoom-in tools? Thanks! |
|
@vermouth1992 Hello, I have provided a bash script in the file recipe/deepeyes/run_deepeyes_grpo.sh. I'm currently training it in my environment; due to constant OOM errors, full logs and evaluation scripts may only be available later. |
|
Thanks for contributing. Would you please add one readme like this: https://github.com/volcengine/verl/blob/main/recipe/sppo/README.md to help community reproduce and check the correctness step by step. I can get people from SGLang community to reproduce and double-check. |
|
@Maxwell-Jia Could you add my wechat? 18015766633 |
I've added README.md under recipe/deepeyes. |
|
great! |
|
I refactored the deepeyes recipe code, removing unnecessary functions, etc. |
|
At present, the training process of deepeyes is smooth, but in my setup the following problems occur after training for a certain number of steps, causing training to fail. I suspect there is a bug in verl async multi-turn training, or a problem with my local environment configuration. I don't know if other people have this problem. |
Hi, is this the first line of the error? From what I recall, the root cause of this type of error is usually found in the preceding error messages. |
Yes, this is the first line of the error, and the preceding message is the log output of some auxiliary debug during the rollout phase. And I am sure that this error occurred during the actor update phase. |
May I ask what version you are using? However, a known issue (see: Dao-AILab/flash-attention#1734) is preventing the training from proceeding.
Disable |
I am using |
|
When I tried to reproduce your code and check the loss curve, I encountered a problem similar to #2445: (WorkerDict pid=1508271) [rank3]:[E716 02:39:38.221884254 ProcessGroupNCCL.cpp:632] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=937, OpType=ALLREDUCE, NumelIn=2, NumelOut=2, Timeout(ms)=1800000) ran for 1800070 milliseconds before timing out. Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace. |
|
@Zhou-jiecheng Referring to #2398 (comment), set |
|
@Zhou-jiecheng We found an important bug, and I think it is probably the root cause of this inexplicable error. The problem lies in the communication between our rollout and training processes: we currently convert the generated token ids back to text and then re-tokenize them, and the re-encoded ids can differ from the ids the model actually generated. For example, a first-turn response can be re-tokenized into a different id sequence even though the text is identical. This token-level inconsistency for the same conversational history creates significant instability during training, leading to the inexplicable errors we've been seeing. Thanks @xieck13 for finding and reporting this error. So far I have implemented multimodal tool calls under |
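The decode-then-re-encode mismatch described above can be illustrated with a toy greedy longest-match tokenizer: the ids produced turn-by-turn during rollout need not match the ids obtained by re-encoding the concatenated text. This is a self-contained sketch of the failure mode, not verl's or any real model's tokenizer:

```python
# Toy greedy longest-match tokenizer: vocab maps string piece -> id.
VOCAB = {"He": 0, "llo": 1, "Hello": 2, " world": 3}
INV = {v: k for k, v in VOCAB.items()}

def encode(text):
    """Greedy longest-match encoding, like BPE inference."""
    ids, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                ids.append(VOCAB[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"untokenizable suffix: {text[i:]!r}")
    return ids

def decode(ids):
    return "".join(INV[t] for t in ids)

# Suppose the rollout emitted "Hello" as two tokens across a turn boundary:
rollout_ids = [0, 1, 3]        # ["He", "llo", " world"]
text = decode(rollout_ids)     # "Hello world"
retrain_ids = encode(text)     # [2, 3] -- ["Hello", " world"]
assert decode(rollout_ids) == decode(retrain_ids)  # same text...
assert rollout_ids != retrain_ids                  # ...different token ids
```

Because the training side scores log-probs on `retrain_ids` while the policy actually produced `rollout_ids`, the importance ratios are computed on the wrong sequence, which is why passing token ids through directly (rather than round-tripping through text) fixes the instability.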
|
I wonder whether you have overlooked processing the multimodal information in Interleaved MCoT; the corresponding official code is at https://github.com/Visual-Agent/DeepEyes/blob/561293def6dc71fa7ac8b5bc674c070c393c9d94/verl/workers/agent/parallel_env.py#L284. If you have considered it, could you explain your processing logic? Thanks! |
|
@Zhou-jiecheng Hello, there does seem to be an issue here. We're working on a fix. Could you share which dataset you're using for this reward curve? |
Hello, I use data_v0.8_visual_toolbox_v2.parquet, with 90% for training and 10% for validation. |
|
@lzxdjb What is your version of transformers? |
|
Thank you so much for your reply! I am using the latest verl code and the latest Docker image provided by verl: verlai/verl:app-verl0.5-sglang0.4.9.post6-mcore0.12.2-te2.2. The transformers version in this Docker image is 4.53.2, which is the newest version. |
|
See huggingface/transformers#39685. This appears to be a bug in transformers, and you can try switching versions. Newer versions, such as 4.54.0, or older versions, such as 4.52.3, should not have this problem. |
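If you want to guard against the affected release programmatically, a stdlib-only check is enough. This is a sketch assuming, per the discussion above, that only the transformers 4.53.x series is affected (the helper name is made up for illustration):

```python
def is_affected_transformers(ver: str) -> bool:
    """Return True for the 4.53.x series, which the linked issue
    (huggingface/transformers#39685) reportedly affects."""
    major, minor, *_ = ver.split(".")
    return (int(major), int(minor)) == (4, 53)

# Guard at startup, only if transformers is installed:
# from importlib.metadata import version
# assert not is_affected_transformers(version("transformers")), \
#     "transformers 4.53.x hits huggingface/transformers#39685; up/downgrade"
```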
|
I used 4.54.0 and fixed the problem. Thank you so much for your patient answers!🥰🥰🥰🥰 |
May I ask what the required version is for running the training? I encountered the problem shown above during execution, and it seems to be a version mismatch. The versions I am using are as follows: |
have you resolved the problem? i encounter the same problem |
Yes, I’ve solved the issue. After installing verl, run pip install "sglang[all]==0.4.10.post2". This will upgrade PyTorch to 2.7.1, which will break both the flash-attn and vLLM you installed before.
Here are the versions of my key libraries: If you hit any problems, let me know; I can share my full working requirements.txt with you. |
could you share it with me? my email address is 476122294@qq.com. Thank you so much |
Thank you very much for your work, but I can't find: deepeyes47k_preprocess.py |
### What does this PR do?

This PR introduces a complete training recipe for [DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning](https://arxiv.org/abs/2505.14362). The core feature is support for multi-turn visual tools, specifically the `ImageZoomInTool`, integrated with a custom reward function based on the "LLM-as-a-Judge" pattern to evaluate model performance.

Additionally, to better monitor and analyze the model's tool-use behavior, this PR adds functionality to track tool call counts during training and report these metrics to logging systems like wandb.

### API and Usage Example

The primary change is the new training recipe for DeepEyes. Users can start a training run with the provided configuration file.

1. Preprocess the dataset. We need to add some tool-related `extra_info`:

   ```bash
   python recipe/deepeyes/deepeyes47k_preprocess.py --dataset_dir <path_to_raw_dataset> --save_dir <path_to_processed_data>
   ```

2. Start the PPO training:

   ```bash
   bash recipe/deepeyes/run_deepeyes_grpo.sh
   ```

The training process will automatically load the `ImageZoomInTool` and the custom reward function as defined in the recipe.

### Design & Code Changes

- **DeepEyes Recipe Integration**: Added a new recipe directory with data preprocessing, tool config, and a custom reward function for DeepEyes.
- **Visual Tool Support**: Implemented `ImageZoomInTool` with robust bbox validation and resizing.
- **Tool Call Statistics**: Modified the rollout and metrics code to track and log tool call counts per sample and per step.
- **Bug Fixes**: Fixed image byte handling and ensured special tokens are preserved during decoding for tool call formatting.

### Checklist Before Submitting

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

Co-authored-by: Maxwell-Jia <mr.minghui.jia@gamil.com>
Co-authored-by: xieck13 <xieck13@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
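The "robust bbox validation" mentioned in the design notes can be sketched as a clamp-and-validate helper run before cropping for a zoom-in tool call. This is a minimal illustration with hypothetical names; the actual `ImageZoomInTool` implementation may differ:

```python
def clamp_bbox(bbox, img_w, img_h, min_size=1):
    """Clamp an (x1, y1, x2, y2) bbox to image bounds and reject
    degenerate boxes before cropping for a zoom-in tool call."""
    x1, y1, x2, y2 = bbox
    # Clamp each coordinate into [0, width] / [0, height], then re-order
    # so x1 <= x2 and y1 <= y2 even if the model emitted them swapped.
    x1, x2 = sorted((max(0, min(x1, img_w)), max(0, min(x2, img_w))))
    y1, y2 = sorted((max(0, min(y1, img_h)), max(0, min(y2, img_h))))
    if x2 - x1 < min_size or y2 - y1 < min_size:
        raise ValueError(f"degenerate bbox after clamping: {(x1, y1, x2, y2)}")
    return (x1, y1, x2, y2)

# A box partially outside a 640x480 image gets clamped:
# clamp_bbox((-10, 20, 700, 100), 640, 480) -> (0, 20, 640, 100)
```

Raising on a degenerate box (rather than silently returning a 0-pixel crop) lets the tool return an explicit error message to the model, which is itself a learnable signal during RL.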
…3016) ### What does this PR do? Follow verl-project#2398, support vLLM multi-modal.
|
@cq-dong It was updated, and there is no need for preprocessing, just use the original dataset files. |
|
Should I set |
|
Hi, could you explain this comment in recipe/deepeyes/deepeyes.py: I found that the tool description was not added to the system prompt by custom_chat_template in val/generations, for example: |
### What does this PR do? Follow verl-project/verl#2398, support vLLM multi-modal.
|
May I ask whether this recipe can add multiple tools, i.e. function-call several tools? If so, would I add multiple function-call .py files in a yaml like https://github.com/volcengine/verl/blob/main/recipe/deepeyes/configs/image_zoom_in_tool_config.yaml? |
Hello. Could you please share the requirements.txt with me? My email is 1194913898@qq.com. Thank you! |
|
Hello! I'd like to know whether the DeepEyes recipe supports the Qwen3-VL dense models for multi-turn tool-use sampling. When using Qwen3-VL, vLLM needs to be 0.11.0 and torch needs to be 2.8.0 or higher. |
