[recipe] feat: add deepeyes recipe #2398
wuxibin89 merged 41 commits into verl-project:main from Maxwell-Jia:recipe/deepeyes
Conversation
Code Review
This pull request introduces a new recipe for DeepEyes, including a new visual tool ImageZoomInTool, a custom reward function, and associated configurations and preprocessing scripts. The changes also include tracking tool usage metrics.
The review identified several critical issues in the new reward function script, including a bug in API client initialization, unhandled exceptions during API calls, and inconsistent logic for parsing model outputs. Additionally, a high-severity issue was found in the ImageZoomInTool where its implementation violates the interface of its base class, potentially leading to runtime errors. These issues should be addressed to ensure the correctness and robustness of the new recipe.
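The "unhandled exceptions during API calls" point can be addressed with a small retry wrapper around the judge call. A minimal sketch, assuming a generic zero-argument callable wrapping the real API request (the names here are hypothetical, not verl's actual reward-function API):

```python
import time

def call_judge_with_retry(client_call, max_retries=3, backoff_s=1.0, fallback_score=0.0):
    """Call an LLM-as-a-Judge API with retries.

    `client_call` is any zero-argument callable wrapping the real API
    request (hypothetical here). On repeated failure, return a neutral
    fallback score instead of crashing the reward worker.
    """
    for attempt in range(max_retries):
        try:
            return client_call()
        except Exception:
            if attempt == max_retries - 1:
                return fallback_score
            time.sleep(backoff_s * (2 ** attempt))  # exponential backoff

# Usage (hypothetical client):
# reward = call_judge_with_retry(lambda: client.chat(messages))
```

Returning a neutral fallback keeps a transient API outage from killing an entire training step; whether that is preferable to re-queuing the sample is a design choice.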
|
Could you provide a runnable script that trains a model that improves performance by using image zoom-in tools? Thanks! |
|
@vermouth1992 Hello, I have provided a bash script in the file recipe/deepeyes/run_deepeyes_grpo.sh. I'm currently training it in my environment; due to constant OOM errors, full logs and evaluation scripts may only be available later. |
|
Thanks for contributing. Would you please add one readme like this: https://github.com/volcengine/verl/blob/main/recipe/sppo/README.md to help community reproduce and check the correctness step by step. I can get people from SGLang community to reproduce and double-check. |
|
@Maxwell-Jia Could you add my wechat? 18015766633 |
I've added README.md under recipe/deepeyes. |
|
great! |
|
I refactored the deepeyes recipe code, removing unnecessary functions, etc. |
|
At present, the training process of deepeyes is smooth, but in my setup the following problems occur after training for a certain number of steps, causing training to fail. I suspect there is a bug in verl async multi-turn training, or a problem with my local environment configuration. I don't know if other people have this problem. |
Hi, is this the first line of the error? From what I recall, the root cause of this type of error is usually found in the preceding error messages. |
Yes, this is the first line of the error, and the preceding message is the log output of some auxiliary debug during the rollout phase. And I am sure that this error occurred during the actor update phase. |
May I ask what version you are using? However, a known issue (see: Dao-AILab/flash-attention#1734) is preventing the training from proceeding.
Disable |
I am using |
|
When I tried to reproduce your code and check the loss curve, I encountered a problem similar to #2445: (WorkerDict pid=1508271) [rank3]:[E716 02:39:38.221884254 ProcessGroupNCCL.cpp:632] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=937, OpType=ALLREDUCE, NumelIn=2, NumelOut=2, Timeout(ms)=1800000) ran for 1800070 milliseconds before timing out. Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace. |
|
@Zhou-jiecheng Referring to #2398 (comment), set |
|
@Zhou-jiecheng We found an important bug, and I think it is probably the root cause of this inexplicable error. The problem lies in the communication between our rollout and training processes: we currently convert the generated token ids back to text and then re-tokenize them, and the re-encoded ids can differ from the ids the model actually generated. For example, a first-turn response can be re-tokenized into a different id sequence even though the text is identical. This token-level inconsistency for the same conversational history creates significant instability during training, leading to the inexplicable errors we've been seeing. Thanks @xieck13 for finding and reporting this error. So far I have implemented multimodal tool calls under |
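The decode-then-re-encode mismatch described above can be illustrated with a toy greedy longest-match tokenizer: the ids produced turn-by-turn during rollout need not match the ids obtained by re-encoding the concatenated text. This is a self-contained sketch of the failure mode, not verl's or any real model's tokenizer:

```python
# Toy greedy longest-match tokenizer: vocab maps string piece -> id.
VOCAB = {"He": 0, "llo": 1, "Hello": 2, " world": 3}
INV = {v: k for k, v in VOCAB.items()}

def encode(text):
    """Greedy longest-match encoding, like BPE inference."""
    ids, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                ids.append(VOCAB[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"untokenizable suffix: {text[i:]!r}")
    return ids

def decode(ids):
    return "".join(INV[t] for t in ids)

# Suppose the rollout emitted "Hello" as two tokens across a turn boundary:
rollout_ids = [0, 1, 3]        # ["He", "llo", " world"]
text = decode(rollout_ids)     # "Hello world"
retrain_ids = encode(text)     # [2, 3] -- ["Hello", " world"]
assert decode(rollout_ids) == decode(retrain_ids)  # same text...
assert rollout_ids != retrain_ids                  # ...different token ids
```

Because the training side scores log-probs on `retrain_ids` while the policy actually produced `rollout_ids`, the importance ratios are computed on the wrong sequence, which is why passing token ids through directly (rather than round-tripping through text) fixes the instability.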
|
I wonder whether you have overlooked processing the multimodal information in Interleaved MCoT; the corresponding official code is at https://github.com/Visual-Agent/DeepEyes/blob/561293def6dc71fa7ac8b5bc674c070c393c9d94/verl/workers/agent/parallel_env.py#L284. If you have considered it, could you explain your processing logic? Thanks! |
|
@Zhou-jiecheng Hello, there does seem to be an issue here. We're working on a fix. Could you share which dataset you're using for this reward curve? |
Hello, I use data_v0.8_visual_toolbox_v2.parquet, with 90% for training and 10% for validation. |
|
@lzxdjb What is your version of transformers? |
|
Thank you so much for your reply! I am using the latest verl code and the latest Docker image provided by verl: verlai/verl:app-verl0.5-sglang0.4.9.post6-mcore0.12.2-te2.2. The transformers version in this Docker image is 4.53.2, which is the newest version. |
|
See huggingface/transformers#39685. This appears to be a bug in transformers, and you can try switching versions. Newer versions, such as 4.54.0, or older versions, such as 4.52.3, should not have this problem. |
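If you want to guard against the affected release programmatically, a stdlib-only check is enough. This is a sketch assuming, per the discussion above, that only the transformers 4.53.x series is affected (the helper name is made up for illustration):

```python
def is_affected_transformers(ver: str) -> bool:
    """Return True for the 4.53.x series, which the linked issue
    (huggingface/transformers#39685) reportedly affects."""
    major, minor, *_ = ver.split(".")
    return (int(major), int(minor)) == (4, 53)

# Guard at startup, only if transformers is installed:
# from importlib.metadata import version
# assert not is_affected_transformers(version("transformers")), \
#     "transformers 4.53.x hits huggingface/transformers#39685; up/downgrade"
```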
|
I used 4.54.0 and fixed the problem. Thank you so much for your patient answers!🥰🥰🥰🥰 |
May I ask what the required version is for running the training? I encountered the problem shown above during execution, and it seems to be a version mismatch. The versions I am using are as follows: |
have you resolved the problem? i encounter the same problem |
Yes, I’ve solved the issue. After installing verl, run pip install "sglang[all]==0.4.10.post2". This will upgrade PyTorch to 2.7.1, which will break both the flash-attn and vLLM you installed before.
Here are the versions of my key libraries: If you hit any problems, let me know; I can share my full working requirements.txt with you. |
could you share it with me? my email address is 476122294@qq.com. Thank you so much |
Thank you very much for your work, but I can't find: deepeyes47k_preprocess.py |
### What does this PR do?

This PR introduces a complete training recipe for [DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning](https://arxiv.org/abs/2505.14362). The core feature is support for multi-turn visual tools, specifically the `ImageZoomInTool`, integrated with a custom reward function based on the "LLM-as-a-Judge" pattern to evaluate model performance.

Additionally, to better monitor and analyze the model's tool-use behavior, this PR adds functionality to track tool call counts during training and report these metrics to logging systems like wandb.

### API and Usage Example

The primary change is the new training recipe for DeepEyes. Users can start a training run with the provided configuration file.

1. Preprocess the dataset. We need to add some tool-related `extra_info`:

   ```bash
   python recipe/deepeyes/deepeyes47k_preprocess.py --dataset_dir <path_to_raw_dataset> --save_dir <path_to_processed_data>
   ```

2. Start the PPO training:

   ```bash
   bash recipe/deepeyes/run_deepeyes_grpo.sh
   ```

The training process will automatically load the `ImageZoomInTool` and the custom reward function as defined in the recipe.

### Design & Code Changes

- **DeepEyes Recipe Integration**: Added a new recipe directory with data preprocessing, tool config, and a custom reward function for DeepEyes.
- **Visual Tool Support**: Implemented `ImageZoomInTool` with robust bbox validation and resizing.
- **Tool Call Statistics**: Modified the rollout and metrics code to track and log tool call counts per sample and per step.
- **Bug Fixes**: Fixed image byte handling and ensured special tokens are preserved during decoding for tool call formatting.

### Checklist Before Submitting

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

Co-authored-by: Maxwell-Jia <mr.minghui.jia@gamil.com>
Co-authored-by: xieck13 <xieck13@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
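The "robust bbox validation" mentioned in the design notes can be sketched as a clamp-and-validate helper run before cropping for a zoom-in tool call. This is a minimal illustration with hypothetical names; the actual `ImageZoomInTool` implementation may differ:

```python
def clamp_bbox(bbox, img_w, img_h, min_size=1):
    """Clamp an (x1, y1, x2, y2) bbox to image bounds and reject
    degenerate boxes before cropping for a zoom-in tool call."""
    x1, y1, x2, y2 = bbox
    # Clamp each coordinate into [0, width] / [0, height], then re-order
    # so x1 <= x2 and y1 <= y2 even if the model emitted them swapped.
    x1, x2 = sorted((max(0, min(x1, img_w)), max(0, min(x2, img_w))))
    y1, y2 = sorted((max(0, min(y1, img_h)), max(0, min(y2, img_h))))
    if x2 - x1 < min_size or y2 - y1 < min_size:
        raise ValueError(f"degenerate bbox after clamping: {(x1, y1, x2, y2)}")
    return (x1, y1, x2, y2)

# A box partially outside a 640x480 image gets clamped:
# clamp_bbox((-10, 20, 700, 100), 640, 480) -> (0, 20, 640, 100)
```

Raising on a degenerate box (rather than silently returning a 0-pixel crop) lets the tool return an explicit error message to the model, which is itself a learnable signal during RL.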
…3016) ### What does this PR do? Follow verl-project#2398, support vLLM multi-modal.
|
@cq-dong It was updated, and there is no need for preprocessing, just use the original dataset files. |
|
Should I set |
|
Hi, could you explain this comment in recipe/deepeyes/deepeyes.py: I found that the tool description was not added to the system prompt by custom_chat_template in val/generations, for example: |
### What does this PR do? Follow verl-project/verl#2398, support vLLM multi-modal.
|
May I ask whether this recipe can add multiple tools, i.e. function-call several tools? If so, would I add multiple function-call .py files in a yaml like https://github.com/volcengine/verl/blob/main/recipe/deepeyes/configs/image_zoom_in_tool_config.yaml? |
Hello. Could you please share the requirements.txt with me? My email is 1194913898@qq.com. Thank you! |
|
Hello! I'd like to know whether the DeepEyes recipe supports the Qwen3-VL dense models for multi-turn tool-use sampling. When using Qwen3-VL, vLLM needs to be 0.11.0 and torch needs to be 2.8.0 or higher. |
