[recipe] feat: add deepeyes recipe #2398

Merged
wuxibin89 merged 41 commits into verl-project:main from Maxwell-Jia:recipe/deepeyes on Aug 12, 2025
Conversation

@Maxwell-Jia (Contributor) commented Jul 7, 2025

What does this PR do?

This PR introduces a complete training recipe for DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning.

The core feature is support for multi-turn visual tools, specifically the ImageZoomInTool, integrated with a custom reward function that follows the "LLM-as-a-Judge" pattern to evaluate model outputs.

Additionally, to better monitor and analyze the model's tool-use behavior, this PR adds tracking of tool call counts during training and reports these metrics to logging backends such as wandb.
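An "LLM-as-a-Judge" reward asks a separate judge model to grade each rollout's final answer and maps its reply to a scalar reward. A minimal sketch of the two pure-Python pieces of that pattern (function names and prompt wording are illustrative, not the recipe's actual implementation):

```python
# Hedged sketch of the LLM-as-a-Judge pattern; names and prompt text
# are illustrative, not taken from recipe/deepeyes.
def build_judge_prompt(question: str, answer: str, ground_truth: str) -> str:
    """Format a grading request for a separate judge model."""
    return (
        "You are a strict grader.\n"
        "Question: {q}\n"
        "Reference answer: {gt}\n"
        "Model answer: {a}\n"
        "Reply with exactly 1.0 if the model answer is correct, else 0.0."
    ).format(q=question, gt=ground_truth, a=answer)


def parse_judge_score(judge_reply: str) -> float:
    """Map the judge's free-form reply to a scalar reward, defaulting to 0."""
    reply = judge_reply.strip()
    if reply.startswith("1"):
        return 1.0
    return 0.0
```

In practice the call to the judge model also needs timeout and error handling; the automated review later in this thread flags exactly those issues.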

API and Usage Example

The primary change is the new training recipe for DeepEyes. Users can start a training run by using the provided configuration file.

  1. Preprocess the dataset, adding the tool-related extra_info each sample needs:
python recipe/deepeyes/deepeyes47k_preprocess.py --dataset_dir <path_to_raw_dataset> --save_dir <path_to_processed_data>
  2. Start the GRPO training:
bash recipe/deepeyes/run_deepeyes_grpo.sh

The training process will automatically load the ImageZoomInTool and the custom reward function as defined in the recipe.
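The shape of that tool-related extra_info is roughly as follows. This is a hedged sketch with illustrative field names (the tool key and `tools_kwargs`/`create_kwargs` structure follow verl's multi-turn tool convention loosely), not the preprocessing script's exact schema:

```python
# Hedged sketch: the metadata the preprocessing step attaches so the
# rollout can instantiate the zoom-in tool per sample. Field names are
# illustrative, not the script's exact output.
def add_tool_extra_info(sample: dict) -> dict:
    """Attach tool-creation metadata and the ground truth to a sample."""
    sample["extra_info"] = {
        "question": sample.get("question", ""),
        "answer": sample.get("answer", ""),
        "tools_kwargs": {
            # One entry per tool the rollout may call for this sample.
            "image_zoom_in_tool": {
                "create_kwargs": {"image": sample.get("image")},
            }
        },
    }
    return sample


processed = add_tool_extra_info({"question": "What is on the sign?", "answer": "STOP"})
```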


Design & Code Changes

  • DeepEyes Recipe Integration: Added a new recipe directory with data preprocessing, tool config, and a custom reward function for DeepEyes.
  • Visual Tool Support: Implemented ImageZoomInTool with robust bbox validation and resizing.
  • Tool Call Statistics: Modified the rollout and metrics code to track and log tool call counts per sample and per step.
  • Bug Fixes: Fixed image byte handling and ensured special tokens are preserved during decoding for tool call formatting.
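The "robust bbox validation" above boils down to clamping model-emitted coordinates to the image bounds and rejecting degenerate crops before resizing. A self-contained sketch of that logic (illustrative only, not the ImageZoomInTool's exact implementation):

```python
# Hedged sketch of bbox validation for a zoom-in tool; illustrative,
# not the recipe's exact code.
def validate_bbox(bbox, width, height, min_size=8):
    """Clamp an (x1, y1, x2, y2) box to the image and reject tiny crops."""
    x1, y1, x2, y2 = bbox
    # Clamp every coordinate into the image bounds.
    x1 = max(0, min(x1, width))
    x2 = max(0, min(x2, width))
    y1 = max(0, min(y1, height))
    y2 = max(0, min(y2, height))
    # Normalize ordering in case the model emitted swapped corners.
    if x1 > x2:
        x1, x2 = x2, x1
    if y1 > y2:
        y1, y2 = y2, y1
    # Reject degenerate or too-small regions (min_size is an assumption).
    if (x2 - x1) < min_size or (y2 - y1) < min_size:
        return None
    return (x1, y1, x2, y2)
```

A `None` result lets the tool return an error observation to the model instead of crashing the rollout on a malformed tool call.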


@CLAassistant commented Jul 7, 2025

CLA assistant check
All committers have signed the CLA.

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a new recipe for DeepEyes, including a new visual tool ImageZoomInTool, a custom reward function, and associated configurations and preprocessing scripts. The changes also include tracking tool usage metrics.

The review identified several critical issues in the new reward function script, including a bug in API client initialization, unhandled exceptions during API calls, and inconsistent logic for parsing model outputs. Additionally, a high-severity issue was found in the ImageZoomInTool where its implementation violates the interface of its base class, potentially leading to runtime errors. These issues should be addressed to ensure the correctness and robustness of the new recipe.

@vermouth1992 (Collaborator) commented Jul 7, 2025

Could you provide a runnable script that trains a model whose performance improves by using the image zoom-in tool? Thanks!

@Maxwell-Jia (Contributor, Author) commented

@vermouth1992 Hello, I have provided a bash script at recipe/deepeyes/run_deepeyes_grpo.sh. Training is currently running in my environment; due to recurring OOM errors, full logs and evaluation scripts may only be available later.

@zhaochenyang20 (Collaborator) commented

Thanks for contributing. Would you please add a README like this one: https://github.com/volcengine/verl/blob/main/recipe/sppo/README.md

to help the community reproduce and check correctness step by step.

I can get people from SGLang community to reproduce and double-check.

@zhaochenyang20 (Collaborator) commented

@Maxwell-Jia Could you add my wechat? 18015766633

@Maxwell-Jia (Contributor, Author) commented

> Thanks for contributing. Would you please add one readme like this: https://github.com/volcengine/verl/blob/main/recipe/sppo/README.md
>
> to help community reproduce and check the correctness step by step.
>
> I can get people from SGLang community to reproduce and double-check.

I've added README.md under recipe/deepeyes.

@zhaochenyang20 (Collaborator) commented

great!

@Maxwell-Jia (Contributor, Author) commented

I refactored the deepeyes recipe code, removing unnecessary functions, etc.

@Maxwell-Jia (Contributor, Author) commented

At present, DeepEyes training runs smoothly, but in my setup the following error occurs after a certain number of training steps and the run fails:

(WorkerDict pid=1942698) [2025-07-13 00:15:22,359 E 1942698 1962889] logging.cc:112: Unhandled exception: N3c105ErrorE. what(): CUDA error: an illegal memory access was encountered
(WorkerDict pid=1942698) CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(WorkerDict pid=1942698) For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(WorkerDict pid=1942698) Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(WorkerDict pid=1942698)
(WorkerDict pid=1942698) Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:43 (most recent call first):
(WorkerDict pid=1942698) frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x144c1096c1b6 in /data/home/zdhs0094/.conda/envs/agentic-rl/lib/python3.10/site-packages/torch/lib/libc10.so)
(WorkerDict pid=1942698) frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x144c10915a76 in /data/home/zdhs0094/.conda/envs/agentic-rl/lib/python3.10/site-packages/torch/lib/libc10.so)
(WorkerDict pid=1942698) frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x144c10d5b918 in /data/home/zdhs0094/.conda/envs/agentic-rl/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
(WorkerDict pid=1942698) frame #3: <unknown function> + 0x20d8e (0x144c10d21d8e in /data/home/zdhs0094/.conda/envs/agentic-rl/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
(WorkerDict pid=1942698) frame #4: <unknown function> + 0x22507 (0x144c10d23507 in /data/home/zdhs0094/.conda/envs/agentic-rl/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
(WorkerDict pid=1942698) frame #5: <unknown function> + 0x2270f (0x144c10d2370f in /data/home/zdhs0094/.conda/envs/agentic-rl/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
(WorkerDict pid=1942698) frame #6: <unknown function> + 0x6417b2 (0x144c03cd07b2 in /data/home/zdhs0094/.conda/envs/agentic-rl/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
(WorkerDict pid=1942698) frame #7: <unknown function> + 0x6f30f (0x144c1094d30f in /data/home/zdhs0094/.conda/envs/agentic-rl/lib/python3.10/site-packages/torch/lib/libc10.so)
(WorkerDict pid=1942698) frame #8: c10::TensorImpl::~TensorImpl() + 0x21b (0x144c1094633b in /data/home/zdhs0094/.conda/envs/agentic-rl/lib/python3.10/site-packages/torch/lib/libc10.so)
(WorkerDict pid=1942698) frame #9: c10::TensorImpl::~TensorImpl() + 0x9 (0x144c109464e9 in /data/home/zdhs0094/.conda/envs/agentic-rl/lib/python3.10/site-packages/torch/lib/libc10.so)
(WorkerDict pid=1942698) frame #10: std::vector<at::Tensor, std::allocator<at::Tensor> >::~vector() + 0x88 (0x144c03cd25a8 in /data/home/zdhs0094/.conda/envs/agentic-rl/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
(WorkerDict pid=1942698) frame #11: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&) + 0x129 (0x144bf3ea1d19 in /data/home/zdhs0094/.conda/envs/agentic-rl/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
(WorkerDict pid=1942698) frame #12: torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x13f (0x144bf3e99c2f in /data/home/zdhs0094/.conda/envs/agentic-rl/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
(WorkerDict pid=1942698) frame #13: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x5c (0x144c03f643ac in /data/home/zdhs0094/.conda/envs/agentic-rl/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
(WorkerDict pid=1942698) frame #14: <unknown function> + 0xdbbf4 (0x147b71a2bbf4 in /data/home/zdhs0094/.conda/envs/agentic-rl/bin/../lib/libstdc++.so.6)
(WorkerDict pid=1942698) frame #15: <unknown function> + 0x94ac3 (0x147b740b9ac3 in /lib/x86_64-linux-gnu/libc.so.6)
(WorkerDict pid=1942698) frame #16: <unknown function> + 0x126850 (0x147b7414b850 in /lib/x86_64-linux-gnu/libc.so.6)
(WorkerDict pid=1942698)
(WorkerDict pid=1942698) [2025-07-13 00:15:22,442 E 1942698 1962889] logging.cc:119: Stack trace:
(WorkerDict pid=1942698)  /data/home/zdhs0094/.conda/envs/agentic-rl/lib/python3.10/site-packages/ray/_raylet.so(+0x1464c1a) [0x147b72fdec1a] ray::operator<<()
(WorkerDict pid=1942698) /data/home/zdhs0094/.conda/envs/agentic-rl/lib/python3.10/site-packages/ray/_raylet.so(+0x14681f2) [0x147b72fe21f2] ray::TerminateHandler()
(WorkerDict pid=1942698) /data/home/zdhs0094/.conda/envs/agentic-rl/bin/../lib/libstdc++.so.6(+0xb135a) [0x147b71a0135a] __cxxabiv1::__terminate()
(WorkerDict pid=1942698) /data/home/zdhs0094/.conda/envs/agentic-rl/bin/../lib/libstdc++.so.6(+0xb03b9) [0x147b71a003b9]
(WorkerDict pid=1942698) /data/home/zdhs0094/.conda/envs/agentic-rl/bin/../lib/libstdc++.so.6(__gxx_personality_v0+0x87) [0x147b71a00ae7] __gxx_personality_v0
(WorkerDict pid=1942698) /data/home/zdhs0094/.conda/envs/agentic-rl/bin/../lib/libgcc_s.so.1(+0x111e4) [0x147b719471e4] _Unwind_RaiseException_Phase2
(WorkerDict pid=1942698) /data/home/zdhs0094/.conda/envs/agentic-rl/bin/../lib/libgcc_s.so.1(_Unwind_Resume+0x12e) [0x147b71947c1e] _Unwind_Resume
(WorkerDict pid=1942698) /data/home/zdhs0094/.conda/envs/agentic-rl/lib/python3.10/site-packages/torch/lib/libc10_cuda.so(+0x143f7) [0x144c10d153f7] c10::cuda::CUDACachingAllocator::Native::DeviceCachingAllocator::free()
(WorkerDict pid=1942698) /data/home/zdhs0094/.conda/envs/agentic-rl/lib/python3.10/site-packages/torch/lib/libc10_cuda.so(+0x2270f) [0x144c10d2370f] c10::cuda::CUDACachingAllocator::Native::local_raw_delete()
(WorkerDict pid=1942698) /data/home/zdhs0094/.conda/envs/agentic-rl/lib/python3.10/site-packages/torch/lib/libtorch_python.so(+0x6417b2) [0x144c03cd07b2] c10::StorageImpl::~StorageImpl()
(WorkerDict pid=1942698) /data/home/zdhs0094/.conda/envs/agentic-rl/lib/python3.10/site-packages/torch/lib/libc10.so(+0x6f30f) [0x144c1094d30f] c10::intrusive_ptr<>::reset_()
(WorkerDict pid=1942698) /data/home/zdhs0094/.conda/envs/agentic-rl/lib/python3.10/site-packages/torch/lib/libc10.so(_ZN3c1010TensorImplD1Ev+0x21b) [0x144c1094633b] c10::TensorImpl::~TensorImpl()
(WorkerDict pid=1942698) /data/home/zdhs0094/.conda/envs/agentic-rl/lib/python3.10/site-packages/torch/lib/libc10.so(_ZN3c1010TensorImplD0Ev+0x9) [0x144c109464e9] c10::TensorImpl::~TensorImpl()
(WorkerDict pid=1942698) /data/home/zdhs0094/.conda/envs/agentic-rl/lib/python3.10/site-packages/torch/lib/libtorch_python.so(_ZNSt6vectorIN2at6TensorESaIS1_EED2Ev+0x88) [0x144c03cd25a8] std::vector<>::~vector()
(WorkerDict pid=1942698) /data/home/zdhs0094/.conda/envs/agentic-rl/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so(_ZN5torch8autograd6Engine11thread_mainERKSt10shared_ptrINS0_9GraphTaskEE+0x129) [0x144bf3ea1d19] torch::autograd::Engine::thread_main()
(WorkerDict pid=1942698) /data/home/zdhs0094/.conda/envs/agentic-rl/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so(_ZN5torch8autograd6Engine11thread_initEiRKSt10shared_ptrINS0_10ReadyQueueEEb+0x13f) [0x144bf3e99c2f] torch::autograd::Engine::thread_init()
(WorkerDict pid=1942698) /data/home/zdhs0094/.conda/envs/agentic-rl/lib/python3.10/site-packages/torch/lib/libtorch_python.so(_ZN5torch8autograd6python12PythonEngine11thread_initEiRKSt10shared_ptrINS0_10ReadyQueueEEb+0x5c) [0x144c03f643ac] torch::autograd::python::PythonEngine::thread_init()
(WorkerDict pid=1942698) /data/home/zdhs0094/.conda/envs/agentic-rl/bin/../lib/libstdc++.so.6(+0xdbbf4) [0x147b71a2bbf4] execute_native_thread_routine
(WorkerDict pid=1942698) /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x147b740b9ac3]
(WorkerDict pid=1942698) /lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x147b7414b850]
(WorkerDict pid=1942698)
(WorkerDict pid=1942698) *** SIGABRT received at time=1752336922 on cpu 74 ***
(WorkerDict pid=1942698) PC: @     0x147b740bb9fc  (unknown)  pthread_kill
(WorkerDict pid=1942698)     @     0x147b74067520  (unknown)  (unknown)
(WorkerDict pid=1942698) [2025-07-13 00:15:22,443 E 1942698 1962889] logging.cc:496: *** SIGABRT received at time=1752336922 on cpu 74 ***
(WorkerDict pid=1942698) [2025-07-13 00:15:22,443 E 1942698 1962889] logging.cc:496: PC: @     0x147b740bb9fc  (unknown)  pthread_kill
(WorkerDict pid=1942698) [2025-07-13 00:15:22,443 E 1942698 1962889] logging.cc:496:     @     0x147b74067520  (unknown)  (unknown)
(WorkerDict pid=1942698) Fatal Python error: Aborted
(WorkerDict pid=1942698)
(WorkerDict pid=1942698) Stack (most recent call first):
(WorkerDict pid=1942698)   <no Python frame>

I suspect there is a bug in verl's async multi-turn training, or a problem with my local environment configuration. I don't know whether others have hit this problem.
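As the log's own hint ("For debugging consider passing CUDA_LAUNCH_BLOCKING=1") suggests, rerunning with synchronous kernel launches usually makes the stack trace point at the real faulting call. A hedged debug rerun, reusing the recipe's own launch script:

```shell
# Debug-only rerun: synchronous CUDA launches are slow but make the
# reported stack trace point at the actual faulting kernel.
CUDA_LAUNCH_BLOCKING=1 bash recipe/deepeyes/run_deepeyes_grpo.sh
```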

@mantle2048 commented

> logging.cc:112: Unhandled exception: N3c105ErrorE. what(): CUDA error: an illegal memory access was encountered

Hi, is this the first line of the error? From what I recall, the root cause of this type of error is usually found in the preceding error messages.

@Maxwell-Jia (Contributor, Author) commented

> logging.cc:112: Unhandled exception: N3c105ErrorE. what(): CUDA error: an illegal memory access was encountered
>
> Hi, is this the first line of the error? From what I recall, the root cause of this type of error is usually found in the preceding error messages.

Yes, this is the first line of the error; the preceding messages are auxiliary debug output from the rollout phase. I am certain this error occurred during the actor update phase.

@mantle2048 commented Jul 15, 2025

> logging.cc:112: Unhandled exception: N3c105ErrorE. what(): CUDA error: an illegal memory access was encountered
>
> Hi, is this the first line of the error? From what I recall, the root cause of this type of error is usually found in the preceding error messages.
>
> Yes, this is the first line of the error, and the preceding message is the log output of some auxiliary debug during the rollout phase. And I am sure that this error occurred during the actor update phase.

May I ask which version of flash_attn you are using?

I am using 2.7.4.post1 with PyTorch 2.7.1 and have encountered exactly the same issue.

However, flash_attn 2.8.0 has a known issue and is unusable (see #2405 and Dao-AILab/flash-attention#1734), which is preventing training from proceeding.

I was looking for a solution and wondered whether it is possible to disable flash attention training in verl. Disabling use_remove_padding works for me.

@Maxwell-Jia (Contributor, Author) commented

> logging.cc:112: Unhandled exception: N3c105ErrorE. what(): CUDA error: an illegal memory access was encountered
>
> Hi, is this the first line of the error? From what I recall, the root cause of this type of error is usually found in the preceding error messages.
>
> Yes, this is the first line of the error, and the preceding message is the log output of some auxiliary debug during the rollout phase. And I am sure that this error occurred during the actor update phase.
>
> May I ask what version of flash_attn you are using?
>
> I am using 2.7.4.post1 with PyTorch 2.7.1 and have encountered the exact same issue as you.
>
> However, flash_attn 2.8.0 has a known issue and is unusable.
>
> (See: #2405 Dao-AILab/flash-attention#1734)
>
> This is preventing the training from proceeding.
>
> I am looking for a solution and was wondering if it's possible to disable flash_attention training in verl.
>
> Disable use_remove_padding works for me.

I am using flash_attn==2.7.4.post1 with torch==2.6.0.
Thank you for the information; I'll also try disabling use_remove_padding.
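For reference, that flag is an ordinary hydra override in this run's configuration. A hedged sketch of how to flip it, assuming run_deepeyes_grpo.sh forwards extra hydra arguments (otherwise edit the flag inside the script itself):

```shell
# Assumption: the recipe script forwards extra hydra overrides to the
# trainer; if not, change the flag where it is set in the script.
bash recipe/deepeyes/run_deepeyes_grpo.sh \
    actor_rollout_ref.model.use_remove_padding=False
```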

@Zhou-jiecheng commented

When I tried to reproduce your code and check the loss curve, I encountered a problem similar to #2445:

(WorkerDict pid=1508271) [rank3]:[E716 02:39:38.221884254 ProcessGroupNCCL.cpp:632] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=937, OpType=ALLREDUCE, NumelIn=2, NumelOut=2, Timeout(ms)=1800000) ran for 1800070 milliseconds before timing out.
(WorkerDict pid=1508271) [rank3]:[E716 02:39:38.226651768 ProcessGroupNCCL.cpp:2271] [PG ID 0 PG GUID 0(default_pg) Rank 3] failure detected by watchdog at work sequence id: 937 PG status: last enqueued work: 937, last completed work: 936
(WorkerDict pid=1508271) [rank3]:[E716 02:39:38.226678315 ProcessGroupNCCL.cpp:670] Stack trace of the failed collective not found, potentially because FlightRecorder is disabled. You can enable it by setting TORCH_NCCL_TRACE_BUFFER_SIZE to a non-zero value.
(WorkerDict pid=1508271) [rank3]:[E716 02:39:38.226713372 ProcessGroupNCCL.cpp:2106] [PG ID 0 PG GUID 0(default_pg) Rank 3] First PG on this rank to signal dumping.
(WorkerDict pid=1508038) [rank0]:[E716 02:39:39.038637652 ProcessGroupNCCL.cpp:1685] [PG ID 0 PG GUID 0(default_pg) Rank 0] Observed flight recorder dump signal from another rank via TCPStore.
(WorkerDict pid=1508038) [rank0]:[E716 02:39:39.038723715 ProcessGroupNCCL.cpp:1746] [PG ID 0 PG GUID 0(default_pg) Rank 0] Received a dump signal due to a collective timeout from rank 3 and we will try our best to dump the debug info. Last enqueued NCCL work: 937, last completed NCCL work: 936.This is most likely caused by incorrect usages of collectives, e.g., wrong sizes used across ranks, the order of collectives is not same for all ranks or the scheduled collective, for some reason, didn't run. Additionally, this can be caused by GIL deadlock or other reasons such as network errors or bugs in the communications library (e.g. NCCL), etc.
(WorkerDict pid=1508038) [rank0]:[E716 02:39:39.038825918 ProcessGroupNCCL.cpp:1536] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL preparing to dump debug info. Include stack trace: 1
(WorkerDict pid=1508271) [rank3]:[E716 02:39:39.038611132 ProcessGroupNCCL.cpp:1746] [PG ID 0 PG GUID 0(default_pg) Rank 3] Received a dump signal due to a collective timeout from this local rank and we will try our best to dump the debug info. Last enqueued NCCL work: 937, last completed NCCL work: 936.This is most likely caused by incorrect usages of collectives, e.g., wrong sizes used across ranks, the order of collectives is not same for all ranks or the scheduled collective, for some reason, didn't run. Additionally, this can be caused by GIL deadlock or other reasons such as network errors or bugs in the communications library (e.g. NCCL), etc.
(WorkerDict pid=1508272) /file_system/zjc/verl_deepeyes/verl/workers/rollout/sglang_rollout/utils.py:49: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.)
(WorkerDict pid=1508272) tensor_data = torch.ByteTensor(np.frombuffer(serialized_data, dtype=np.uint8)).to(device)
(WorkerDict pid=1508273) [rank5]:[E716 02:39:39.038661476 ProcessGroupNCCL.cpp:1685] [PG ID 0 PG GUID 0(default_pg) Rank 5] Observed flight recorder dump signal from another rank via TCPStore. [repeated 6x across cluster]
(WorkerDict pid=1508273) [rank5]:[E716 02:39:39.038751714 ProcessGroupNCCL.cpp:1746] [PG ID 0 PG GUID 0(default_pg) Rank 5] Received a dump signal due to a collective timeout from rank 3 and we will try our best to dump the debug info. Last enqueued NCCL work: 937, last completed NCCL work: 936.This is most likely caused by incorrect usages of collectives, e.g., wrong sizes used across ranks, the order of collectives is not same for all ranks or the scheduled collective, for some reason, didn't run. Additionally, this can be caused by GIL deadlock or other reasons such as network errors or bugs in the communications library (e.g. NCCL), etc. [repeated 6x across cluster]
(WorkerDict pid=1508273) [rank5]:[E716 02:39:39.038843747 ProcessGroupNCCL.cpp:1536] [PG ID 0 PG GUID 0(default_pg) Rank 5] ProcessGroupNCCL preparing to dump debug info. Include stack trace: 1 [repeated 7x across cluster]
(WorkerDict pid=1508038) [rank0]:[F716 02:47:39.039423337 ProcessGroupNCCL.cpp:1557] [PG ID 0 PG GUID 0(default_pg) Rank 0] [PG ID 0 PG GUID 0(default_pg) Rank 0] Terminating the process after attempting to dump debug info, due to collective timeout or exception.
(WorkerDict pid=1508038) *** SIGABRT received at time=1752634059 on cpu 52 ***
(WorkerDict pid=1508038) PC: @ 0x7fd7e32909fc (unknown) pthread_kill
(WorkerDict pid=1508038) @ 0x7fd7e323c520 (unknown) (unknown)
(WorkerDict pid=1508038) [2025-07-16 02:47:39,445 E 1508038 1508898] logging.cc:496: *** SIGABRT received at time=1752634059 on cpu 52 ***
(WorkerDict pid=1508038) [2025-07-16 02:47:39,445 E 1508038 1508898] logging.cc:496: PC: @ 0x7fd7e32909fc (unknown) pthread_kill
(WorkerDict pid=1508038) [2025-07-16 02:47:39,445 E 1508038 1508898] logging.cc:496: @ 0x7fd7e323c520 (unknown) (unknown)
(WorkerDict pid=1508038) Fatal Python error: Aborted
(WorkerDict pid=1508038)
[36m(WorkerDict pid=1508038)[0m Extension modules: msgpack._cmsgpack, google._upb._message, psutil._psutil_linux, psutil._psutil_posix, setproctitle, yaml._yaml, charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, uvloop.loop, ray._raylet, numpy._core._multiarray_umath, numpy.linalg._umath_linalg, pyarrow.lib, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, markupsafe._speedups, PIL._imaging, PIL._imagingft, av._core, av.logging, av.bytesource, av.buffer, av.audio.format, av.error, av.dictionary, av.container.pyio, 
av.utils, av.option, av.descriptor, av.format, av.stream, av.container.streams, av.sidedata.motionvectors, av.sidedata.sidedata, av.opaque, av.packet, av.container.input, av.container.output, av.container.core, av.codec.context, av.video.format, av.video.reformatter, av.plane, av.video.plane, av.video.frame, av.video.stream, av.codec.hwaccel, av.codec.codec, av.frame, av.audio.layout, av.audio.plane, av.audio.frame, av.audio.stream, av.filter.pad, av.filter.link, av.filter.context, av.filter.graph, av.filter.filter, av.filter.loudnorm, av.audio.resampler, av.audio.codeccontext, av.audio.fifo, av.bitstream, av.video.codeccontext, scipy._lib._ccallback_c, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg._matfuncs_expm, scipy.linalg._linalg_pythran, scipy.linalg.cython_blas, scipy.linalg._decomp_update, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.optimize._group_columns, scipy._lib.messagestream, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._cython_nnls, scipy._lib._uarray._uarray, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, 
scipy.linalg._decomp_interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.spatial._ckdtree, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._direct, pyarrow._json, regex._regex, pybase64._pybase64, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, frozenlist._frozenlist, zmq.backend.cython._zmq, sentencepiece._sentencepiece, msgspec._core, cuda.bindings._lib.utils, cuda.bindings._bindings.cydriver, cuda.bindings.cydriver, cuda.bindings.driver, cuda.bindings._bindings.cynvrtc, cuda.bindings.cynvrtc, cuda.bindings.nvrtc (total: 201)
(raylet) A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffffe6820df8da2b90e4a8c69f5301000000 Worker ID: f8abf57db1813e9ac65ac19c8efbb59a6dbe750b416e7b08eb1124e8 Node ID: 09d07a17c937a1df95bae63476615a85a6aeb7428c5ab141e843f738 Worker IP address: 192.168.111.204 Worker port: 10110 Worker PID: 1508038 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
Error executing job with overrides: ['data.train_files=[/file_system/datasets/DeepEyes-Datasets-47k/data_0.1.2_visual_toolbox_v2.parquet,/file_system/datasets/DeepEyes-Datasets-47k/data_v0.8_visual_toolbox_v2.parquet,/file_system/datasets/DeepEyes-Datasets-47k/data_thinklite_reasoning_acc.parquet]', 'data.val_files=[/file_system/datasets/DeepEyes-Datasets-47k/data_thinklite_reasoning_acc.parquet]', 'data.train_batch_size=64', 'data.max_prompt_length=8192', 'data.max_response_length=10240', 'data.return_raw_chat=True', 'data.filter_overlong_prompts=True', 'algorithm.adv_estimator=grpo', 'algorithm.kl_ctrl.kl_coef=0.0', 'actor_rollout_ref.model.path=/file_system/common-models/Qwen/Qwen2.5-VL-7B-Instruct', 'actor_rollout_ref.model.use_remove_padding=True', 'actor_rollout_ref.model.use_fused_kernels=True', 'actor_rollout_ref.actor.optim.lr=1e-6', 'actor_rollout_ref.actor.ppo_mini_batch_size=32', 'actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1', 'actor_rollout_ref.actor.use_kl_loss=False', 'actor_rollout_ref.actor.kl_loss_coef=0.0', 'actor_rollout_ref.actor.kl_loss_type=low_var_kl', 'actor_rollout_ref.actor.entropy_coeff=0.0', 'actor_rollout_ref.actor.checkpoint.save_contents=[model,hf_model,optimizer,extra]', 'actor_rollout_ref.actor.ulysses_sequence_parallel_size=1', 'actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1', 'actor_rollout_ref.rollout.tensor_model_parallel_size=1', 'actor_rollout_ref.rollout.name=sglang', 'actor_rollout_ref.rollout.n=16', 'actor_rollout_ref.rollout.max_num_batched_tokens=10240', 'actor_rollout_ref.rollout.gpu_memory_utilization=0.5', 'actor_rollout_ref.rollout.enforce_eager=True', 'actor_rollout_ref.rollout.free_cache_engine=True', 'actor_rollout_ref.rollout.enable_chunked_prefill=True', 'actor_rollout_ref.actor.fsdp_config.param_offload=True', 'actor_rollout_ref.actor.fsdp_config.optimizer_offload=True', 'actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=8', 'actor_rollout_ref.ref.fsdp_config.param_offload=True', 
'actor_rollout_ref.rollout.multi_turn.enable=True', 'actor_rollout_ref.rollout.multi_turn.max_assistant_turns=5', 'actor_rollout_ref.rollout.multi_turn.max_user_turns=1', 'actor_rollout_ref.rollout.multi_turn.max_parallel_calls=1', 'actor_rollout_ref.rollout.multi_turn.tool_config_path=/file_system/zjc/verl_deepeyes/recipe/deepeyes/configs/image_zoom_in_tool_config.yaml', 'trainer.critic_warmup=0', 'trainer.logger=[console,wandb,tensorboard]', 'trainer.val_before_train=False', 'trainer.n_gpus_per_node=8', 'trainer.nnodes=1', 'trainer.save_freq=8', 'trainer.test_freq=80', 'trainer.project_name=deepeyes', 'trainer.experiment_name=deepeyes_pr', 'trainer.default_local_dir=/file_system/zjc/checkpoints/deepeyes/deepeyes_pr', '+trainer.tensorboard_dir=/file_system/zjc/checkpoints/logs/tensorboard', '+trainer.rl_logging_board_dir=/file_system/zjc/checkpoints/logs/rl_logging_board', 'trainer.total_epochs=1']
Traceback (most recent call last):
File "/file_system/zjc/verl_deepeyes/verl/trainer/main_ppo.py", line 34, in main
run_ppo(config)
File "/file_system/zjc/verl_deepeyes/verl/trainer/main_ppo.py", line 57, in run_ppo
ray.get(runner.run.remote(config))
File "/file_system/zjc/miniconda/envs/verl-deepeyes/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
return fn(*args, **kwargs)
File "/file_system/zjc/miniconda/envs/verl-deepeyes/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
return func(*args, **kwargs)
File "/file_system/zjc/miniconda/envs/verl-deepeyes/lib/python3.10/site-packages/ray/_private/worker.py", line 2849, in get
values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
File "/file_system/zjc/miniconda/envs/verl-deepeyes/lib/python3.10/site-packages/ray/_private/worker.py", line 937, in get_objects
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ActorDiedError): [36mray::TaskRunner.run()[39m (pid=1504744, ip=192.168.111.204, actor_id=8c574a3f89ef72dda53ae30001000000, repr=<main_ppo.TaskRunner object at 0x7f6d57c29e70>)
File "/file_system/zjc/verl_deepeyes/verl/trainer/main_ppo.py", line 207, in run
trainer.fit()
File "/file_system/zjc/verl_deepeyes/verl/trainer/ppo/ray_trainer.py", line 1135, in fit
gen_batch_output = self.actor_rollout_wg.generate_sequences(gen_batch)
File "/file_system/zjc/verl_deepeyes/verl/single_controller/ray/base.py", line 51, in call
output = ray.get(output)
ray.exceptions.ActorDiedError: The actor died unexpectedly before finishing this task.
class_name: create_colocated_worker_cls..WorkerDict
actor_id: e6820df8da2b90e4a8c69f5301000000
pid: 1508038
name: aeCqIzWorkerDict_0:0
namespace: df42830f-bf7b-4778-9d94-58af2144767a
ip: 192.168.111.204
The actor is dead because its worker process has died. Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
[36m(WorkerDict pid=1508271)[0m /file_system/zjc/miniconda/envs/verl-deepeyes/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
[36m(WorkerDict pid=1508271)[0m warnings.warn('resource_tracker: There appear to be %d '
[33m(raylet)[0m A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffffac8d4511713fa45aa1b403da01000000 Worker ID: 3f3e01ddc797d3e0f8cfc0e06f69e7ab5978a439049ec28e2a6ce53a Node ID: 09d07a17c937a1df95bae63476615a85a6aeb7428c5ab141e843f738 Worker IP address: 192.168.111.204 Worker port: 10115 Worker PID: 1508271 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
[36m(WorkerDict pid=1508273)[0m [rank5]:[F716 02:47:39.039637164 ProcessGroupNCCL.cpp:1557] [PG ID 0 PG GUID 0(default_pg) Rank 5] [PG ID 0 PG GUID 0(default_pg) Rank 5] Terminating the process after attempting to dump debug info, due to collective timeout or exception.[32m [repeated 7x across cluster][0m
[36m(WorkerDict pid=1508273)[0m *** SIGABRT received at time=1752634059 on cpu 106 ***[32m [repeated 7x across cluster][0m
[36m(WorkerDict pid=1508273)[0m PC: @ 0x7f2531a9c9fc (unknown) pthread_kill[32m [repeated 7x across cluster][0m
[36m(WorkerDict pid=1508273)[0m @ 0x7f2531a48520 (unknown) (unknown)[32m [repeated 7x across cluster][0m
[36m(WorkerDict pid=1508273)[0m [2025-07-16 02:47:39,447 E 1508273 1508901] logging.cc:496: *** SIGABRT received at time=1752634059 on cpu 106 ***[32m [repeated 7x across cluster][0m
[36m(WorkerDict pid=1508273)[0m [2025-07-16 02:47:39,447 E 1508273 1508901] logging.cc:496: PC: @ 0x7f2531a9c9fc (unknown) pthread_kill[32m [repeated 7x across cluster][0m
[36m(WorkerDict pid=1508273)[0m [2025-07-16 02:47:39,447 E 1508273 1508901] logging.cc:496: @ 0x7f2531a48520 (unknown) (unknown)[32m [repeated 7x across cluster][0m
[36m(WorkerDict pid=1508273)[0m Fatal Python error: Aborted[32m [repeated 7x across cluster][0m
[36m(WorkerDict pid=1508273)[0m Extension modules: msgpack._cmsgpack, google._upb._message, psutil._psutil_linux, psutil._psutil_posix, setproctitle, yaml._yaml, charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, uvloop.loop, ray._raylet, numpy._core._multiarray_umath, numpy.linalg._umath_linalg, pyarrow.lib, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, markupsafe._speedups, PIL._imaging, PIL._imagingft, av._core, av.logging, av.bytesource, av.buffer, av.audio.format, av.error, av.dictionary, av.container.pyio, 
av.utils, av.option, av.descriptor, av.format, av.stream, av.container.streams, av.sidedata.motionvectors, av.sidedata.sidedata, av.opaque, av.packet, av.container.input, av.container.output, av.container.core, av.codec.context, av.video.format, av.video.reformatter, av.plane, av.video.plane, av.video.frame, av.video.stream, av.codec.hwaccel, av.codec.codec, av.frame, av.audio.layout, av.audio.plane, av.audio.frame, av.audio.stream, av.filter.pad, av.filter.link, av.filter.context, av.filter.graph, av.filter.filter, av.filter.loudnorm, av.audio.resampler, av.audio.codeccontext, av.audio.fifo, av.bitstream, av.video.codeccontext, scipy._lib._ccallback_c, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg._matfuncs_expm, scipy.linalg._linalg_pythran, scipy.linalg.cython_blas, scipy.linalg._decomp_update, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.optimize._group_columns, scipy._lib.messagestream, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._cython_nnls, scipy._lib._uarray._uarray, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, 
scipy.linalg._decomp_interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.spatial._ckdtree, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._direct, pyarrow._json, regex._regex, pybase64._pybase64, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, frozenlist._frozenlist, zmq.backend.cython._zmq, sentencepiece._sentencepiece, msgspec._core, cuda.bindings._lib.utils, cuda.bindings._bindings.cydriver, cuda.bindings.cydriver, cuda.bindings.driver, cuda.bindings._bindings.cynvrtc, cuda.bindings.cynvrtc, cuda.bindings.nvrtc (total: 201)[32m [repeated 7x across cluster][0m
[36m(WorkerDict pid=1508273)[0m /file_system/zjc/miniconda/envs/verl-deepeyes/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
[36m(WorkerDict pid=1508273)[0m warnings.warn('resource_tracker: There appear to be %d '

@Maxwell-Jia
Contributor Author

@Zhou-jiecheng Referring to #2398 (comment): set `actor_rollout_ref.model.use_remove_padding=False`. That works for me.

@Maxwell-Jia
Contributor Author

@Zhou-jiecheng We found an important bug. I think this is probably the root cause of this inexplicable error.

The problem lies in the communication between our rollout and training processes. We currently convert generated token IDs to text for multi-turn conversation history, and then re-tokenize this combined text in the next turn. This token ID -> text -> token ID process is not reversible.

For example, a first-turn response like <think>In a box... might be tokenized as ['<', 'think', '>', 'In', 'a', 'box...']. In the next turn, after concatenating a tool response, the input becomes <think>In a box...<tool_response>..... The tokenizer might then process this differently, splitting the original special token into ['<th', 'ink>', 'In', '...'].

This token-level inconsistency for the same conversational history creates significant instability during training, leading to the inexplicable errors we've been seeing.
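The non-reversibility above can be reproduced with a toy greedy longest-match tokenizer (a minimal sketch, not the actual Qwen tokenizer; the vocabulary is contrived to trigger the mismatch):

```python
# Toy greedy longest-match tokenizer. The point: tokenize(a) + tokenize(b)
# need not equal tokenize(a + b), so decoding turn 1 to text and
# re-tokenizing it together with the tool response yields different token
# sequences for the same conversation history.
VOCAB = ["<tool_response>", " box<tool", "_response>", " box", "In", " a"]

def tokenize(text: str) -> list[str]:
    """Segment text greedily, preferring the longest matching vocab piece."""
    pieces = sorted(VOCAB, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        for piece in pieces:
            if text.startswith(piece, i):
                tokens.append(piece)
                i += len(piece)
                break
        else:  # fall back to a single character
            tokens.append(text[i])
            i += 1
    return tokens

turn1 = "In a box"        # model response, turn 1
tool = "<tool_response>"  # appended tool output
print(tokenize(turn1) + tokenize(tool))  # ['In', ' a', ' box', '<tool_response>']
print(tokenize(turn1 + tool))            # ['In', ' a', ' box<tool', '_response>']
```

The decoded text is identical either way; only the token boundaries differ, which is exactly the mismatch between rollout and training inputs.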

Thanks @xieck13 for finding and reporting this error.

I have now implemented multimodal tool calls under AgentLoop and fixed this issue there. The same fix will be applied to AsyncRequest later.

@Zhou-jiecheng

I wonder if you have overlooked processing the multimodal information in interleaved MCoT; the corresponding official code is at https://github.com/Visual-Agent/DeepEyes/blob/561293def6dc71fa7ac8b5bc674c070c393c9d94/verl/workers/agent/parallel_env.py#L284. If you have considered it, could you explain your processing logic? Thanks!

@Zhou-jiecheng

Hello, I tried to reproduce your code, but the reward curve is abnormal. There may be an accuracy issue, an algorithm mismatch, or a hyperparameter mismatch between this PR and the original version.
[reward curve screenshot]

@xieck13
Contributor

xieck13 commented Jul 25, 2025

@Zhou-jiecheng Hello, there does seem to be an issue here. We're working on a fix. Could you share which dataset you're using for this reward curve?

@Zhou-jiecheng

@Zhou-jiecheng Hello, there does seem to be an issue here. We're working on a fix. Could you share which dataset you're using for this reward curve?

Hello, I use data_v0.8_visual_toolbox_v2.parquet, with 90% for training and 10% for validation.
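For reference, a split along those lines might look like this (a sketch with a toy DataFrame; in practice you would load the real file with pd.read_parquet):

```python
import pandas as pd

def split_train_val(df: pd.DataFrame, val_frac: float = 0.1, seed: int = 42):
    """Shuffle and split a dataset into disjoint train/val partitions."""
    val = df.sample(frac=val_frac, random_state=seed)
    train = df.drop(val.index)
    return train, val

if __name__ == "__main__":
    # In practice: df = pd.read_parquet("data_v0.8_visual_toolbox_v2.parquet")
    df = pd.DataFrame({"prompt": [f"q{i}" for i in range(100)]})
    train, val = split_train_val(df)
    print(len(train), len(val))  # 90 10
```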

@Maxwell-Jia
Contributor Author

@lzxdjb What is your version of transformers?

@lzxdjb

lzxdjb commented Aug 22, 2025

Thank you so much for your reply!

I am using the latest verl code and the latest Docker image provided by verl: verlai/verl:app-verl0.5-sglang0.4.9.post6-mcore0.12.2-te2.2

The transformers version in this Docker image is 4.53.2, which is the newest version.

@Maxwell-Jia
Contributor Author

See huggingface/transformers#39685.

This appears to be a bug in transformers; you can try switching versions. Newer versions such as 4.54.0, or older versions such as 4.52.3, should not have this problem.

@lzxdjb

lzxdjb commented Aug 22, 2025

I used 4.54.0 and that fixed the problem. Thank you so much for your patient answers!🥰🥰🥰🥰

@FloSophoraeX

FloSophoraeX commented Aug 22, 2025

 File "/train-venv/lib/python3.10/site-packages/sgl_kernel/__init__.py", line 13, in <module>
(TaskRunner pid=4985)     from sgl_kernel import common_ops
(TaskRunner pid=4985) ImportError: /train-venv/lib/python3.10/site-packages/sgl_kernel/common_ops.abi3.so: undefined symbol: _ZN3c108ListType3getERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4Type24SingletonOrSharedTypePtrIS9_EE

May I ask what the required version is for running the training? I encountered the problem shown above during execution, and it seems to be a version mismatch.

The versions I am using are as follows:

sgl-kernel                               0.3.6.post1
sglang                                   0.4.9.post6
flash_attn                               2.7.4.post1
flashinfer-python                        0.2.2.post1+cu124torch2.6
torch                                    2.6.0
torch_memory_saver                       0.0.8
torchao                                  0.12.0
torchaudio                               2.6.0
torchdata                                0.11.0
torchvision                              0.21.0
hf_transfer                              0.1.9
transformers                             4.51.1
cuda-bindings                            13.0.1
cuda-pathfinder                          1.1.0
cuda-python                              13.0.1
cupy-cuda12x                             13.6.0
nvidia-cuda-cupti-cu12                   12.4.127
nvidia-cuda-nvrtc-cu12                   12.4.127
nvidia-cuda-runtime-cu12                 12.4.127
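The undefined C++ symbol (a c10::ListType constructor from libtorch) usually means the sgl-kernel wheel was built against a different torch ABI than the torch actually installed. A quick triage sketch, assuming nothing beyond the standard library (package names as they appear on PyPI):

```python
from importlib import metadata

def installed_versions(packages):
    """Return {package: version or None} for a quick ABI-mismatch triage."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = None  # not installed in this environment
    return versions

if __name__ == "__main__":
    # sgl-kernel wheels are compiled against a specific torch version; if
    # these drift apart, extension loading fails with undefined symbols.
    print(installed_versions(["torch", "sgl-kernel", "sglang", "flash-attn"]))
```

Comparing the printed versions against the combination a given sglang release declares support for is usually enough to spot the mismatch.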

@xuetf

xuetf commented Aug 27, 2025

> (quoted: the sgl_kernel ImportError report and version list above)

Have you resolved the problem? I'm encountering the same issue.

@FloSophoraeX

FloSophoraeX commented Aug 27, 2025

> (quoted: the sgl_kernel ImportError report and the follow-up "have you resolved the problem?" above)

Yes, I’ve solved the issue. After installing verl, run

pip install "sglang[all]==0.4.10.post2"

This will upgrade PyTorch to 2.7.1, which will break both flash-attn and vLLM you installed before.
To fix it:

  1. Re-install the correct flash-attn wheel (matching CUDA & PyTorch 2.7.1).
  2. And run
    pip uninstall vllm
    pip install vllm

Here are the versions of my key libraries:

flash_attn @ file://flash_attn-2.8.3%2Bcu12torch2.7cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
flashinfer-python==0.2.9rc2
torch==2.7.1
sgl-kernel==0.2.8
sglang==0.4.10.post2
torch_memory_saver==0.0.8
torchao==0.9.0
torchaudio==2.7.1
torchdata==0.11.0
torchvision==0.22.1
xformers==0.0.31
xgrammar==0.1.21
vllm==0.10.1.1
transformers==4.55.4

If you hit any problems, let me know; I can share my whole working requirements.txt with you.

@xuetf

xuetf commented Aug 28, 2025

> (quoted: the full sgl_kernel exchange above, including the working version list)

Could you share it with me? My email address is 476122294@qq.com. Thank you so much!

@cq-dong

cq-dong commented Sep 2, 2025

What does this PR do?

This PR introduces a complete training recipe for DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning.

The core feature is the support for multi-turn visual tools, specifically the ImageZoomInTool, integrated with a custom reward function based on the "LLM-as-a-Judge" pattern to evaluate model performance.

Additionally, to better monitor and analyze the model's tool-use behavior, this PR adds functionality to track tool call counts during the training process and reports these metrics to logging systems like wandb.

API and Usage Example

The primary change is the new training recipe for DeepEyes. Users can start a training run by using the provided configuration file.

  1. Preprocess the dataset. We need to add some tool-related extra_info:
python recipe/deepeyes/deepeyes47k_preprocess.py --dataset_dir <path_to_raw_dataset> --save_dir <path_to_processed_data>
  2. Start the PPO training:
bash recipe/deepeyes/run_deepeyes_grpo.sh

The training process will automatically load the ImageZoomInTool and the custom reward function as defined in the recipe.

Design & Code Changes

  • DeepEyes Recipe Integration: Added a new recipe directory with data preprocessing, tool config, and a custom reward function for DeepEyes.
  • Visual Tool Support: Implemented ImageZoomInTool with robust bbox validation and resizing.
  • Tool Call Statistics: Modified the rollout and metrics code to track and log tool call counts per sample and per step.
  • Bug Fixes: Fixed image byte handling and ensured special tokens are preserved during decoding for tool call formatting.

Checklist Before Submitting

Thank you very much for your work, but I can't find: deepeyes47k_preprocess.py
python recipe/deepeyes/deepeyes47k_preprocess.py --dataset_dir <path_to_raw_dataset> --save_dir <path_to_processed_data>
How do I preprocess the data?

whatadayG pushed a commit to whatadayG/verl that referenced this pull request Sep 5, 2025
whatadayG pushed a commit to whatadayG/verl that referenced this pull request Sep 5, 2025
@Maxwell-Jia
Contributor Author

@cq-dong The recipe was updated: preprocessing is no longer needed; just use the original dataset files.

@WentaoYan1453

Should I set return_multi_modal_inputs to true in recipe/deepeyes/configs/deepeyes_multiturn_grpo.yaml?

@xytiann

xytiann commented Sep 17, 2025

Hi, could you explain this comment in recipe/deepeyes/deepeyes.py:
We don't need tool description, because custom_chat_template will add it.

I found that the tool description was not added to the system prompt by custom_chat_template in val/generations, for example:

system: You are a helpful assistant. You can call functions to assist with the user query. Important: You must call only one function at a time. After each function call, wait for the execution result before making the next function call if needed.
user: Is the red shirt needs shorts to the left or right of the boy on his stomach? Think first, call image_zoom_in_tool if needed, then answer. Format strictly as: ... <tool_call>...</tool_call> (if tools needed) ...
assistant:

VocabVictor pushed a commit to VocabVictor/verl-plus that referenced this pull request Sep 24, 2025
### What does this PR do?

Follow verl-project/verl#2398, support vLLM
multi-modal.
@jumbo-q

jumbo-q commented Oct 7, 2025

May I ask: is it possible to add multiple tools, i.e., let the model function-call several tools? If so, is it done by adding multiple function-call .py files in a yaml like https://github.com/volcengine/verl/blob/main/recipe/deepeyes/configs/image_zoom_in_tool_config.yaml?
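For illustration only, a multi-tool config might look like the sketch below (hypothetical: the tools list layout mirrors the single-tool image_zoom_in_tool_config.yaml pattern, the second class path my_project.tools.crop_tool.ImageCropTool is invented, and field names are not verified against verl's schema):

```yaml
# Hypothetical multi-tool config: two entries under `tools`, each with its
# own tool class and function schema. Not verified against verl's schema.
tools:
  - class_name: verl.tools.image_zoom_in_tool.ImageZoomInTool
    config:
      type: native
    tool_schema:
      type: function
      function:
        name: image_zoom_in_tool
        description: Zoom in on a region of the image given a bounding box.
  - class_name: my_project.tools.crop_tool.ImageCropTool  # hypothetical second tool
    config:
      type: native
    tool_schema:
      type: function
      function:
        name: image_crop_tool
        description: Crop the image to a bounding box.
```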

WncFht pushed a commit to WncFht/verl that referenced this pull request Oct 10, 2025
@0001Henry
Copy link

 File "/train-venv/lib/python3.10/site-packages/sgl_kernel/__init__.py", line 13, in <module>
(TaskRunner pid=4985)     from sgl_kernel import common_ops
(TaskRunner pid=4985) ImportError: /train-venv/lib/python3.10/site-packages/sgl_kernel/common_ops.abi3.so: undefined symbol: _ZN3c108ListType3getERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4Type24SingletonOrSharedTypePtrIS9_EE

May I ask what the required version is for running the training? I encountered the problem shown above during execution, and it seems to be a version mismatch.
The versions I am using are as follows:

```
sgl-kernel                               0.3.6.post1
sglang                                   0.4.9.post6
flash_attn                               2.7.4.post1
flashinfer-python                        0.2.2.post1+cu124torch2.6
torch                                    2.6.0
torch_memory_saver                       0.0.8
torchao                                  0.12.0
torchaudio                               2.6.0
torchdata                                0.11.0
torchvision                              0.21.0
hf_transfer                              0.1.9
transformers                             4.51.1
cuda-bindings                            13.0.1
cuda-pathfinder                          1.1.0
cuda-python                              13.0.1
cupy-cuda12x                             13.6.0
nvidia-cuda-cupti-cu12                   12.4.127
nvidia-cuda-nvrtc-cu12                   12.4.127
nvidia-cuda-runtime-cu12                 12.4.127
```

Have you resolved the problem? I encountered the same issue.

Yes, I've solved the issue. After installing verl, run

```bash
pip install "sglang[all]==0.4.10.post2"
```

This upgrades PyTorch to 2.7.1, which breaks both the flash-attn and vLLM you installed before. To fix that:

1. Re-install the correct flash-attn wheel (matching your CUDA version and PyTorch 2.7.1).
2. Re-install vLLM:
   ```bash
   pip uninstall vllm
   pip install vllm
   ```

Here are the versions of my key libraries:

```
flash_attn @ file://flash_attn-2.8.3%2Bcu12torch2.7cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
flashinfer-python==0.2.9rc2
torch==2.7.1
sgl-kernel==0.2.8
sglang==0.4.10.post2
torch_memory_saver==0.0.8
torchao==0.9.0
torchaudio==2.7.1
torchdata==0.11.0
torchvision==0.22.1
xformers==0.0.31
xgrammar==0.1.21
vllm==0.10.1.1
transformers==4.55.4
```

If you hit any problems, let me know; I can share my complete working requirements.txt with you.

Could you share it with me? My email address is 476122294@qq.com. Thank you so much!

Hello. Could you please share the requirements.txt with me? My email is 1194913898@qq.com. Thank you!

chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025
### What does this PR do?

This PR introduces a complete training recipe for [DeepEyes:
Incentivizing "Thinking with Images" via Reinforcement
Learning](https://arxiv.org/abs/2505.14362).

The core feature is the support for multi-turn visual tools,
specifically the `ImageZoomInTool`, integrated with a custom reward
function based on the "LLM-as-a-Judge" pattern to evaluate model
performance.
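As background on the pattern, the sketch below shows the general shape of an LLM-as-a-Judge reward. It is not the recipe's actual implementation (that lives in `recipe/deepeyes/`); `JUDGE_TEMPLATE`, `parse_verdict`, and `call_judge` are hypothetical names:

```python
# Hypothetical sketch of an LLM-as-a-Judge reward; the recipe's real
# prompt and scoring rules differ in detail. `call_judge` is a stand-in
# for a real LLM API call.

JUDGE_TEMPLATE = (
    "Question: {question}\n"
    "Ground truth: {ground_truth}\n"
    "Model answer: {answer}\n"
    "Reply with exactly one word: CORRECT or INCORRECT."
)

def parse_verdict(judge_reply: str) -> float:
    """Map the judge's reply to a scalar reward.

    Matching on whole words (not substrings) avoids counting the
    "CORRECT" inside "INCORRECT" as a pass.
    """
    return 1.0 if "CORRECT" in judge_reply.upper().split() else 0.0

def compute_reward(question: str, ground_truth: str, answer: str, call_judge) -> float:
    prompt = JUDGE_TEMPLATE.format(
        question=question, ground_truth=ground_truth, answer=answer
    )
    return parse_verdict(call_judge(prompt))
```

In the recipe, `call_judge` would presumably be a request to a judge-model endpoint, and the resulting score feeds into GRPO as the sequence-level reward.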

Additionally, to better monitor and analyze the model's tool-use
behavior, this PR adds functionality to track tool call counts during
the training process and reports these metrics to logging systems like
wandb.

### API and Usage Example

The primary change is the new training recipe for DeepEyes. Users can
start a training run by using the provided configuration file.

1. Preprocess the dataset; this adds the tool-related `extra_info` each sample needs:
```bash
python recipe/deepeyes/deepeyes47k_preprocess.py --dataset_dir <path_to_raw_dataset> --save_dir <path_to_processed_data>
```
2. Start the PPO training:
```bash
bash recipe/deepeyes/run_deepeyes_grpo.sh
```
The training process will automatically load the ImageZoomInTool and the
custom reward function as defined in the recipe.
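The exact schema is defined by the preprocessing script; purely as an illustration (the field names below are assumptions, not the recipe's actual keys), each processed sample carries tool metadata along these lines:

```json
{
  "prompt": "...",
  "images": ["..."],
  "extra_info": {
    "answer": "...",
    "tools_kwargs": {
      "image_zoom_in_tool": {
        "create_kwargs": {"image": "..."}
      }
    }
  }
}
```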


### Design & Code Changes

- **DeepEyes Recipe Integration**: Added a new recipe directory with
data preprocessing, tool config, and a custom reward function for
DeepEyes.
- **Visual Tool Support**: Implemented `ImageZoomInTool` with robust
bbox validation and resizing.
- **Tool Call Statistics**: Modified the rollout and metrics code to
track and log tool call counts per sample and per step.
- **Bug Fixes**: Fixed image byte handling and ensured special tokens
are preserved during decoding for tool call formatting.
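As intuition for the bbox validation mentioned above, here is a minimal sketch. It is a hypothetical helper, not the `ImageZoomInTool` API, and the real rules (e.g. the minimum crop size) may differ:

```python
def validate_bbox(bbox, width, height, min_size=28):
    """Clamp a (left, top, right, bottom) box to the image bounds and
    reject degenerate boxes. Returns the adjusted box, or None if the
    box is empty or inverted. `min_size` illustrates the idea of a
    minimum crop size; the real tool's constraints may differ."""
    left, top, right, bottom = bbox
    # Clamp each edge to the image bounds.
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    if right - left <= 0 or bottom - top <= 0:
        return None  # empty or inverted box
    # Expand boxes smaller than min_size, keeping them inside the image.
    if right - left < min_size:
        right = min(width, left + min_size)
        left = max(0, right - min_size)
    if bottom - top < min_size:
        bottom = min(height, top + min_size)
        top = max(0, bottom - min_size)
    return (left, top, right, bottom)
```

A validated box would then be cropped from the original image and resized before being fed back to the model as the next observation.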

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: Maxwell-Jia <mr.minghui.jia@gamil.com>
Co-authored-by: xieck13 <xieck13@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
@Saint-lsy

Saint-lsy commented Nov 19, 2025

Hello! Does the DeepEyes recipe support the Qwen3-VL dense models for multi-turn tool-use sampling? When using Qwen3-VL, vLLM needs to be 0.11.0 and torch needs to be 2.8.0 or higher.

paolo328 added a commit to paolo328/Verl that referenced this pull request Nov 27, 2025
### What does this PR do?

Follow verl-project/verl#2398, support vLLM multi-modal.