
[ray]{feat}: 1) add multithread execution mode and global thread pool management; 2) add launch_reward_fn_sub_thread#2861

Closed
chenchaoxu7575 wants to merge 4 commits into verl-project:main from chenchaoxu7575:chenchao_vllm_optim

Conversation


@chenchaoxu7575 chenchaoxu7575 commented Aug 1, 2025

What does this PR do?

We found that `ray.remote()` takes a long time to submit tasks with large batches. We can use a thread pool to run these Ray task submissions in the single controller. In an experiment with vLLM, GRPO, and geo3k on 8 H100s, this speeds up RL training: 1) reward_fn: 5s -> <1s; 2) execute worker (x3): 18s -> 12s.
Here are some experiments of this feature.
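The submission pattern described above can be sketched as follows. This is a minimal illustration, not code from this PR: `submit_one` stands in for a per-item Ray call such as `worker.method.remote(item)`, whose submission latency the threads overlap.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_submit(submit_one, items, pool_size=8):
    """Fan out many task submissions across a thread pool.

    Submitting a Ray task mostly waits on Ray's internal RPC, which
    releases the GIL, so running the submissions from several threads
    hides most of the per-call latency for large batches.
    """
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [pool.submit(submit_one, item) for item in items]
        # Preserve input order when collecting results.
        return [f.result() for f in futures]
```
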

1. REWARD FUNCTION

The reward_compute block is hidden in the timeline when `launch_reward_fn_sub_thread=True` is set.

1.1 launch_reward_fn_async = True (default)

launch_reward_fn_async

1.2 launch_reward_fn_sub_thread=True (new)

launch_reward_fn_sub_thread

2. EXECUTE WORKERS

Worker start times differ across ranks, and no worker runs until the slowest rank has started. Multithreaded execution reduces this overhead from 18s to 12s.

2.1 execute_all (default)

exec_all_1 exec_all_2

log

2.2 trainer.execute_mode=all_multithread (new)

exec_threads

log_multithreads

Main changes

- Added `ALL_MULTITHREAD` to the `Execute` enum.
- Updated `get_predefined_execute_fn` to return the appropriate function for the new execution mode.
- Implemented `execute_all_multithread_submit` in `RayWorkerGroup` for parallel method execution.
- Introduced `GlobalThreadPoolManager` to manage a shared thread pool across the application.
- Updated configuration files to include options for the new execution mode and thread pool size.
- Added the `launch_reward_fn_sub_thread` configuration option for asynchronous reward computation.
- Implemented the `_async_compute_reward_wrapper` method in `RayPPOTrainer` for thread-safe reward computation.
- Added conditional thread pool initialization based on the `launch_reward_fn_sub_thread` setting.
- Enhanced the reward computation flow to support three modes:
  - `launch_reward_fn_sub_thread: True` - use the local thread pool for async computation
  - `launch_reward_fn_async: True` - use a Ray remote function for async computation
  - both `False` - synchronous computation
- Added a mutual-exclusivity check between `launch_reward_fn_sub_thread` and `launch_reward_fn_async`.
- Implemented proper thread pool cleanup on training completion.
- Added `execute_mode` and `execute_thread_pool_size` options in `ppo_trainer.yaml` for user configuration.
- Added `launch_reward_fn_sub_thread: False` in `reward_model.yaml` for reward computation mode control.
- Renamed `thread_pool_size` to `execute_thread_pool_size` for clarity.
- The thread pool is only initialized when `launch_reward_fn_sub_thread` is enabled, reducing resource usage.
- The reward computation wrapper ensures proper device handling in sub-threads.
- Safe thread pool shutdown with null checks prevents errors.
- Maintains backward compatibility with existing reward computation methods.
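As a rough sketch of what a shared pool manager along these lines could look like: only the class name `GlobalThreadPoolManager` comes from this PR; the method names and singleton scheme below are illustrative assumptions, not the PR's implementation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class GlobalThreadPoolManager:
    """Process-wide singleton wrapping one shared ThreadPoolExecutor."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        with cls._lock:
            if cls._instance is None:
                cls._instance = super().__new__(cls)
                cls._instance._pool = None
        return cls._instance

    def init(self, pool_size=8):
        # Lazy creation: resources are only used once the feature is enabled.
        if self._pool is None:
            self._pool = ThreadPoolExecutor(max_workers=pool_size)

    def submit(self, fn, *args, **kwargs):
        if self._pool is None:
            raise RuntimeError("thread pool not initialized")
        return self._pool.submit(fn, *args, **kwargs)

    def shutdown(self):
        # Null check makes repeated shutdown safe.
        if self._pool is not None:
            self._pool.shutdown(wait=True)
            self._pool = None
```
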

This enhancement aims to improve performance and flexibility in distributed execution scenarios, providing both parallel worker group execution and asynchronous reward computation capabilities.
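For reference, enabling the new options might look like the following config fragment. The key names come from this PR; the exact file layout and defaults are assumptions.

```yaml
trainer:
  execute_mode: all_multithread        # new mode; default behavior stays execute_all
  execute_thread_pool_size: 16         # renamed from thread_pool_size

reward_model:
  launch_reward_fn_sub_thread: True    # mutually exclusive with launch_reward_fn_async
```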


CLAassistant commented Aug 1, 2025

CLA assistant check
All committers have signed the CLA.

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a multithreaded execution mode and asynchronous reward computation, which are great for improving performance. My review focuses on improving maintainability, correctness, and robustness. Key suggestions include refactoring duplicated code, fixing a critical bug in a fallback mechanism, correcting a faulty conditional check, and improving exception handling to avoid masking errors. Addressing these points will make the new features more reliable and easier to maintain.

Comment on lines +60 to +61
except Exception as e:
print(f"[EXECUTE MODE ERROR] Failed to use custom {custom_execute_mode}, falling back to default: {e}")

high

Catching a broad Exception can hide underlying issues and make debugging difficult. It's better to catch more specific exceptions that you expect to handle, such as AttributeError if a method is not found, or ValueError from get_predefined_execute_fn. If you must catch a broad exception, consider logging the full traceback for better diagnostics.
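The reviewer's suggestion could be sketched like this. `resolve_execute_fn` and its `lookup` parameter are hypothetical names; `lookup` stands in for `get_predefined_execute_fn` so the sketch is self-contained.

```python
import logging
import traceback

logger = logging.getLogger(__name__)

def resolve_execute_fn(lookup, custom_execute_mode, default_fn):
    """Catch only the expected failure modes and log the full traceback,
    so the fallback path does not silently mask the root cause."""
    try:
        return lookup(custom_execute_mode)
    except (AttributeError, ValueError):
        logger.warning(
            "[EXECUTE MODE ERROR] Failed to use custom %s, falling back to default:\n%s",
            custom_execute_mode,
            traceback.format_exc(),
        )
        return default_fn
```
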

Comment on lines +724 to +725
except Exception as e:
print(f"[WARNING] Global thread pool not available, falling back to sync execution: {e}")

high

Catching a broad Exception here can mask problems with the global_thread_pool_manager, such as configuration errors. This makes debugging harder. It's recommended to catch more specific exceptions if possible, or at least log the full traceback to provide more context on the failure.

Comment on lines +1047 to +1048
"active_threads": len(self._thread_pool._threads),
"queue_size": self._thread_pool._work_queue.qsize() if hasattr(self._thread_pool, '_work_queue') else 0

high

Accessing private attributes _threads and _work_queue of ThreadPoolExecutor is fragile and can break with future Python updates. ThreadPoolExecutor does not provide a public API for these stats. While this might work now, it's a maintainability risk. Consider wrapping the executor to track tasks and provide statistics in a safer way if these stats are critical.
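One way to follow this suggestion is to wrap the executor and count tasks via public APIs only. `TrackedThreadPool` is a hypothetical class sketching the idea, not code from this PR.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class TrackedThreadPool:
    """Wraps ThreadPoolExecutor and tracks task counts itself, avoiding
    reads of private attributes like `_threads` and `_work_queue` that
    may break across Python versions."""

    def __init__(self, max_workers):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self._pending = 0
        self._completed = 0

    def submit(self, fn, *args, **kwargs):
        with self._lock:
            self._pending += 1
        future = self._pool.submit(fn, *args, **kwargs)
        future.add_done_callback(self._on_done)
        return future

    def _on_done(self, _future):
        with self._lock:
            self._pending -= 1
            self._completed += 1

    def stats(self):
        # Version-stable statistics built from our own counters.
        with self._lock:
            return {"pending": self._pending, "completed": self._completed}

    def shutdown(self):
        self._pool.shutdown(wait=True)
```
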

Comment on lines +903 to +925
if OmegaConf.select(self.config.trainer, "execute_mode") is not None:
self.critic_wg.set_execute_mode(self.config.trainer.execute_mode)
self.critic_wg.init_model()

if self.use_reference_policy and not self.ref_in_actor:
self.ref_policy_wg = all_wg["ref"]
# Set execute mode for reference policy worker group
if OmegaConf.select(self.config.trainer, "execute_mode") is not None:
self.ref_policy_wg.set_execute_mode(self.config.trainer.execute_mode)
self.ref_policy_wg.init_model()

if self.use_rm:
self.rm_wg = all_wg["rm"]
# Set execute mode for reward model worker group
if OmegaConf.select(self.config.trainer, "execute_mode") is not None:
self.rm_wg.set_execute_mode(self.config.trainer.execute_mode)
self.rm_wg.init_model()

# we should create rollout at the end so that vllm can have a better estimation of kv cache memory
self.actor_rollout_wg = all_wg["actor_rollout"]
# Set execute mode for actor rollout worker group
if OmegaConf.select(self.config.trainer, "execute_mode") is not None:
self.actor_rollout_wg.set_execute_mode(self.config.trainer.execute_mode)

high

The call to OmegaConf.select(self.config.trainer, "execute_mode") is repeated for each worker group (critic_wg, ref_policy_wg, rm_wg, actor_rollout_wg). This is inefficient and makes the code harder to maintain. Consider fetching the execute_mode once before these blocks and reusing the variable.
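The suggested refactor could look like the sketch below: the caller resolves `OmegaConf.select(self.config.trainer, "execute_mode")` once and passes the result in. `apply_execute_mode` is a hypothetical helper name; `set_execute_mode` comes from the diff.

```python
def apply_execute_mode(execute_mode, worker_groups):
    """Apply one pre-resolved execute_mode to every worker group,
    instead of repeating the OmegaConf.select call per group.

    Returns the number of groups updated; optional groups (critic,
    ref, rm) may be passed as None and are skipped.
    """
    if execute_mode is None:
        return 0
    applied = 0
    for wg in worker_groups:
        if wg is not None:
            wg.set_execute_mode(execute_mode)
            applied += 1
    return applied
```
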

chenchaoxu7575 and others added 3 commits August 1, 2025 16:55