[train] Add `worker_process_setup_hook` to set mp start method to `spawn` by SumanthRH · Pull Request #1333 · NovaSky-AI/SkyRL

SumanthRH · 2026-03-17T20:49:53Z

What does this PR do?

Adds `skyrl.utils.worker_setup.worker_setup_fn`, which sets the multiprocessing start method to `'spawn'`, and wires it into `ray.init` via the `worker_process_setup_hook` key in `runtime_env`. Includes unit tests verifying that the hook is applied in Ray workers.
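A minimal sketch of the pattern described above, assuming `worker_setup_fn` simply forces the start method (the PR's actual implementation may differ). `make_runtime_env` is a hypothetical helper, not part of SkyRL. Ray's `runtime_env` accepts a callable under the `worker_process_setup_hook` key, which Ray calls in each worker process after it starts and before tasks or actors are scheduled on it.

```python
import multiprocessing
from typing import Optional


def worker_setup_fn() -> None:
    """Force the 'spawn' start method in the current process (sketch)."""
    # force=True avoids "RuntimeError: context has already been set" when
    # other code in this process already chose a start method, so calling
    # this more than once is safe.
    multiprocessing.set_start_method("spawn", force=True)


def make_runtime_env(base: Optional[dict] = None) -> dict:
    """Return a copy of `base` with the setup hook registered (hypothetical helper)."""
    env = dict(base or {})
    env["worker_process_setup_hook"] = worker_setup_fn
    return env


# With Ray installed, the driver would pass this to ray.init, e.g.:
#   import ray
#   ray.init(runtime_env=make_runtime_env())
```

Keeping the setup function in its own module means the driver can import and call the same function, instead of duplicating the `set_start_method` call.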

We've previously seen many cases where Ray and fork-based multiprocessing interact badly. Code in the `skyrl_entrypoint` task (the torch dataloader) and in other Ray worker processes (e.g. Megatron workers) relies on multiprocessing. Using the worker process setup hook gives us a single, consistent way to set the start method for every Ray worker process.

Test plan:

- [x] CPU tests
- [x] One GPU test that uses Ray: `test_model_wrapper.py`
- [x] New CPU tests for the worker setup function

Adds skyrl.utils.worker_setup.worker_setup_fn which sets the multiprocessing start method to 'spawn', and wires it into ray.init via the worker_process_setup_hook runtime_env key. Includes unit tests verifying the hook is applied in Ray workers. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The set_start_method call in main_base.py only affected the driver process. Now that worker_process_setup_hook handles this for all Ray workers, the driver-side call is redundant. Move the explanatory comment to worker_setup_fn where it belongs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…tting The worker_process_setup_hook only runs in Ray workers, not the driver. Call worker_setup_fn() at module level in main_base (which main_generate also imports) so the driver process also uses spawn. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
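Why the driver needs its own call can be checked without Ray: the start method is per-interpreter state, and Ray worker processes are launched by Ray itself rather than forked from the driver through `multiprocessing`, so they never see the driver's setting. This sketch sets `spawn` in the current process and asks a brand-new interpreter what it sees; `get_start_method(allow_none=True)` returns `None` when nothing has chosen a method yet.

```python
import multiprocessing
import subprocess
import sys

# Choose 'spawn' in this (driver-like) process only.
multiprocessing.set_start_method("spawn", force=True)

# Start a brand-new interpreter, the way an external launcher would, and ask
# which start method it has chosen. Nothing has set one there yet.
probe = (
    "import multiprocessing; "
    "print(multiprocessing.get_start_method(allow_none=True))"
)
child_method = subprocess.run(
    [sys.executable, "-c", probe],
    capture_output=True,
    text=True,
    check=True,
).stdout.strip()

print("driver:", multiprocessing.get_start_method())  # driver: spawn
print("fresh process:", child_method)  # fresh process: None
```

The same isolation works in the other direction: a hook that runs only in worker processes cannot change the driver, hence the module-level call in `main_base`.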

gemini-code-assist

Code Review

This pull request introduces a worker_process_setup_hook to consistently set the multiprocessing start method to spawn for all Ray workers, which is a good practice to avoid issues with fork in a Ray environment. The changes are well-implemented, including a new utility function, its integration into ray.init, and corresponding unit tests. My feedback focuses on improving the robustness of the new tests to ensure they don't have side effects on other tests.

gemini-code-assist · 2026-03-17T20:53:00Z

+def test_worker_setup_fn_sets_spawn():
+    """Test that worker_setup_fn sets the mp start method to spawn."""
+    # Reset to default first
+    multiprocessing.set_start_method("fork", force=True)
+    worker_setup_fn()
+    assert multiprocessing.get_start_method() == "spawn"
+
+
+def test_worker_setup_fn_idempotent():
+    """Test that calling worker_setup_fn twice doesn't raise."""
+    multiprocessing.set_start_method("spawn", force=True)
+    worker_setup_fn()  # should not raise
+    assert multiprocessing.get_start_method() == "spawn"


These tests modify the global multiprocessing start method but don't restore its original state. This can lead to side effects and flaky tests, because the state change persists for subsequent tests in the same process. To improve test isolation, it's best practice to save the original state before the test and restore it in a `finally` block. Skipping the test when `fork` is unavailable (e.g. on Windows) also makes it more portable across operating systems.

import multiprocessing

import pytest

from skyrl.utils.worker_setup import worker_setup_fn


def test_worker_setup_fn_sets_spawn():
    """Test that worker_setup_fn sets the mp start method to spawn."""
    if "fork" not in multiprocessing.get_all_start_methods():
        pytest.skip("fork start method not supported")
    original_method = multiprocessing.get_start_method(allow_none=True)
    try:
        # Reset to a different state first
        multiprocessing.set_start_method("fork", force=True)
        worker_setup_fn()
        assert multiprocessing.get_start_method() == "spawn"
    finally:
        if original_method:
            multiprocessing.set_start_method(original_method, force=True)


def test_worker_setup_fn_idempotent():
    """Test that calling worker_setup_fn twice doesn't raise."""
    original_method = multiprocessing.get_start_method(allow_none=True)
    try:
        multiprocessing.set_start_method("spawn", force=True)
        worker_setup_fn()  # should not raise
        assert multiprocessing.get_start_method() == "spawn"
    finally:
        if original_method:
            multiprocessing.set_start_method(original_method, force=True)
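A possible refinement of the reviewer's suggestion, sketched as a reusable context manager (`preserved_start_method` is a hypothetical name, not something from the PR or the review). It relies on CPython's `multiprocessing.set_start_method(None, force=True)` resetting the process to the "not chosen yet" state, so it also undoes the change when no start method had been set before the test, a case the `if original_method:` guard above leaves untouched.

```python
import contextlib
import multiprocessing


@contextlib.contextmanager
def preserved_start_method():
    """Restore the global mp start method on exit, even if the body raises."""
    original = multiprocessing.get_start_method(allow_none=True)
    try:
        yield original
    finally:
        # With force=True, passing None clears the choice in CPython, so an
        # originally unset start method is restored as unset.
        multiprocessing.set_start_method(original, force=True)


# Usage inside a test:
#   with preserved_start_method():
#       multiprocessing.set_start_method("fork", force=True)
#       worker_setup_fn()
#       assert multiprocessing.get_start_method() == "spawn"
```

In pytest, the same body could be wrapped in a fixture that yields inside the `with` block, so each test gets the restore without repeating `try`/`finally`.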

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.

…rt method to `spawn`" (#1344) Reverts #1333. Fixes #1342 and #1343. It looks like we hit the same issue as ray-project/ray#61350 when using the worker process setup hook together with vLLM on the Ray backend. The long-term fix is in the Ray repo: the bug has been fixed in ray-project/ray#61473, and we should be able to use the setup hook after upgrading to the next Ray release. Until then, I've reverted the changes and set `spawn` as the mp context for our dataloader. I did a quick smoke test by running the gsm8k example, and the script enters the first step successfully.

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
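The interim fix scopes `spawn` to the dataloader instead of the whole process. A minimal stdlib sketch of that idea: `multiprocessing.get_context("spawn")` returns a context object whose `Process`, `Pool`, and queues use spawn, without touching the global start method. PyTorch's `torch.utils.data.DataLoader` accepts such a context (or the string `"spawn"`) through its `multiprocessing_context` argument. The exact change made in #1344 is not shown here, and `make_spawn_context` is a hypothetical name.

```python
import multiprocessing


def make_spawn_context():
    """Return a spawn-only context without changing the global default (sketch)."""
    return multiprocessing.get_context("spawn")


ctx = make_spawn_context()
# Objects created from ctx (ctx.Process, ctx.Pool, ctx.Queue, ...) use spawn.
print(ctx.get_start_method())  # spawn
# The process-wide setting is untouched (None if nothing has chosen one yet).
print(multiprocessing.get_start_method(allow_none=True))

# With PyTorch, the same context could be handed to a DataLoader, e.g.:
#   DataLoader(dataset, num_workers=2, multiprocessing_context=ctx)
```

Unlike a process-wide hook, a per-object context only affects the code that opts into it, which avoids interfering with libraries (such as vLLM's Ray backend) that manage their own worker processes.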


SumanthRH and others added 3 commits March 17, 2026 20:13

SumanthRH marked this pull request as ready for review March 17, 2026 20:51

gemini-code-assist Bot reviewed Mar 17, 2026

devin-ai-integration Bot reviewed Mar 17, 2026

erictang000 approved these changes Mar 17, 2026

SumanthRH merged commit dffae95 into main Mar 17, 2026
6 checks passed

SumanthRH deleted the spawn branch March 18, 2026 00:06

erictang000 mentioned this pull request Mar 18, 2026

[train] vllm init with ray backend fails due to worker_process_setup_hook being set #1343

Closed

SumanthRH mentioned this pull request Mar 18, 2026

Revert "[train] Add worker_process_setup_hook to set mp start method to spawn" #1344

Merged

SumanthRH merged 3 commits into main from spawn
