Skip to content

[CI] Make MultiItemDataset a global variable after switch to spawn#1346

Merged
SumanthRH merged 1 commit intomainfrom
fix-spawn-cpu-test
Mar 19, 2026
Merged

[CI] Make MultiItemDataset a global variable after switch to spawn#1346
SumanthRH merged 1 commit intomainfrom
fix-spawn-cpu-test

Conversation

@SumanthRH
Copy link
Copy Markdown
Member

@SumanthRH SumanthRH commented Mar 19, 2026

What does this PR do?

Fixes CI failure on main for the SkyRL-Train-CPU workflow: https://github.com/NovaSky-AI/SkyRL/actions/runs/23273262330/job/67670625938

After #1344 , we added multiprocessing_context='spawn' to the build_dataloader function. It looks like there was one case where the change here affected a test that was not affected by the usage of worker_process_startup_hook previously. A CPU test test_dataloader_seeding referenced a local dataset class in dataloader map function. After switch to spawn, we need to ensure that the dataset class is a global variable.


Open with Devin

x
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
@SumanthRH SumanthRH marked this pull request as ready for review March 19, 2026 06:09
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly fixes a CI failure that occurred after switching to the 'spawn' multiprocessing context. By moving the MultiItemDataset class to the module's global scope, it becomes accessible to worker processes, resolving the issue. My review includes suggestions to rename the class to _MultiItemDataset to align with Python conventions for internal helpers, which will improve code clarity and maintainability.

def test_build_dataloader_seeding(dummy_config):
"""Test that build_dataloader correctly seeds the dataloader for reproducible shuffling."""
# Create a dataset with multiple distinct items to test shuffling
class MultiItemDataset:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To improve clarity and indicate that this class is intended for internal use within this test module, consider renaming it to _MultiItemDataset. This follows the Python convention (PEP 8) for internal-use names and prevents it from being accidentally used elsewhere.

Suggested change
class MultiItemDataset:
class _MultiItemDataset:
References
  1. PEP 8 suggests using a single leading underscore for internal-use functions, methods, and attributes to signal they are not part of the public API of the module.

def test_build_dataloader_seeding(dummy_config):
"""Test that build_dataloader correctly seeds the dataloader for reproducible shuffling."""

dataset = MultiItemDataset(size=20)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Please update the instantiation to use the new class name _MultiItemDataset as suggested for the class definition.

Suggested change
dataset = MultiItemDataset(size=20)
dataset = _MultiItemDataset(size=20)

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.

Open in Devin Review

@SumanthRH SumanthRH merged commit a80d886 into main Mar 19, 2026
6 checks passed
SumanthRH added a commit that referenced this pull request Mar 20, 2026
#1346)

# What does this PR do?

Fixes CI failure on main for the `SkyRL-Train-CPU` workflow:
https://github.com/NovaSky-AI/SkyRL/actions/runs/23273262330/job/67670625938

After #1344 , we added `multiprocessing_context='spawn'` to the
`build_dataloader` function. It looks like there was one case where the
change here affected a test that was not affected by the usage of
`worker_process_startup_hook` previously. A CPU test
`test_dataloader_seeding` referenced a local dataset class in dataloader
map function. After switch to `spawn`, we need to ensure that the
dataset class is a global variable.
<!-- devin-review-badge-begin -->

---

<a href="https://app.devin.ai/review/novasky-ai/skyrl/pull/1346"
target="_blank">
  <picture>
<source media="(prefers-color-scheme: dark)"
srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1">
<img
src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1"
alt="Open with Devin">
  </picture>
</a>
<!-- devin-review-badge-end -->

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
devpatelio pushed a commit that referenced this pull request Mar 20, 2026
#1346)

# What does this PR do?

Fixes CI failure on main for the `SkyRL-Train-CPU` workflow:
https://github.com/NovaSky-AI/SkyRL/actions/runs/23273262330/job/67670625938

After #1344 , we added `multiprocessing_context='spawn'` to the
`build_dataloader` function. It looks like there was one case where the
change here affected a test that was not affected by the usage of
`worker_process_startup_hook` previously. A CPU test
`test_dataloader_seeding` referenced a local dataset class in dataloader
map function. After switch to `spawn`, we need to ensure that the
dataset class is a global variable.
<!-- devin-review-badge-begin -->

---

<a href="https://app.devin.ai/review/novasky-ai/skyrl/pull/1346"
target="_blank">
  <picture>
<source media="(prefers-color-scheme: dark)"
srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1">
<img
src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1"
alt="Open with Devin">
  </picture>
</a>
<!-- devin-review-badge-end -->

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant