[CI] Make MultiItemDataset a global variable after switch to spawn #1346
Code Review
This pull request correctly fixes a CI failure that occurred after switching to the 'spawn' multiprocessing context. By moving the MultiItemDataset class to the module's global scope, it becomes accessible to worker processes, resolving the issue. My review includes suggestions to rename the class to _MultiItemDataset to align with Python conventions for internal helpers, which will improve code clarity and maintainability.
def test_build_dataloader_seeding(dummy_config):
    """Test that build_dataloader correctly seeds the dataloader for reproducible shuffling."""
    # Create a dataset with multiple distinct items to test shuffling
class MultiItemDataset:
To improve clarity and indicate that this class is intended for internal use within this test module, consider renaming it to _MultiItemDataset. This follows the Python convention (PEP 8) for internal-use names and prevents it from being accidentally used elsewhere.
- class MultiItemDataset:
+ class _MultiItemDataset:
References
- PEP 8 suggests using a single leading underscore for internal-use functions, methods, and attributes to signal they are not part of the public API of the module.
def test_build_dataloader_seeding(dummy_config):
    """Test that build_dataloader correctly seeds the dataloader for reproducible shuffling."""

    dataset = MultiItemDataset(size=20)
#1346)

# What does this PR do?

Fixes CI failure on main for the `SkyRL-Train-CPU` workflow: https://github.com/NovaSky-AI/SkyRL/actions/runs/23273262330/job/67670625938

After #1344, we added `multiprocessing_context='spawn'` to the `build_dataloader` function. It looks like there was one case where the change here affected a test that was not affected by the usage of `worker_process_startup_hook` previously. A CPU test `test_dataloader_seeding` referenced a local dataset class in the dataloader map function. After the switch to `spawn`, we need to ensure that the dataset class is a global variable.

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
What does this PR do?

Fixes CI failure on main for the `SkyRL-Train-CPU` workflow: https://github.com/NovaSky-AI/SkyRL/actions/runs/23273262330/job/67670625938

After #1344, we added `multiprocessing_context='spawn'` to the `build_dataloader` function. It looks like there was one case where the change here affected a test that was not affected by the usage of `worker_process_startup_hook` previously. A CPU test `test_dataloader_seeding` referenced a local dataset class in the dataloader map function. After the switch to `spawn`, we need to ensure that the dataset class is a global variable.