test(ci): cut CPU test tail — drop dataset_num_proc to 1, split builder tests#3705
…er tests
The Python 3.12 PyTest legs run ~2x slower than 3.14 on the same test set (816s vs 403s) and were tipping over the 30-minute job timeout. Two causes, both in the slow tail:
- dataset_num_proc=4 forks 4 dataset workers per .map() on CPU-only runners, each re-importing the torch stack to process a few hundred rows, which is pure overhead. Lower to 1 in the affected tests (none assert on it or test multiprocessing); results are unchanged.
- --dist loadfile pins a whole file to one worker, so the entire builder suite serialized on a single worker at the end. Move shared fixtures to tests/core/conftest.py and split the RL trainer-builder tests into test_builders_rl.py so they run on a separate worker from the SFT/reward builder tests.
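To illustrate the second cause, here is a toy model of file-pinned scheduling. It is only a sketch: the function `loadfile_makespan`, the file names, and the durations are all hypothetical, and the greedy "next file goes to the worker that frees up first" rule is an assumption standing in for the real scheduler, not a description of pytest-xdist internals.

```python
import heapq
from collections import defaultdict


def loadfile_makespan(test_durations, num_workers):
    """Wall-clock estimate when every test in a file runs on the same worker.

    test_durations: list of (file_name, seconds) pairs (hypothetical data).
    """
    # Sum the durations per file, because a file cannot be split across workers.
    per_file = defaultdict(float)
    for file_name, seconds in test_durations:
        per_file[file_name] += seconds

    # Assumed policy: hand each file, in first-seen order, to the worker
    # that becomes free earliest.
    workers = [0.0] * num_workers
    heapq.heapify(workers)
    for total in per_file.values():
        earliest = heapq.heappop(workers)
        heapq.heappush(workers, earliest + total)
    return max(workers)


# Hypothetical timings: one large builder file versus the same tests split in two.
before = [
    ("test_a.py", 60.0),
    ("test_b.py", 60.0),
    ("test_builders.py", 100.0),  # SFT/reward builder tests
    ("test_builders.py", 100.0),  # RL builder tests in the same file
]
after = [
    ("test_a.py", 60.0),
    ("test_b.py", 60.0),
    ("test_builders.py", 100.0),
    ("test_builders_rl.py", 100.0),  # RL tests moved to their own file
]

print(loadfile_makespan(before, num_workers=3))
print(loadfile_makespan(after, num_workers=3))
```

In this model the single 200-second file sets the finish time before the split; after the split the two halves can land on different workers, so the tail shrinks even though the total work is the same.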
📝 Walkthrough
This PR introduces a shared pytest fixture module for trainer builder tests, adds a comprehensive RL trainer builder test suite covering six RL algorithms with algorithm-specific training argument assertions and optimizer validation, refactors existing test imports to reduce duplication, and standardizes dataset processing parallelism across multiple dataset loading tests.
Changes
Test Infrastructure and Trainer Builder Tests
🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
🧹 Nitpick comments (1)
tests/core/conftest.py (1)
81-180: 💤 Low value
Consider standardizing the return type across all RL config fixtures.
The `grpo_cfg` fixture wraps its return value in `DictDefault(cfg)` at line 152, while all other RL config fixtures (`dpo_cfg`, `orpo_cfg`, `kto_cfg`, `ipo_cfg`, `simpo_cfg`) return the plain dict from `base_cfg.copy()` directly. This inconsistency might be intentional due to GRPO's nested `DictDefault` at line 130, but it could confuse future maintainers.
Consider either:
- Wrapping all RL config fixtures in `DictDefault` for consistency, or
- Adding a comment explaining why GRPO requires the extra wrapping
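A minimal sketch of the first option, assuming a wrapper that behaves roughly like `DictDefault`. `AttrDict`, `make_cfg`, and the config keys below are hypothetical stand-ins for illustration, not the project's actual class or fixtures.

```python
class AttrDict(dict):
    """Hypothetical stand-in for a DictDefault-style wrapper (not the real class)."""

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("__"):
            raise AttributeError(name)
        # Assumed behavior: missing keys read as None instead of raising.
        return self.get(name)


def make_cfg(base_cfg, **overrides):
    """Copy the shared base config, apply overrides, and always return the same wrapper type."""
    cfg = base_cfg.copy()
    cfg.update(overrides)
    return AttrDict(cfg)


# Hypothetical base settings shared by all fixtures.
base_cfg = {"learning_rate": 1e-4, "micro_batch_size": 2}

dpo_cfg = make_cfg(base_cfg, rl="dpo")
grpo_cfg = make_cfg(base_cfg, rl="grpo")
```

Routing every fixture through one helper means a test can use `cfg.rl` or `cfg["rl"]` regardless of which algorithm's fixture it received.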
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/core/conftest.py` around lines 81 - 180, The RL fixtures return inconsistent types: grpo_cfg currently returns DictDefault(cfg) while dpo_cfg, orpo_cfg, kto_cfg, ipo_cfg, and simpo_cfg return plain dicts; standardize by returning DictDefault(cfg) from all RL fixtures (update fixture_dpo_cfg, fixture_orpo_cfg, fixture_kto_cfg, fixture_ipo_cfg, fixture_simpo_cfg to wrap their cfg in DictDefault before returning) and ensure DictDefault is imported; alternatively (if intentional) add a short comment in grpo_cfg explaining why it must return DictDefault to avoid confusion.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/core/conftest.py`:
- Around line 81-180: The RL fixtures return inconsistent types: grpo_cfg
currently returns DictDefault(cfg) while dpo_cfg, orpo_cfg, kto_cfg, ipo_cfg,
and simpo_cfg return plain dicts; standardize by returning DictDefault(cfg) from
all RL fixtures (update fixture_dpo_cfg, fixture_orpo_cfg, fixture_kto_cfg,
fixture_ipo_cfg, fixture_simpo_cfg to wrap their cfg in DictDefault before
returning) and ensure DictDefault is imported; alternatively (if intentional)
add a short comment in grpo_cfg explaining why it must return DictDefault to
avoid confusion.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ab80136b-d76c-4528-8062-fc0daede0bdd
📒 Files selected for processing (6)
- tests/core/conftest.py
- tests/core/test_builders.py
- tests/core/test_builders_rl.py
- tests/test_datasets.py
- tests/test_exact_deduplication.py
- tests/test_packed_dataset.py
Codecov Report
✅ All modified and coverable lines are covered by tests.