Skip to content

test(ci): cut CPU test tail — drop dataset_num_proc to 1, split builder tests#3705

Merged
winglian merged 1 commit into
mainfrom
fix/ci-tail-dataset-num-proc
Jun 4, 2026
Merged

test(ci): cut CPU test tail — drop dataset_num_proc to 1, split builder tests#3705
winglian merged 1 commit into
mainfrom
fix/ci-tail-dataset-num-proc

Conversation

@winglian

@winglian winglian commented Jun 4, 2026

Copy link
Copy Markdown
Collaborator

The Python 3.12 PyTest legs run ~2x slower than 3.14 on the same test set (816s vs 403s) and were tipping over the 30-minute job timeout. Two causes, both in the slow tail:

  • dataset_num_proc=4 forks 4 dataset workers per .map() on CPU-only runners, each re-importing the torch stack to process a few hundred rows — pure overhead. Lower to 1 in the affected tests (none assert on it or test multiprocessing); results are unchanged.
  • --dist loadfile pins a whole file to one worker, so the entire builder suite serialized on a single worker at the end. Move shared fixtures to tests/core/conftest.py and split the RL trainer-builder tests into test_builders_rl.py so they run on a separate worker from the SFT/reward builder tests.

Summary by CodeRabbit

  • Tests
    • Added comprehensive test suite for RL trainer builders covering DPO, ORPO, KTO, GRPO, IPO, and SIMPO configurations
    • Expanded test fixtures for trainer configuration validation and optimization
    • Updated test configurations for improved performance

…er tests

The Python 3.12 PyTest legs run ~2x slower than 3.14 on the same test set
(816s vs 403s) and were tipping over the 30-minute job timeout. Two causes,
both in the slow tail:

- dataset_num_proc=4 forks 4 dataset workers per .map() on CPU-only runners,
  each re-importing the torch stack to process a few hundred rows — pure
  overhead. Lower to 1 in the affected tests (none assert on it or test
  multiprocessing); results are unchanged.
- --dist loadfile pins a whole file to one worker, so the entire builder
  suite serialized on a single worker at the end. Move shared fixtures to
  tests/core/conftest.py and split the RL trainer-builder tests into
  test_builders_rl.py so they run on a separate worker from the SFT/reward
  builder tests.
@coderabbitai

coderabbitai Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

This PR introduces a shared pytest fixture module for trainer builder tests, adds a comprehensive RL trainer builder test suite covering six RL algorithms with algorithm-specific training argument assertions and optimizer validation, refactors existing test imports to reduce duplication, and standardizes dataset processing parallelism across multiple dataset loading tests.

Changes

Test Infrastructure and Trainer Builder Tests

Layer / File(s) Summary
Base Test Fixtures
tests/core/conftest.py
Shared pytest fixtures provide normalized base configuration with model/tokenizer defaults and common hyperparameters, plus RL/SFT variant configs (dpo_cfg, orpo_cfg, kto_cfg, grpo_cfg, ipo_cfg, simpo_cfg, sft_cfg, rm_cfg, prm_cfg) and tokenizer/model loading fixtures for trainer builder tests.
RL Trainer Builder Tests
tests/core/test_builders_rl.py
TestHFRLTrainerBuilder verifies _build_training_arguments() output across DPO, ORPO, KTO, GRPO, IPO, and SIMPO, with per-algorithm assertions for beta, loss weighting, and label smoothing. GRPO test writes a temporary reward function module and verifies trainer reward loading. Parametrized optimizer test sets optimizer="muon" across configs, patches dataset loading conditionally, and validates MuonOptimizerFactory kwargs and instance type.
Existing Test Cleanup and Standardization
tests/core/test_builders.py, tests/test_datasets.py, tests/test_exact_deduplication.py, tests/test_packed_dataset.py
test_builders.py imports reduced to only required test dependencies (removed duplicated utilities now in conftest). Dataset tests standardized: dataset_num_proc changed from 4 to 1 across eight scenarios in test_datasets.py (disk save/load, directory and single parquet/json, DPO revision, local folder, JSON string data), plus one change each in deduplication and packing tests.

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels: ready to merge

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 35.48% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: reducing dataset_num_proc to 1 and splitting builder tests. It directly reflects the key optimizations in the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/ci-tail-dataset-num-proc

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/core/conftest.py (1)

81-180: 💤 Low value

Consider standardizing the return type across all RL config fixtures.

The grpo_cfg fixture wraps its return value in DictDefault(cfg) at line 152, while all other RL config fixtures (dpo_cfg, orpo_cfg, kto_cfg, ipo_cfg, simpo_cfg) return the plain dict from base_cfg.copy() directly. This inconsistency might be intentional due to GRPO's nested DictDefault at line 130, but it could confuse future maintainers.

Consider either:

  • Wrapping all RL config fixtures in DictDefault for consistency, or
  • Adding a comment explaining why GRPO requires the extra wrapping
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/core/conftest.py` around lines 81 - 180, The RL fixtures return
inconsistent types: grpo_cfg currently returns DictDefault(cfg) while dpo_cfg,
orpo_cfg, kto_cfg, ipo_cfg, and simpo_cfg return plain dicts; standardize by
returning DictDefault(cfg) from all RL fixtures (update fixture_dpo_cfg,
fixture_orpo_cfg, fixture_kto_cfg, fixture_ipo_cfg, fixture_simpo_cfg to wrap
their cfg in DictDefault before returning) and ensure DictDefault is imported;
alternatively (if intentional) add a short comment in grpo_cfg explaining why it
must return DictDefault to avoid confusion.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/core/conftest.py`:
- Around line 81-180: The RL fixtures return inconsistent types: grpo_cfg
currently returns DictDefault(cfg) while dpo_cfg, orpo_cfg, kto_cfg, ipo_cfg,
and simpo_cfg return plain dicts; standardize by returning DictDefault(cfg) from
all RL fixtures (update fixture_dpo_cfg, fixture_orpo_cfg, fixture_kto_cfg,
fixture_ipo_cfg, fixture_simpo_cfg to wrap their cfg in DictDefault before
returning) and ensure DictDefault is imported; alternatively (if intentional)
add a short comment in grpo_cfg explaining why it must return DictDefault to
avoid confusion.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ab80136b-d76c-4528-8062-fc0daede0bdd

📥 Commits

Reviewing files that changed from the base of the PR and between 41ef48f and 9a50984.

📒 Files selected for processing (6)
  • tests/core/conftest.py
  • tests/core/test_builders.py
  • tests/core/test_builders_rl.py
  • tests/test_datasets.py
  • tests/test_exact_deduplication.py
  • tests/test_packed_dataset.py

@codecov

codecov Bot commented Jun 4, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@winglian winglian merged commit e13bf16 into main Jun 4, 2026
21 of 23 checks passed
@winglian winglian deleted the fix/ci-tail-dataset-num-proc branch June 4, 2026 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant