test(ci): cut CPU test tail — drop dataset_num_proc to 1, split builder tests by winglian · Pull Request #3705 · axolotl-ai-cloud/axolotl

winglian · 2026-06-04T12:49:38Z

The Python 3.12 PyTest legs run ~2x slower than 3.14 on the same test set (816s vs 403s) and were tipping over the 30-minute job timeout. Two causes, both in the slow tail:

dataset_num_proc=4 forks 4 dataset workers per .map() on CPU-only runners, each re-importing the torch stack to process a few hundred rows — pure overhead. Lower to 1 in the affected tests (none assert on it or test multiprocessing); results are unchanged.
--dist loadfile pins a whole file to one worker, so the entire builder suite serialized on a single worker at the end. Move shared fixtures to tests/core/conftest.py and split the RL trainer-builder tests into test_builders_rl.py so they run on a separate worker from the SFT/reward builder tests.

Summary by CodeRabbit

Tests
- Added comprehensive test suite for RL trainer builders covering DPO, ORPO, KTO, GRPO, IPO, and SIMPO configurations
- Expanded test fixtures for trainer configuration validation and optimization
- Updated test configurations for improved performance

…er tests The Python 3.12 PyTest legs run ~2x slower than 3.14 on the same test set (816s vs 403s) and were tipping over the 30-minute job timeout. Two causes, both in the slow tail: - dataset_num_proc=4 forks 4 dataset workers per .map() on CPU-only runners, each re-importing the torch stack to process a few hundred rows — pure overhead. Lower to 1 in the affected tests (none assert on it or test multiprocessing); results are unchanged. - --dist loadfile pins a whole file to one worker, so the entire builder suite serialized on a single worker at the end. Move shared fixtures to tests/core/conftest.py and split the RL trainer-builder tests into test_builders_rl.py so they run on a separate worker from the SFT/reward builder tests.

coderabbitai · 2026-06-04T12:49:51Z

📝 Walkthrough

Walkthrough

This PR introduces a shared pytest fixture module for trainer builder tests, adds a comprehensive RL trainer builder test suite covering six RL algorithms with algorithm-specific training argument assertions and optimizer validation, refactors existing test imports to reduce duplication, and standardizes dataset processing parallelism across multiple dataset loading tests.

Changes

Test Infrastructure and Trainer Builder Tests

Layer / File(s)	Summary
Base Test Fixtures `tests/core/conftest.py`	Shared pytest fixtures provide normalized base configuration with model/tokenizer defaults and common hyperparameters, plus RL/SFT variant configs (`dpo_cfg`, `orpo_cfg`, `kto_cfg`, `grpo_cfg`, `ipo_cfg`, `simpo_cfg`, `sft_cfg`, `rm_cfg`, `prm_cfg`) and `tokenizer`/`model` loading fixtures for trainer builder tests.
RL Trainer Builder Tests `tests/core/test_builders_rl.py`	`TestHFRLTrainerBuilder` verifies `_build_training_arguments()` output across DPO, ORPO, KTO, GRPO, IPO, and SIMPO, with per-algorithm assertions for beta, loss weighting, and label smoothing. GRPO test writes a temporary reward function module and verifies trainer reward loading. Parametrized optimizer test sets `optimizer="muon"` across configs, patches dataset loading conditionally, and validates `MuonOptimizerFactory` kwargs and instance type.
Existing Test Cleanup and Standardization `tests/core/test_builders.py`, `tests/test_datasets.py`, `tests/test_exact_deduplication.py`, `tests/test_packed_dataset.py`	`test_builders.py` imports reduced to only required test dependencies (removed duplicated utilities now in conftest). Dataset tests standardized: `dataset_num_proc` changed from `4` to `1` across eight scenarios in `test_datasets.py` (disk save/load, directory and single parquet/json, DPO revision, local folder, JSON string data), plus one change each in deduplication and packing tests.

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels: ready to merge

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 35.48% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main changes: reducing dataset_num_proc to 1 and splitting builder tests. It directly reflects the key optimizations in the changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/ci-tail-dataset-num-proc

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

tests/core/conftest.py (1)
81-180: 💤 Low value

Consider standardizing the return type across all RL config fixtures.

The grpo_cfg fixture wraps its return value in DictDefault(cfg) at line 152, while all other RL config fixtures (dpo_cfg, orpo_cfg, kto_cfg, ipo_cfg, simpo_cfg) return the plain dict from base_cfg.copy() directly. This inconsistency might be intentional due to GRPO's nested DictDefault at line 130, but it could confuse future maintainers.

Consider either:

Wrapping all RL config fixtures in DictDefault for consistency, or

Adding a comment explaining why GRPO requires the extra wrapping
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/core/conftest.py` around lines 81 - 180, The RL fixtures return
inconsistent types: grpo_cfg currently returns DictDefault(cfg) while dpo_cfg,
orpo_cfg, kto_cfg, ipo_cfg, and simpo_cfg return plain dicts; standardize by
returning DictDefault(cfg) from all RL fixtures (update fixture_dpo_cfg,
fixture_orpo_cfg, fixture_kto_cfg, fixture_ipo_cfg, fixture_simpo_cfg to wrap
their cfg in DictDefault before returning) and ensure DictDefault is imported;
alternatively (if intentional) add a short comment in grpo_cfg explaining why it
must return DictDefault to avoid confusion.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/core/conftest.py`:
- Around line 81-180: The RL fixtures return inconsistent types: grpo_cfg
currently returns DictDefault(cfg) while dpo_cfg, orpo_cfg, kto_cfg, ipo_cfg,
and simpo_cfg return plain dicts; standardize by returning DictDefault(cfg) from
all RL fixtures (update fixture_dpo_cfg, fixture_orpo_cfg, fixture_kto_cfg,
fixture_ipo_cfg, fixture_simpo_cfg to wrap their cfg in DictDefault before
returning) and ensure DictDefault is imported; alternatively (if intentional)
add a short comment in grpo_cfg explaining why it must return DictDefault to
avoid confusion.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ab80136b-d76c-4528-8062-fc0daede0bdd

📥 Commits

Reviewing files that changed from the base of the PR and between 41ef48f and 9a50984.

📒 Files selected for processing (6)

tests/core/conftest.py
tests/core/test_builders.py
tests/core/test_builders_rl.py
tests/test_datasets.py
tests/test_exact_deduplication.py
tests/test_packed_dataset.py

codecov · 2026-06-04T13:00:41Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

coderabbitai Bot reviewed Jun 4, 2026

View reviewed changes

winglian merged commit e13bf16 into main Jun 4, 2026
21 of 23 checks passed

winglian deleted the fix/ci-tail-dataset-num-proc branch June 4, 2026 17:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

test(ci): cut CPU test tail — drop dataset_num_proc to 1, split builder tests#3705

test(ci): cut CPU test tail — drop dataset_num_proc to 1, split builder tests#3705
winglian merged 1 commit into
mainfrom
fix/ci-tail-dataset-num-proc

winglian commented Jun 4, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 4, 2026 •

edited

Loading

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

codecov Bot commented Jun 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

winglian commented Jun 4, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented Jun 4, 2026

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

winglian commented Jun 4, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 4, 2026 •

edited

Loading