fix: DPO tool role KeyError (#3217), dataset hash output_dir (#3303), config validators by Edward-Zion-Saji · Pull Request #3538 · axolotl-ai-cloud/axolotl

Edward-Zion-Saji · 2026-03-23T05:49:02Z

Summary

Three independent bug fixes with tests. Each is self-contained and can be reviewed/merged independently.

Fix 1: DPO `tool` role raises `KeyError` (#3217)

File: src/axolotl/prompt_strategies/dpo/chat_template.py

The default role_map_inv in both default() and argilla_chat() was missing the "tool" role. Any DPO dataset whose messages list contained a {"role": "tool", ...} turn (tool-calling preference data) crashed with KeyError: 'tool'.

Fix: Add "tool": ["tool"] to the default mapping in both functions. Fully backwards-compatible — user-supplied roles: config still overrides the default.

Fix 2: Dataset cache invalidated when `output_dir` changes with `added_tokens_overrides` (#3303)

File: src/axolotl/utils/data/shared.py

When added_tokens_overrides is set, Axolotl saves the modified tokenizer into output_dir, making tokenizer.name_or_path an absolute path like /tmp/my_run/modified_tokenizer. This path landed in the dataset fingerprint string, so renaming the output directory busted the cache and forced a full re-tokenization even though nothing had actually changed.

Fix: When added_tokens_overrides is present, derive the tokenizer fingerprint from cfg.tokenizer_config (the canonical HF model name) plus the sorted overrides dict — content that is stable across output directory renames. Without added_tokens_overrides the old behaviour is preserved unchanged.

Fix 3: Three new Pydantic config validators

File: src/axolotl/utils/schemas/config.py

Added three @model_validator(mode="before") methods to AxolotlConfigWCapabilities that catch common misconfigurations at validation time rather than deep inside training:

Validator	What it catches
`check_save_strategy_best_requires_metric`	`save_strategy: best` without `metric_for_best_model` — crashes inside HF Trainer otherwise
`check_streaming_with_val_set_size`	`streaming: true` + `val_set_size > 0` — documented as unsupported but previously not enforced
`check_lora_target_modules_regex`	Invalid Python regex patterns in `lora_target_modules` list — only discovered during tokenisation without this

Tests

File	What is tested
`tests/prompt_strategies/test_dpo_chat_templates.py`	`TestDPOChatTemplateToolRole` — no KeyError on tool role; custom role mapping still respected
`tests/utils/data/test_hash.py`	Hash stability, regression for #3303, different overrides produce different hashes
`tests/utils/schemas/validation/test_config_validators.py`	Pass/fail paths for all three validators

Summary by CodeRabbit

New Features
- Added configuration validation to prevent invalid parameter combinations (e.g., requiring metrics for best-model checkpoints, preventing conflicting streaming settings).
- Added support for tool messages in DPO chat template handling.
Bug Fixes
- Fixed dataset hash computation to remain stable when using token overrides, regardless of output directory changes.
Tests
- Added comprehensive validation tests for configuration parameters and DPO chat template functionality.

coderabbitai · 2026-03-23T05:49:20Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9d9a2981-74de-4f1f-bdc3-85f9849c1908

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

Added tool role support to DPO chat template strategies, enhanced dataset hash computation to account for added token overrides, implemented three new configuration validators for strategy/streaming/regex patterns, and added comprehensive test coverage for these features.

Changes

Cohort / File(s)	Summary
DPO Chat Template Tool Role Support `src/axolotl/prompt_strategies/dpo/chat_template.py`	Extended default roles mapping with `"tool": ["tool"]` entry to ensure tool message roles are properly mapped during prompt construction for both default and argilla_chat strategies.
Dataset Hash Computation `src/axolotl/utils/data/shared.py`	Modified `generate_dataset_hash_from_config` to compute fingerprint using `added_tokens_overrides` when present, ensuring hash stability across `output_dir` changes while detecting override differences.
Configuration Validation `src/axolotl/utils/schemas/config.py`	Added three `model_validator` checks: (1) `save_strategy="best"` requires `metric_for_best_model`, (2) `streaming=True` requires `val_set_size=0`, (3) `lora_target_modules` as list must contain valid regex patterns.
DPO Chat Template Tests `tests/prompt_strategies/test_dpo_chat_templates.py`	Added `TestDPOChatTemplateToolRole` class covering tool role handling with default configuration and custom roles mapping validation.
Dataset Hash Computation Tests `tests/utils/data/test_hash.py`	Added `TestGenerateDatasetHashFromConfig` suite validating hash determinism, tokenizer/config changes, and `added_tokens_overrides` regression scenarios.
Configuration Validator Tests `tests/utils/schemas/validation/test_config_validators.py`	Added three validator test suites for `save_strategy`, `streaming`, and `lora_target_modules` regex validation with both positive and negative test cases.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

FSDP2 fix validation and add tests #2910: Modifies Pydantic validators on AxolotlInputConfig in config.py, directly related to new validator implementations.
Add chat_template.argilla_chat support for DPO datasets #3202: Expands argilla_chat support in DPO chat_template implementations, touching the same functions modified for tool role support.
remove unused field for chat_template.default for DPO training #2755: Changes DPO chat template default function behavior, related to the core message transformation logic being extended.

Suggested reviewers

winglian
NanoCode012
SalmanMohammadi

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 34.38% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes three independent bug fixes: DPO tool role KeyError, dataset hash output_dir issue, and config validators. Each aspect relates directly to changes in the changeset.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

NanoCode012

Hey, thanks for this PR. The second one is a good catch, I couldn't figure it out why at the time.

For first issue, I'm a bit hesitant. Can you show me existing several HF DPO datasets using this role? Is it not role: "tool_calling" or tool_call? We don't want to create a new standard if possible

NanoCode012 · 2026-03-24T10:36:52Z

+    # When added_tokens_overrides is set the tokenizer is saved into output_dir,
+    # so tokenizer.name_or_path becomes an absolute path that includes output_dir.
+    # Changing output_dir would bust the dataset cache even though the tokenizer
+    # is identical. Use the canonical tokenizer config path plus the overrides
+    # content instead so the hash is stable across output_dir changes.


Let's simplify these comment to one liner or two if possible.

NanoCode012 · 2026-03-24T10:39:01Z

+        """A different tokenizer path produces a different hash."""
+        cfg = _base_cfg()
+        h1 = generate_dataset_hash_from_config(cfg, _datasets(), "NousResearch/Llama-3.2-1B")
+        h2 = generate_dataset_hash_from_config(cfg, _datasets(), "mistralai/Mistral-7B-v0.1")


Maybe I missed it, but we don't pre-download this mistral model. Let's use another that we already re-use in another test to prevent downloading more model data in CI

Edward-Zion-Saji · 2026-03-24T18:25:46Z

Thanks for the review, @NanoCode012! Addressing all three points:

Fix 1 — "tool" role justification

"tool" is the HuggingFace-standardised role name for tool-response turns, adopted by transformers chat templates and the TRL trainers. Here are real HF datasets using it:

trl-internal-testing/toolcall (official TRL test dataset, includes a preference DPO subset) — messages contain {"role": "tool", ...} turns directly: https://huggingface.co/datasets/trl-internal-testing/toolcall
Nanbeige/ToolMind (369 k rows, conversations format) — each multi-turn trace uses {"role": "tool", ...} for tool responses: https://huggingface.co/datasets/Nanbeige/ToolMind
TRL DPO Trainer docs explicitly document "tool" role messages in DPO preference data: https://huggingface.co/docs/trl/dpo_trainer

So "tool" is the established standard, not a new one — it mirrors the OpenAI chat format that HF adopted.

Fix 2 — condensed comment in shared.py ✅ (see latest commit)

Fix 3 — replaced mistralai/Mistral-7B-v0.1 with HuggingFaceTB/SmolLM2-135M ✅ (see latest commit)

NanoCode012 · 2026-03-25T02:45:55Z

Thanks for clarifying the first point, letting CI run

NanoCode012 · 2026-03-26T09:19:46Z

@Edward-Zion-Saji CI failing from your new tests I think

…rs [skip-e2e] - Add 'tool' to default role_map_inv in dpo/chat_template.py default() and argilla_chat() so datasets with tool-call messages no longer raise KeyError: 'tool' (closes axolotl-ai-cloud#3217) - Fix generate_dataset_hash_from_config to use canonical tokenizer config + overrides content instead of tokenizer.name_or_path when added_tokens_overrides is set, preventing cache busting when only output_dir changes (closes axolotl-ai-cloud#3303) - Add three Pydantic config validators to AxolotlConfigWCapabilities: * save_strategy: 'best' requires metric_for_best_model * streaming=True is incompatible with val_set_size > 0 * lora_target_modules list entries must be valid Python regex patterns - Tests for all three changes

…-135M in test_hash

codecov · 2026-04-01T14:04:29Z

Codecov Report

❌ Patch coverage is 96.29630% with 1 line in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/axolotl/utils/schemas/config.py	95.83%	1 Missing ⚠️

📢 Thoughts on this report? Let us know!

NanoCode012 reviewed Mar 24, 2026

View reviewed changes

Edward-Zion-Saji and others added 4 commits April 1, 2026 09:50

review: condense comment in shared.py, swap Mistral model for SmolLM2…

8f53d4e

…-135M in test_hash

chore: lint

56df4b8

move the validators out of the w/ capabilities schema

6f02154

winglian force-pushed the fix/dpo-tool-role-hash-config-validators branch from ae4fd65 to 6f02154 Compare April 1, 2026 13:50

winglian added the scheduled_release This PR is slated for the upcoming release label Apr 1, 2026

winglian added the ready to merge label Apr 1, 2026

Merge branch 'main' into fix/dpo-tool-role-hash-config-validators

dc210f2

winglian merged commit 55a7950 into axolotl-ai-cloud:main Apr 1, 2026
14 of 17 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: DPO tool role KeyError (#3217), dataset hash output_dir (#3303), config validators#3538

fix: DPO tool role KeyError (#3217), dataset hash output_dir (#3303), config validators#3538
winglian merged 5 commits into
axolotl-ai-cloud:mainfrom
Edward-Zion-Saji:fix/dpo-tool-role-hash-config-validators

Edward-Zion-Saji commented Mar 23, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Mar 23, 2026 •

edited

Loading

Review skipped

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

NanoCode012 left a comment •

edited

Loading

Uh oh!

NanoCode012 Mar 24, 2026

Uh oh!

NanoCode012 Mar 24, 2026

Uh oh!

Edward-Zion-Saji commented Mar 24, 2026

Uh oh!

NanoCode012 commented Mar 25, 2026

Uh oh!

NanoCode012 commented Mar 26, 2026

Uh oh!

codecov Bot commented Apr 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

Edward-Zion-Saji commented Mar 23, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Fix 1: DPO tool role raises KeyError (#3217)

Fix 2: Dataset cache invalidated when output_dir changes with added_tokens_overrides (#3303)

Fix 3: Three new Pydantic config validators

Tests

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

NanoCode012 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

NanoCode012 Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

NanoCode012 Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

Edward-Zion-Saji commented Mar 24, 2026

Uh oh!

NanoCode012 commented Mar 25, 2026

Uh oh!

NanoCode012 commented Mar 26, 2026

Uh oh!

codecov Bot commented Apr 1, 2026

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Edward-Zion-Saji commented Mar 23, 2026 •

edited by coderabbitai Bot

Loading

Fix 1: DPO `tool` role raises `KeyError` (#3217)

Fix 2: Dataset cache invalidated when `output_dir` changes with `added_tokens_overrides` (#3303)

coderabbitai Bot commented Mar 23, 2026 •

edited

Loading

NanoCode012 left a comment •

edited

Loading