Skip to content

fix: DPO tool role KeyError (#3217), dataset hash output_dir (#3303), config validators#3538

Merged
winglian merged 5 commits into
axolotl-ai-cloud:mainfrom
Edward-Zion-Saji:fix/dpo-tool-role-hash-config-validators
Apr 1, 2026
Merged

fix: DPO tool role KeyError (#3217), dataset hash output_dir (#3303), config validators#3538
winglian merged 5 commits into
axolotl-ai-cloud:mainfrom
Edward-Zion-Saji:fix/dpo-tool-role-hash-config-validators

Conversation

@Edward-Zion-Saji

@Edward-Zion-Saji Edward-Zion-Saji commented Mar 23, 2026

Copy link
Copy Markdown
Contributor

Summary

Three independent bug fixes with tests. Each is self-contained and can be reviewed/merged independently.


Fix 1: DPO tool role raises KeyError (#3217)

File: src/axolotl/prompt_strategies/dpo/chat_template.py

The default role_map_inv in both default() and argilla_chat() was missing the "tool" role. Any DPO dataset whose messages list contained a {"role": "tool", ...} turn (tool-calling preference data) crashed with KeyError: 'tool'.

Fix: Add "tool": ["tool"] to the default mapping in both functions. Fully backwards-compatible — user-supplied roles: config still overrides the default.


Fix 2: Dataset cache invalidated when output_dir changes with added_tokens_overrides (#3303)

File: src/axolotl/utils/data/shared.py

When added_tokens_overrides is set, Axolotl saves the modified tokenizer into output_dir, making tokenizer.name_or_path an absolute path like /tmp/my_run/modified_tokenizer. This path landed in the dataset fingerprint string, so renaming the output directory busted the cache and forced a full re-tokenization even though nothing had actually changed.

Fix: When added_tokens_overrides is present, derive the tokenizer fingerprint from cfg.tokenizer_config (the canonical HF model name) plus the sorted overrides dict — content that is stable across output directory renames. Without added_tokens_overrides the old behaviour is preserved unchanged.


Fix 3: Three new Pydantic config validators

File: src/axolotl/utils/schemas/config.py

Added three @model_validator(mode="before") methods to AxolotlConfigWCapabilities that catch common misconfigurations at validation time rather than deep inside training:

Validator What it catches
check_save_strategy_best_requires_metric save_strategy: best without metric_for_best_model — crashes inside HF Trainer otherwise
check_streaming_with_val_set_size streaming: true + val_set_size > 0 — documented as unsupported but previously not enforced
check_lora_target_modules_regex Invalid Python regex patterns in lora_target_modules list — only discovered during tokenisation without this

Tests

File What is tested
tests/prompt_strategies/test_dpo_chat_templates.py TestDPOChatTemplateToolRole — no KeyError on tool role; custom role mapping still respected
tests/utils/data/test_hash.py Hash stability, regression for #3303, different overrides produce different hashes
tests/utils/schemas/validation/test_config_validators.py Pass/fail paths for all three validators

Summary by CodeRabbit

  • New Features

    • Added configuration validation to prevent invalid parameter combinations (e.g., requiring metrics for best-model checkpoints, preventing conflicting streaming settings).
    • Added support for tool messages in DPO chat template handling.
  • Bug Fixes

    • Fixed dataset hash computation to remain stable when using token overrides, regardless of output directory changes.
  • Tests

    • Added comprehensive validation tests for configuration parameters and DPO chat template functionality.

@coderabbitai

coderabbitai Bot commented Mar 23, 2026

Copy link
Copy Markdown
Contributor

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9d9a2981-74de-4f1f-bdc3-85f9849c1908

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Added tool role support to DPO chat template strategies, enhanced dataset hash computation to account for added token overrides, implemented three new configuration validators for strategy/streaming/regex patterns, and added comprehensive test coverage for these features.

Changes

Cohort / File(s) Summary
DPO Chat Template Tool Role Support
src/axolotl/prompt_strategies/dpo/chat_template.py
Extended default roles mapping with "tool": ["tool"] entry to ensure tool message roles are properly mapped during prompt construction for both default and argilla_chat strategies.
Dataset Hash Computation
src/axolotl/utils/data/shared.py
Modified generate_dataset_hash_from_config to compute fingerprint using added_tokens_overrides when present, ensuring hash stability across output_dir changes while detecting override differences.
Configuration Validation
src/axolotl/utils/schemas/config.py
Added three model_validator checks: (1) save_strategy="best" requires metric_for_best_model, (2) streaming=True requires val_set_size=0, (3) lora_target_modules as list must contain valid regex patterns.
DPO Chat Template Tests
tests/prompt_strategies/test_dpo_chat_templates.py
Added TestDPOChatTemplateToolRole class covering tool role handling with default configuration and custom roles mapping validation.
Dataset Hash Computation Tests
tests/utils/data/test_hash.py
Added TestGenerateDatasetHashFromConfig suite validating hash determinism, tokenizer/config changes, and added_tokens_overrides regression scenarios.
Configuration Validator Tests
tests/utils/schemas/validation/test_config_validators.py
Added three validator test suites for save_strategy, streaming, and lora_target_modules regex validation with both positive and negative test cases.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • winglian
  • NanoCode012
  • SalmanMohammadi
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 34.38% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes three independent bug fixes: DPO tool role KeyError, dataset hash output_dir issue, and config validators. Each aspect relates directly to changes in the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@NanoCode012 NanoCode012 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for this PR. The second one is a good catch, I couldn't figure it out why at the time.

For first issue, I'm a bit hesitant. Can you show me existing several HF DPO datasets using this role? Is it not role: "tool_calling" or tool_call? We don't want to create a new standard if possible

Comment thread src/axolotl/utils/data/shared.py Outdated
Comment on lines +519 to +523
# When added_tokens_overrides is set the tokenizer is saved into output_dir,
# so tokenizer.name_or_path becomes an absolute path that includes output_dir.
# Changing output_dir would bust the dataset cache even though the tokenizer
# is identical. Use the canonical tokenizer config path plus the overrides
# content instead so the hash is stable across output_dir changes.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's simplify these comment to one liner or two if possible.

Comment thread tests/utils/data/test_hash.py Outdated
"""A different tokenizer path produces a different hash."""
cfg = _base_cfg()
h1 = generate_dataset_hash_from_config(cfg, _datasets(), "NousResearch/Llama-3.2-1B")
h2 = generate_dataset_hash_from_config(cfg, _datasets(), "mistralai/Mistral-7B-v0.1")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I missed it, but we don't pre-download this mistral model. Let's use another that we already re-use in another test to prevent downloading more model data in CI

@Edward-Zion-Saji

Copy link
Copy Markdown
Contributor Author

Thanks for the review, @NanoCode012! Addressing all three points:

Fix 1 — "tool" role justification

"tool" is the HuggingFace-standardised role name for tool-response turns, adopted by transformers chat templates and the TRL trainers. Here are real HF datasets using it:

So "tool" is the established standard, not a new one — it mirrors the OpenAI chat format that HF adopted.

Fix 2 — condensed comment in shared.py ✅ (see latest commit)

Fix 3 — replaced mistralai/Mistral-7B-v0.1 with HuggingFaceTB/SmolLM2-135M ✅ (see latest commit)

@NanoCode012

Copy link
Copy Markdown
Collaborator

Thanks for clarifying the first point, letting CI run

@NanoCode012

Copy link
Copy Markdown
Collaborator

@Edward-Zion-Saji CI failing from your new tests I think

Edward-Zion-Saji and others added 4 commits April 1, 2026 09:50
…rs [skip-e2e]

- Add 'tool' to default role_map_inv in dpo/chat_template.py default() and
  argilla_chat() so datasets with tool-call messages no longer raise
  KeyError: 'tool' (closes axolotl-ai-cloud#3217)

- Fix generate_dataset_hash_from_config to use canonical tokenizer config +
  overrides content instead of tokenizer.name_or_path when added_tokens_overrides
  is set, preventing cache busting when only output_dir changes (closes axolotl-ai-cloud#3303)

- Add three Pydantic config validators to AxolotlConfigWCapabilities:
  * save_strategy: 'best' requires metric_for_best_model
  * streaming=True is incompatible with val_set_size > 0
  * lora_target_modules list entries must be valid Python regex patterns

- Tests for all three changes
@winglian winglian force-pushed the fix/dpo-tool-role-hash-config-validators branch from ae4fd65 to 6f02154 Compare April 1, 2026 13:50
@winglian winglian added the scheduled_release This PR is slated for the upcoming release label Apr 1, 2026
@codecov

codecov Bot commented Apr 1, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 96.29630% with 1 line in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/axolotl/utils/schemas/config.py 95.83% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

@winglian winglian merged commit 55a7950 into axolotl-ai-cloud:main Apr 1, 2026
14 of 17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready to merge scheduled_release This PR is slated for the upcoming release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants