fix: DPO tool role KeyError (#3217), dataset hash output_dir (#3303), config validators#3538
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughAdded tool role support to DPO chat template strategies, enhanced dataset hash computation to account for added token overrides, implemented three new configuration validators for strategy/streaming/regex patterns, and added comprehensive test coverage for these features. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Hey, thanks for this PR. The second one is a good catch, I couldn't figure it out why at the time.
For first issue, I'm a bit hesitant. Can you show me existing several HF DPO datasets using this role? Is it not role: "tool_calling" or tool_call? We don't want to create a new standard if possible
| # When added_tokens_overrides is set the tokenizer is saved into output_dir, | ||
| # so tokenizer.name_or_path becomes an absolute path that includes output_dir. | ||
| # Changing output_dir would bust the dataset cache even though the tokenizer | ||
| # is identical. Use the canonical tokenizer config path plus the overrides | ||
| # content instead so the hash is stable across output_dir changes. |
There was a problem hiding this comment.
Let's simplify these comment to one liner or two if possible.
| """A different tokenizer path produces a different hash.""" | ||
| cfg = _base_cfg() | ||
| h1 = generate_dataset_hash_from_config(cfg, _datasets(), "NousResearch/Llama-3.2-1B") | ||
| h2 = generate_dataset_hash_from_config(cfg, _datasets(), "mistralai/Mistral-7B-v0.1") |
There was a problem hiding this comment.
Maybe I missed it, but we don't pre-download this mistral model. Let's use another that we already re-use in another test to prevent downloading more model data in CI
|
Thanks for the review, @NanoCode012! Addressing all three points: Fix 1 —
So Fix 2 — condensed comment in Fix 3 — replaced |
|
Thanks for clarifying the first point, letting CI run |
|
@Edward-Zion-Saji CI failing from your new tests I think |
…rs [skip-e2e] - Add 'tool' to default role_map_inv in dpo/chat_template.py default() and argilla_chat() so datasets with tool-call messages no longer raise KeyError: 'tool' (closes axolotl-ai-cloud#3217) - Fix generate_dataset_hash_from_config to use canonical tokenizer config + overrides content instead of tokenizer.name_or_path when added_tokens_overrides is set, preventing cache busting when only output_dir changes (closes axolotl-ai-cloud#3303) - Add three Pydantic config validators to AxolotlConfigWCapabilities: * save_strategy: 'best' requires metric_for_best_model * streaming=True is incompatible with val_set_size > 0 * lora_target_modules list entries must be valid Python regex patterns - Tests for all three changes
…-135M in test_hash
ae4fd65 to
6f02154
Compare
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Summary
Three independent bug fixes with tests. Each is self-contained and can be reviewed/merged independently.
Fix 1: DPO
toolrole raisesKeyError(#3217)File:
src/axolotl/prompt_strategies/dpo/chat_template.pyThe default
role_map_invin bothdefault()andargilla_chat()was missing the"tool"role. Any DPO dataset whosemessageslist contained a{"role": "tool", ...}turn (tool-calling preference data) crashed withKeyError: 'tool'.Fix: Add
"tool": ["tool"]to the default mapping in both functions. Fully backwards-compatible — user-suppliedroles:config still overrides the default.Fix 2: Dataset cache invalidated when
output_dirchanges withadded_tokens_overrides(#3303)File:
src/axolotl/utils/data/shared.pyWhen
added_tokens_overridesis set, Axolotl saves the modified tokenizer intooutput_dir, makingtokenizer.name_or_pathan absolute path like/tmp/my_run/modified_tokenizer. This path landed in the dataset fingerprint string, so renaming the output directory busted the cache and forced a full re-tokenization even though nothing had actually changed.Fix: When
added_tokens_overridesis present, derive the tokenizer fingerprint fromcfg.tokenizer_config(the canonical HF model name) plus the sorted overrides dict — content that is stable across output directory renames. Withoutadded_tokens_overridesthe old behaviour is preserved unchanged.Fix 3: Three new Pydantic config validators
File:
src/axolotl/utils/schemas/config.pyAdded three
@model_validator(mode="before")methods toAxolotlConfigWCapabilitiesthat catch common misconfigurations at validation time rather than deep inside training:check_save_strategy_best_requires_metricsave_strategy: bestwithoutmetric_for_best_model— crashes inside HF Trainer otherwisecheck_streaming_with_val_set_sizestreaming: true+val_set_size > 0— documented as unsupported but previously not enforcedcheck_lora_target_modules_regexlora_target_moduleslist — only discovered during tokenisation without thisTests
tests/prompt_strategies/test_dpo_chat_templates.pyTestDPOChatTemplateToolRole— no KeyError on tool role; custom role mapping still respectedtests/utils/data/test_hash.pytests/utils/schemas/validation/test_config_validators.pySummary by CodeRabbit
New Features
Bug Fixes
Tests