Add chat_template.argilla_chat support for DPO datasets #3202
Creates a new chat_template.argilla_chat prompt strategy for handling
DPO datasets where chosen/rejected fields contain full conversations
(messages + final response), following the pattern of chatml.argilla_chat
and llama3.argilla_chat.
- Add argilla_chat() function to chat_template.py
- Add chat_template.argilla_chat to RLHF documentation
- Add test coverage for argilla_chat with multiple tokenizers
Dataset format:
{
"chosen": [
{"role": "user", "content": "..."},
{"role": "assistant", "content": "..."}
],
"rejected": [
{"role": "user", "content": "..."},
{"role": "assistant", "content": "..."}
]
}
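As a rough illustration of the format above, the following sketch splits a sample into the shared prompt messages and the two final responses. The helper name `split_argilla_sample` and the returned keys are hypothetical and are not taken from the PR. The sketch also assumes that `chosen` and `rejected` differ only in their last message.

```python
from typing import Any

Message = dict[str, Any]


def split_argilla_sample(sample: dict[str, list[Message]]) -> dict[str, Any]:
    """Split full chosen/rejected conversations into prompt and final replies.

    Hypothetical helper for illustration; assumes both conversations share
    every message except the last one.
    """
    chosen, rejected = sample["chosen"], sample["rejected"]
    if not chosen or not rejected:
        raise ValueError("chosen and rejected must each contain at least one message")

    # Everything before the final message is treated as the shared prompt.
    prompt_messages = chosen[:-1]
    if prompt_messages != rejected[:-1]:
        raise ValueError("chosen and rejected are expected to share the same prompt")

    return {
        "prompt_messages": prompt_messages,
        "chosen_message": chosen[-1],
        "rejected_message": rejected[-1],
    }


example = {
    "chosen": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "4"},
    ],
    "rejected": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "5"},
    ],
}
parts = split_argilla_sample(example)
```

The actual strategy presumably renders the prompt and responses through the tokenizer's chat template. This sketch stops at the message level to stay independent of any tokenizer.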
Walkthrough
Adds a new DPO prompt strategy function argilla_chat for Argilla-style datasets, extends tests to cover it across templates, and updates RLHF docs with an example for argilla_chat usage. No other behavior changes.
Estimated code review effort: 3 (Moderate), ~25 minutes
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- docs/rlhf.qmd (1 hunks)
- src/axolotl/prompt_strategies/dpo/chat_template.py (1 hunks)
- tests/prompt_strategies/test_dpo_chat_templates.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/axolotl/prompt_strategies/dpo/chat_template.py (2)
- src/axolotl/utils/schemas/utils.py (1): handle_legacy_message_fields_logic (8-79)
- src/axolotl/utils/chat_templates/base.py (2): extract_chat_template_args (88-95), get_chat_template (26-85)
tests/prompt_strategies/test_dpo_chat_templates.py (2)
- src/axolotl/prompt_strategies/dpo/chat_template.py (3): argilla_chat (125-226), transform_fn (39-120), transform_fn (168-224)
- src/axolotl/utils/dict.py (1): DictDefault (6-38)
🪛 GitHub Actions: lint
tests/prompt_strategies/test_dpo_chat_templates.py
[error] 287-287: pre-commit: ruff-format hook failed (exit code 1). 1 file reformatted by the hook.
🪛 Ruff (0.13.3)
src/axolotl/prompt_strategies/dpo/chat_template.py
125-125: Unused function argument: kwargs (ARG001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
- GitHub Check: PyTest (3.11, 2.8.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
- GitHub Check: PyTest from Source Dist (3.11, 2.8.0)
- GitHub Check: PyTest (3.11, 2.6.0)
- GitHub Check: PyTest (3.11, 2.7.1)
- GitHub Check: preview
Codecov Report
✅ All modified and coverable lines are covered by tests.
- Return (transform_fn, dataset_kwargs) tuple instead of bare transform_fn
- Add remove_columns specification for field_chosen and field_rejected
- Add comprehensive docstring with Args/Returns sections
- Update tests to unpack tuple return value
Addresses PR feedback to maintain consistency with chat_template.default()
and properly specify columns to remove after dataset transformation.
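The return contract described in this commit can be sketched as follows. This is a minimal illustration, not the code from the PR: the factory name `make_argilla_style_strategy` and the `render` stand-in are hypothetical, and the real strategy would render the prompt with the model's chat template rather than a hand-written string format.

```python
from typing import Any, Callable

Sample = dict[str, Any]


def make_argilla_style_strategy(
    field_chosen: str = "chosen",
    field_rejected: str = "rejected",
) -> tuple[Callable[[Sample], Sample], dict[str, Any]]:
    """Hypothetical factory mirroring the (transform_fn, dataset_kwargs) contract."""

    def render(messages: list[Sample]) -> str:
        # Stand-in for a tokenizer chat template (assumption for illustration).
        return "".join(f"<{m['role']}>{m['content']}\n" for m in messages)

    def transform_fn(sample: Sample) -> Sample:
        chosen = sample[field_chosen]
        rejected = sample[field_rejected]
        return {
            # All messages before the final one form the prompt.
            "prompt": render(chosen[:-1]),
            "chosen": chosen[-1]["content"],
            "rejected": rejected[-1]["content"],
        }

    # Source columns that the caller should drop once the transform has run.
    dataset_kwargs = {"remove_columns": [field_chosen, field_rejected]}
    return transform_fn, dataset_kwargs


# Callers unpack the tuple instead of receiving a bare function.
transform_fn, dataset_kwargs = make_argilla_style_strategy()
row = transform_fn(
    {
        "chosen": [
            {"role": "user", "content": "What is 2 + 2?"},
            {"role": "assistant", "content": "4"},
        ],
        "rejected": [
            {"role": "user", "content": "What is 2 + 2?"},
            {"role": "assistant", "content": "5"},
        ],
    }
)
```

The `remove_columns` key matches the keyword of the same name accepted by `datasets.Dataset.map`, so the kwargs can presumably be forwarded to the mapping call as-is. How the training pipeline actually consumes them is not shown in this conversation.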
Seems to be linting issues
Co-authored-by: Wing Lian <wing.lian@gmail.com>
@NanoCode012 oops, missed the linting discussion; just committed, so it should pass now
…loud#3202)
* Add chat_template.argilla_chat support for DPO datasets
Creates a new chat_template.argilla_chat prompt strategy for handling
DPO datasets where chosen/rejected fields contain full conversations
(messages + final response), following the pattern of chatml.argilla_chat
and llama3.argilla_chat.
- Add argilla_chat() function to chat_template.py
- Add chat_template.argilla_chat to RLHF documentation
- Add test coverage for argilla_chat with multiple tokenizers
Dataset format:
{
"chosen": [
{"role": "user", "content": "..."},
{"role": "assistant", "content": "..."}
],
"rejected": [
{"role": "user", "content": "..."},
{"role": "assistant", "content": "..."}
]
}
* Fix chat_template.argilla_chat return value contract and add docstring
- Return (transform_fn, dataset_kwargs) tuple instead of bare transform_fn
- Add remove_columns specification for field_chosen and field_rejected
- Add comprehensive docstring with Args/Returns sections
- Update tests to unpack tuple return value
Addresses PR feedback to maintain consistency with chat_template.default()
and properly specify columns to remove after dataset transformation.
* Update tests/prompt_strategies/test_dpo_chat_templates.py
Co-authored-by: Wing Lian <wing.lian@gmail.com>
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
(cherry picked from commit 87565ec)
Description
Creates a new chat_template.argilla_chat prompt strategy for handling
DPO datasets where chosen/rejected fields contain full conversations
(messages + final response), following the pattern of chatml.argilla_chat
and llama3.argilla_chat.
Dataset format:
{
"chosen": [
{"role": "user", "content": "..."},
{"role": "assistant", "content": "..."}
],
"rejected": [
{"role": "user", "content": "..."},
{"role": "assistant", "content": "..."}
]
}
Motivation and Context
If you have a chosen/rejected-style DPO dataset, there is currently no way to use it with a model's chat_template.
How has this been tested?
Tests were added in tests/prompt_strategies/test_dpo_chat_templates.py; all tests pass.