Add chat_template.argilla_chat support for DPO datasets by lhl · Pull Request #3202 · axolotl-ai-cloud/axolotl

lhl · 2025-10-07T04:01:58Z

Description

Creates a new chat_template.argilla_chat prompt strategy for handling
DPO datasets where chosen/rejected fields contain full conversations
(messages + final response), following the pattern of chatml.argilla_chat
and llama3.argilla_chat.

Add argilla_chat() function to chat_template.py
Add chat_template.argilla_chat to RLHF documentation
Add test coverage for argilla_chat with multiple tokenizers

Dataset format:
{
"chosen": [ {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."} ], "rejected": [ {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."} ] }

Motivation and Context

If you use a chosen/rejected style DPO dataset, you can't use it with a model's chat_template?

How has this been tested?

Tests added to tests, all tests pass:

❯ pytest tests/prompt_strategies/test_dpo_chat_templates.py -v -s
========================================================== test session starts ==========================================================
platform linux -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 -- /opt/miniforge/envs/axolotl/bin/python3.12
cachedir: .pytest_cache
rootdir: /home/lhl/axolotl
configfile: pyproject.toml
plugins: anyio-4.11.0
collected 6 items

tests/prompt_strategies/test_dpo_chat_templates.py::TestAssistantDPOChatTemplateLlama3::test_llama3_defaults PASSED
tests/prompt_strategies/test_dpo_chat_templates.py::TestAssistantDPOChatTemplateLlama3::test_llama3_configured PASSED
tests/prompt_strategies/test_dpo_chat_templates.py::TestAssistantDPOChatTemplatePhi3::test_phi3_defaults PASSED
tests/prompt_strategies/test_dpo_chat_templates.py::TestAssistantDPOChatTemplateGemma::test_gemma_defaults PASSED
tests/prompt_strategies/test_dpo_chat_templates.py::TestArgillaChatDPOChatTemplate::test_llama3_argilla_chat PASSED
tests/prompt_strategies/test_dpo_chat_templates.py::TestArgillaChatDPOChatTemplate::test_phi3_argilla_chat PASSED

=========================================================== warnings summary ============================================================
tests/prompt_strategies/test_dpo_chat_templates.py::TestAssistantDPOChatTemplateLlama3::test_llama3_defaults
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

tests/prompt_strategies/test_dpo_chat_templates.py::TestAssistantDPOChatTemplateLlama3::test_llama3_defaults
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
===================================================== 6 passed, 2 warnings in 3.55s =====================================================
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute

Screenshots (if appropriate)

Types of changes

Social Handles (Optional)

Summary by CodeRabbit

New Features
- Added support for Argilla-style chat datasets in DPO workflows, enabling proper handling of chosen/rejected conversation pairs and role mappings across templates.
Documentation
- Introduced a new example block demonstrating usage with Argilla-style chat data, including JSON samples for chosen and rejected arrays.
Tests
- Expanded test coverage to include Argilla-style chat datasets across multiple templates, validating prompts and chosen/rejected outputs.

Creates a new chat_template.argilla_chat prompt strategy for handling DPO datasets where chosen/rejected fields contain full conversations (messages + final response), following the pattern of chatml.argilla_chat and llama3.argilla_chat. - Add argilla_chat() function to chat_template.py - Add chat_template.argilla_chat to RLHF documentation - Add test coverage for argilla_chat with multiple tokenizers Dataset format: { "chosen": [ {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."} ], "rejected": [ {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."} ] }

coderabbitai · 2025-10-07T04:02:10Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

Adds a new DPO prompt strategy function argilla_chat for Argilla-style datasets, extends tests to cover it across templates, and updates RLHF docs with an example for argilla_chat usage. No other behavior changes.

Changes

Cohort / File(s)	Summary of changes
Docs `docs/rlhf.qmd`	Adds a new documentation example block for chat_template.argilla_chat with a JSON sample showing chosen/rejected arrays.
DPO chat template `src/axolotl/prompt_strategies/dpo/chat_template.py`	Adds public function argilla_chat(cfg, dataset_idx=0, **kwargs) to handle Argilla-style chosen/rejected conversations; constructs transform_fn that builds prompt, chosen, rejected via role mapping and template application.
Tests `tests/prompt_strategies/test_dpo_chat_templates.py`	Adds argilla_chat_dataset fixture and TestArgillaChatDPOChatTemplate with tests for llama3 and tokenizer_default templates, validating prompt/chosen/rejected outputs.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

support for QAT w RL (DPO) #2776 — Also updates DPO chat_template handling around chosen/rejected normalization; likely overlaps with argilla_chat’s input format handling.

Suggested reviewers

NanoCode012
SalmanMohammadi

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title succinctly describes the primary change—adding chat_template.argilla_chat support for DPO datasets—which directly matches the new function and associated documentation and tests introduced in the pull request. It is clear, specific, and concise without extraneous details.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce74c20 and ffe49bf.

📒 Files selected for processing (3)

docs/rlhf.qmd (1 hunks)
src/axolotl/prompt_strategies/dpo/chat_template.py (1 hunks)
tests/prompt_strategies/test_dpo_chat_templates.py (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

src/axolotl/prompt_strategies/dpo/chat_template.py (2)

src/axolotl/utils/schemas/utils.py (1)

handle_legacy_message_fields_logic (8-79)

src/axolotl/utils/chat_templates/base.py (2)

extract_chat_template_args (88-95)

get_chat_template (26-85)

tests/prompt_strategies/test_dpo_chat_templates.py (2)

src/axolotl/prompt_strategies/dpo/chat_template.py (3)

argilla_chat (125-226)

transform_fn (39-120)

transform_fn (168-224)

src/axolotl/utils/dict.py (1)

DictDefault (6-38)

🪛 GitHub Actions: lint

tests/prompt_strategies/test_dpo_chat_templates.py

[error] 287-287: pre-commit: ruff-format hook failed (exit code 1). 1 file reformatted by the hook.

🪛 Ruff (0.13.3)

src/axolotl/prompt_strategies/dpo/chat_template.py

125-125: Unused function argument: kwargs

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)

GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
GitHub Check: PyTest (3.11, 2.8.0)
GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
GitHub Check: PyTest from Source Dist (3.11, 2.8.0)
GitHub Check: PyTest (3.11, 2.6.0)
GitHub Check: PyTest (3.11, 2.7.1)
GitHub Check: preview

codecov · 2025-10-07T04:11:00Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

- Return (transform_fn, dataset_kwargs) tuple instead of bare transform_fn - Add remove_columns specification for field_chosen and field_rejected - Add comprehensive docstring with Args/Returns sections - Update tests to unpack tuple return value Addresses PR feedback to maintain consistency with chat_template.default() and properly specify columns to remove after dataset transformation.

NanoCode012

Thank you!

NanoCode012 · 2025-10-08T10:32:27Z

Seems to be linting issues

Co-authored-by: Wing Lian <wing.lian@gmail.com>

lhl · 2025-10-16T20:22:41Z

@NanoCode012 oops missed the linting discussion, just committed so should pass now

…loud#3202) * Add chat_template.argilla_chat support for DPO datasets Creates a new chat_template.argilla_chat prompt strategy for handling DPO datasets where chosen/rejected fields contain full conversations (messages + final response), following the pattern of chatml.argilla_chat and llama3.argilla_chat. - Add argilla_chat() function to chat_template.py - Add chat_template.argilla_chat to RLHF documentation - Add test coverage for argilla_chat with multiple tokenizers Dataset format: { "chosen": [ {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."} ], "rejected": [ {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."} ] } * Fix chat_template.argilla_chat return value contract and add docstring - Return (transform_fn, dataset_kwargs) tuple instead of bare transform_fn - Add remove_columns specification for field_chosen and field_rejected - Add comprehensive docstring with Args/Returns sections - Update tests to unpack tuple return value Addresses PR feedback to maintain consistency with chat_template.default() and properly specify columns to remove after dataset transformation. * Update tests/prompt_strategies/test_dpo_chat_templates.py Co-authored-by: Wing Lian <wing.lian@gmail.com> --------- Co-authored-by: Wing Lian <wing.lian@gmail.com> (cherry picked from commit 87565ec)

coderabbitai Bot reviewed Oct 7, 2025

View reviewed changes

Comment thread src/axolotl/prompt_strategies/dpo/chat_template.py Outdated

NanoCode012 approved these changes Oct 8, 2025

View reviewed changes

winglian reviewed Oct 8, 2025

View reviewed changes

Comment thread tests/prompt_strategies/test_dpo_chat_templates.py Outdated

Update tests/prompt_strategies/test_dpo_chat_templates.py

aa50a68

Co-authored-by: Wing Lian <wing.lian@gmail.com>

NanoCode012 merged commit 87565ec into axolotl-ai-cloud:main Oct 17, 2025
15 checks passed

coderabbitai Bot mentioned this pull request Mar 23, 2026

fix: DPO tool role KeyError (#3217), dataset hash output_dir (#3303), config validators #3538

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add chat_template.argilla_chat support for DPO datasets#3202

Add chat_template.argilla_chat support for DPO datasets#3202
NanoCode012 merged 3 commits into
axolotl-ai-cloud:mainfrom
shisa-ai:rl-chat_template-argilla_chat

lhl commented Oct 7, 2025 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Oct 7, 2025 •

edited

Loading

Review skipped

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

codecov Bot commented Oct 7, 2025

Uh oh!

NanoCode012 left a comment

Uh oh!

NanoCode012 commented Oct 8, 2025

Uh oh!

Uh oh!

lhl commented Oct 16, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

lhl commented Oct 7, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Motivation and Context

How has this been tested?

Screenshots (if appropriate)

Types of changes

Social Handles (Optional)

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Oct 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codecov Bot commented Oct 7, 2025

Codecov Report

Uh oh!

NanoCode012 left a comment

Choose a reason for hiding this comment

Uh oh!

NanoCode012 commented Oct 8, 2025

Uh oh!

Uh oh!

lhl commented Oct 16, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

lhl commented Oct 7, 2025 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Oct 7, 2025 •

edited

Loading