Skip to content

Add chat_template.argilla_chat support for DPO datasets#3202

Merged
NanoCode012 merged 3 commits into
axolotl-ai-cloud:mainfrom
shisa-ai:rl-chat_template-argilla_chat
Oct 17, 2025
Merged

Add chat_template.argilla_chat support for DPO datasets#3202
NanoCode012 merged 3 commits into
axolotl-ai-cloud:mainfrom
shisa-ai:rl-chat_template-argilla_chat

Conversation

@lhl

@lhl lhl commented Oct 7, 2025

Copy link
Copy Markdown
Contributor

Description

Creates a new chat_template.argilla_chat prompt strategy for handling
DPO datasets where chosen/rejected fields contain full conversations
(messages + final response), following the pattern of chatml.argilla_chat
and llama3.argilla_chat.

  • Add argilla_chat() function to chat_template.py
  • Add chat_template.argilla_chat to RLHF documentation
  • Add test coverage for argilla_chat with multiple tokenizers

Dataset format:
{
"chosen": [ {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."} ], "rejected": [ {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."} ] }

Motivation and Context

If you use a chosen/rejected style DPO dataset, you can't use it with a model's chat_template?

How has this been tested?

Tests added to tests, all tests pass:

❯ pytest tests/prompt_strategies/test_dpo_chat_templates.py -v -s
========================================================== test session starts ==========================================================
platform linux -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 -- /opt/miniforge/envs/axolotl/bin/python3.12
cachedir: .pytest_cache
rootdir: /home/lhl/axolotl
configfile: pyproject.toml
plugins: anyio-4.11.0
collected 6 items

tests/prompt_strategies/test_dpo_chat_templates.py::TestAssistantDPOChatTemplateLlama3::test_llama3_defaults PASSED
tests/prompt_strategies/test_dpo_chat_templates.py::TestAssistantDPOChatTemplateLlama3::test_llama3_configured PASSED
tests/prompt_strategies/test_dpo_chat_templates.py::TestAssistantDPOChatTemplatePhi3::test_phi3_defaults PASSED
tests/prompt_strategies/test_dpo_chat_templates.py::TestAssistantDPOChatTemplateGemma::test_gemma_defaults PASSED
tests/prompt_strategies/test_dpo_chat_templates.py::TestArgillaChatDPOChatTemplate::test_llama3_argilla_chat PASSED
tests/prompt_strategies/test_dpo_chat_templates.py::TestArgillaChatDPOChatTemplate::test_phi3_argilla_chat PASSED

=========================================================== warnings summary ============================================================
tests/prompt_strategies/test_dpo_chat_templates.py::TestAssistantDPOChatTemplateLlama3::test_llama3_defaults
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

tests/prompt_strategies/test_dpo_chat_templates.py::TestAssistantDPOChatTemplateLlama3::test_llama3_defaults
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
===================================================== 6 passed, 2 warnings in 3.55s =====================================================
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute

Screenshots (if appropriate)

Types of changes

Social Handles (Optional)

Summary by CodeRabbit

  • New Features

    • Added support for Argilla-style chat datasets in DPO workflows, enabling proper handling of chosen/rejected conversation pairs and role mappings across templates.
  • Documentation

    • Introduced a new example block demonstrating usage with Argilla-style chat data, including JSON samples for chosen and rejected arrays.
  • Tests

    • Expanded test coverage to include Argilla-style chat datasets across multiple templates, validating prompts and chosen/rejected outputs.

  Creates a new chat_template.argilla_chat prompt strategy for handling
  DPO datasets where chosen/rejected fields contain full conversations
  (messages + final response), following the pattern of chatml.argilla_chat
  and llama3.argilla_chat.

  - Add argilla_chat() function to chat_template.py
  - Add chat_template.argilla_chat to RLHF documentation
  - Add test coverage for argilla_chat with multiple tokenizers

  Dataset format:
  {
    "chosen": [
      {"role": "user", "content": "..."},
      {"role": "assistant", "content": "..."}
    ],
    "rejected": [
      {"role": "user", "content": "..."},
      {"role": "assistant", "content": "..."}
    ]
  }
@coderabbitai

coderabbitai Bot commented Oct 7, 2025

Copy link
Copy Markdown
Contributor

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

Adds a new DPO prompt strategy function argilla_chat for Argilla-style datasets, extends tests to cover it across templates, and updates RLHF docs with an example for argilla_chat usage. No other behavior changes.

Changes

Cohort / File(s) Summary of changes
Docs
docs/rlhf.qmd
Adds a new documentation example block for chat_template.argilla_chat with a JSON sample showing chosen/rejected arrays.
DPO chat template
src/axolotl/prompt_strategies/dpo/chat_template.py
Adds public function argilla_chat(cfg, dataset_idx=0, **kwargs) to handle Argilla-style chosen/rejected conversations; constructs transform_fn that builds prompt, chosen, rejected via role mapping and template application.
Tests
tests/prompt_strategies/test_dpo_chat_templates.py
Adds argilla_chat_dataset fixture and TestArgillaChatDPOChatTemplate with tests for llama3 and tokenizer_default templates, validating prompt/chosen/rejected outputs.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • support for QAT w RL (DPO) #2776 — Also updates DPO chat_template handling around chosen/rejected normalization; likely overlaps with argilla_chat’s input format handling.

Suggested reviewers

  • NanoCode012
  • SalmanMohammadi

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title succinctly describes the primary change—adding chat_template.argilla_chat support for DPO datasets—which directly matches the new function and associated documentation and tests introduced in the pull request. It is clear, specific, and concise without extraneous details.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce74c20 and ffe49bf.

📒 Files selected for processing (3)
  • docs/rlhf.qmd (1 hunks)
  • src/axolotl/prompt_strategies/dpo/chat_template.py (1 hunks)
  • tests/prompt_strategies/test_dpo_chat_templates.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/axolotl/prompt_strategies/dpo/chat_template.py (2)
src/axolotl/utils/schemas/utils.py (1)
  • handle_legacy_message_fields_logic (8-79)
src/axolotl/utils/chat_templates/base.py (2)
  • extract_chat_template_args (88-95)
  • get_chat_template (26-85)
tests/prompt_strategies/test_dpo_chat_templates.py (2)
src/axolotl/prompt_strategies/dpo/chat_template.py (3)
  • argilla_chat (125-226)
  • transform_fn (39-120)
  • transform_fn (168-224)
src/axolotl/utils/dict.py (1)
  • DictDefault (6-38)
🪛 GitHub Actions: lint
tests/prompt_strategies/test_dpo_chat_templates.py

[error] 287-287: pre-commit: ruff-format hook failed (exit code 1). 1 file reformatted by the hook.

🪛 Ruff (0.13.3)
src/axolotl/prompt_strategies/dpo/chat_template.py

125-125: Unused function argument: kwargs

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
  • GitHub Check: PyTest (3.11, 2.8.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
  • GitHub Check: PyTest from Source Dist (3.11, 2.8.0)
  • GitHub Check: PyTest (3.11, 2.6.0)
  • GitHub Check: PyTest (3.11, 2.7.1)
  • GitHub Check: preview

Comment thread src/axolotl/prompt_strategies/dpo/chat_template.py Outdated
@codecov

codecov Bot commented Oct 7, 2025

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

- Return (transform_fn, dataset_kwargs) tuple instead of bare transform_fn
- Add remove_columns specification for field_chosen and field_rejected
- Add comprehensive docstring with Args/Returns sections
- Update tests to unpack tuple return value

Addresses PR feedback to maintain consistency with chat_template.default()
and properly specify columns to remove after dataset transformation.

@NanoCode012 NanoCode012 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@NanoCode012

Copy link
Copy Markdown
Collaborator

Seems to be linting issues

Comment thread tests/prompt_strategies/test_dpo_chat_templates.py Outdated
Co-authored-by: Wing Lian <wing.lian@gmail.com>
@lhl

lhl commented Oct 16, 2025

Copy link
Copy Markdown
Contributor Author

@NanoCode012 oops missed the linting discussion, just committed so should pass now

@NanoCode012 NanoCode012 merged commit 87565ec into axolotl-ai-cloud:main Oct 17, 2025
15 checks passed
flaviusburca pushed a commit to invergent-ai/axolotl that referenced this pull request Oct 18, 2025
…loud#3202)

* Add chat_template.argilla_chat support for DPO datasets

  Creates a new chat_template.argilla_chat prompt strategy for handling
  DPO datasets where chosen/rejected fields contain full conversations
  (messages + final response), following the pattern of chatml.argilla_chat
  and llama3.argilla_chat.

  - Add argilla_chat() function to chat_template.py
  - Add chat_template.argilla_chat to RLHF documentation
  - Add test coverage for argilla_chat with multiple tokenizers

  Dataset format:
  {
    "chosen": [
      {"role": "user", "content": "..."},
      {"role": "assistant", "content": "..."}
    ],
    "rejected": [
      {"role": "user", "content": "..."},
      {"role": "assistant", "content": "..."}
    ]
  }

* Fix chat_template.argilla_chat return value contract and add docstring

- Return (transform_fn, dataset_kwargs) tuple instead of bare transform_fn
- Add remove_columns specification for field_chosen and field_rejected
- Add comprehensive docstring with Args/Returns sections
- Update tests to unpack tuple return value

Addresses PR feedback to maintain consistency with chat_template.default()
and properly specify columns to remove after dataset transformation.

* Update tests/prompt_strategies/test_dpo_chat_templates.py

Co-authored-by: Wing Lian <wing.lian@gmail.com>

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
(cherry picked from commit 87565ec)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants