Align KTO with DPO: Use _metrics attribute #5705

Merged: albertvillanova merged 5 commits into main from align-kto-dpo-metrics on May 5, 2026

@albertvillanova (Member) commented May 5, 2026

Align KTO with DPO: Use _metrics attribute.

Part of:

This PR refactors how training and evaluation metrics are stored and managed in the KTOTrainer class, replacing the previous _stored_metrics structure with a new _metrics dictionary. This change centralizes metric tracking and simplifies metric handling throughout the class.

Changes

Metrics storage and management:

  • Replaced the _stored_metrics attribute (a nested defaultdict) with a new _metrics dictionary, initialized to separately track "train" and "eval" metrics.
  • Updated all methods (store_metrics, log) to use the new _metrics structure instead of _stored_metrics, ensuring consistent metric accumulation and cleanup.

These changes improve code clarity and reduce the risk of errors when handling training and evaluation metrics.
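
To make the new layout concrete, here is a minimal sketch of a per-mode `_metrics` container and a `store_metrics` method that appends to it. It is not the actual `KTOTrainer` code: the `MetricsStore` class, the `defaultdict(list)` buckets, and the `kl` metric name are assumptions chosen for illustration.

```python
from collections import defaultdict


class MetricsStore:
    """Hypothetical stand-in for the trainer's metric bookkeeping."""

    def __init__(self):
        # One bucket per mode; each bucket maps a metric name to the list
        # of values accumulated since the last log call (assumed layout).
        self._metrics = {"train": defaultdict(list), "eval": defaultdict(list)}

    def store_metrics(self, metrics, train_eval="train"):
        # Append each value to the bucket for the given mode.
        for key, value in metrics.items():
            self._metrics[train_eval][key].append(value)


store = MetricsStore()
store.store_metrics({"kl": 0.10}, train_eval="train")
store.store_metrics({"kl": 0.30}, train_eval="train")
store.store_metrics({"kl": 0.20}, train_eval="eval")
```

Because the two modes are fixed top-level keys, train and eval values cannot mix, and clearing one mode after logging leaves the other untouched.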


Note

Low Risk
Low-risk refactor limited to how KTOTrainer accumulates and clears train/eval metrics; it could still change what gets logged if the new containers are mishandled.

Overview
Updates KTOTrainer to use a unified _metrics container (split into train and eval) instead of _stored_metrics, aligning metric bookkeeping with DPO.

Metric accumulation (store_metrics) and log-time aggregation/cleanup (log) are refactored to write into and clear from the new structure, including the chosen/rejected count-based averaging logic.
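
The count-based averaging mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the code from `kto_trainer.py`: the function name `aggregate_split_metrics`, the key scheme (`count/<split>`, `<metric>/<split>_sum`), and the metric names `rewards`, `logps`, and `logits` are hypothetical, and plain Python numbers stand in for whatever value types the trainer stores.

```python
from collections import defaultdict


def aggregate_split_metrics(metrics, prefix=""):
    """Turn accumulated per-batch sums and counts into per-example averages.

    Consumed entries are removed from `metrics`, which mirrors the
    cleanup step that happens at log time.
    """
    logs = {}
    for split in ("chosen", "rejected"):
        count_key = f"count/{split}"
        if count_key not in metrics:
            continue  # no examples of this kind were seen
        count_sum = sum(metrics.pop(count_key))
        for name in ("rewards", "logps", "logits"):
            total = sum(metrics.pop(f"{name}/{split}_sum", []))
            if count_sum > 0:
                logs[f"{prefix}{name}/{split}"] = total / count_sum
    return logs


train = defaultdict(list)
# Two batches: the first has 3 chosen examples, the second has 1.
train["count/chosen"] += [3, 1]
train["rewards/chosen_sum"] += [3.0, 0.0]
train["logps/chosen_sum"] += [-6.0, -2.0]
train["logits/chosen_sum"] += [2.0, 2.0]

logs = aggregate_split_metrics(train)
# logs["rewards/chosen"] is 3.0 / 4 = 0.75
```

Dividing the summed values by the summed counts weights every example equally. Averaging the two per-batch means instead (1.0 and 0.0) would give 0.5, which over-weights the smaller batch.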

Reviewed by Cursor Bugbot for commit 881aec3. Bugbot is set up for automated code reviews on this repo.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit c2ec97c.

Comment thread trl/experimental/kto/kto_trainer.py Outdated
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@albertvillanova merged commit df6ae2a into main on May 5, 2026 (5 checks passed)
@albertvillanova deleted the align-kto-dpo-metrics branch on May 5, 2026 14:39

3 participants