docs: Add "It Takes Two: Your GRPO Is Secretly DPO" paper to GRPOTrainer#5347

Merged
qgallouedec merged 3 commits into huggingface:main from DhruvvArora:main on Mar 24, 2026

Conversation

@DhruvvArora DhruvvArora commented Mar 23, 2026

Contributor

What does this PR do?

Adds the "It Takes Two: Your GRPO Is Secretly DPO" paper (arXiv: 2510.00977)
to the GRPOTrainer section of the paper index, sorted by publication date (October 2025).

The paper establishes a formal connection between GRPO and DPO, showing that
GRPO's effectiveness stems from an implicit contrastive objective. It introduces
2-GRPO (num_generations=2), which the paper reports matches 16-GRPO performance
at lower training cost. The setup is directly reproducible via GRPOConfig in TRL.
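
To illustrate why a group of two behaves like a contrastive pair, here is a minimal sketch of group-normalized advantages. It is not the paper's derivation or TRL's implementation: the helper name `group_advantages`, the use of the population standard deviation, and the `eps` value are all assumptions made for illustration. With two rewards, the better completion gets roughly +1 and the worse roughly -1, and a tie gives zero for both. A sample standard deviation would scale both values by the same constant.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-4):
    """Hypothetical helper: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Group of two with different rewards: advantages are symmetric around zero.
pair = group_advantages([1.0, 0.0])

# Group of two with equal rewards: no learning signal from this prompt.
tied = group_advantages([1.0, 1.0])

print(pair, tied)
```

The sign pattern (one preferred completion, one dispreferred) is what makes the two-sample case resemble a preference pair.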

Related to #4374 (Road to v1).

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?

Who can review?

@qgallouedec


Note

Low Risk
Low risk: documentation-only change adding a new paper reference and example config; no runtime or API behavior is modified.

Overview
Adds a new entry to docs/source/paper_index.md under the GRPOTrainer paper index for "It Takes Two: Your GRPO Is Secretly DPO" (2510.00977), including a brief summary and a reproducible GRPOConfig example highlighting num_generations=2 (2-GRPO).

Written by Cursor Bugbot for commit fdb0536.
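
For reference, the num_generations=2 setting mentioned in the overview could be expressed along these lines. This is a minimal sketch, not the snippet added to docs/source/paper_index.md: the `output_dir` value is a placeholder, and every other hyperparameter is left at its TRL default, which may differ from the paper's setup.

```python
from trl import GRPOConfig

# 2-GRPO: sample two completions per prompt instead of a larger group.
# "grpo-2" is a hypothetical output directory, not taken from the PR.
training_args = GRPOConfig(
    output_dir="grpo-2",
    num_generations=2,
)
```

The resulting `training_args` would then be passed to `GRPOTrainer` together with a model, a reward function, and a dataset.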

Comment thread docs/source/paper_index.md Outdated

@qgallouedec qgallouedec left a comment

Member

cool!
thanks

@qgallouedec qgallouedec changed the title docs: Add "It Takes Two: Your GRPO Is Secretly DPO" paper to GRPOTrai… docs: Add "It Takes Two: Your GRPO Is Secretly DPO" paper to GRPOTrainer Mar 24, 2026
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec qgallouedec merged commit 5635466 into huggingface:main Mar 24, 2026
2 checks passed