Skip to content

GRPOTrainer/async: fix prefix EOS slicing for tool suffix (with Qwen3/3.5 type of chat templates)#5330

Merged
qgallouedec merged 12 commits into
huggingface:mainfrom
casinca:qwen3.5-gen-fix
Mar 21, 2026
Merged

GRPOTrainer/async: fix prefix EOS slicing for tool suffix (with Qwen3/3.5 type of chat templates)#5330
qgallouedec merged 12 commits into
huggingface:mainfrom
casinca:qwen3.5-gen-fix

Conversation

@casinca

@casinca casinca commented Mar 20, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Hi,
this should be a Fix #5317

I went with option 1 (model agnostic imho) mentioned by @mattbui in the issue, and mirrored it to the new exp async GRPO.
Generated a test case, in case.

Before submitting

  • [] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.


Note

Medium Risk
Changes how tool-result token suffixes are computed in both GRPOTrainer and experimental async GRPO; mistakes here could break tool-calling flows or cause subtle tokenization mismatches for certain chat templates.

Overview
Fixes tool-call prompt reconstruction for chat templates (notably Qwen3/Qwen3.5) that emit an EOS token followed by extra trailing tokens (e.g. newline). When deriving tool-result suffix_ids, the code now trims the prefix to the last EOS before slicing and validates the EOS-trimmed prefix matches the full tokenization, preventing incorrect suffix extraction.

Adds a regression test (test_get_tool_suffix_ids_eos_newline_boundary) covering Qwen3 and Qwen3.5 tokenizers (with version guard) to ensure _get_tool_suffix_ids returns the expected suffix under this EOS+newline boundary condition.

Written by Cursor Bugbot for commit a8e0f3c. This will update automatically on new commits. Configure here.

Comment thread trl/trainer/grpo_trainer.py Outdated
Comment thread trl/trainer/grpo_trainer.py Outdated
Comment thread tests/test_grpo_trainer.py Outdated
casinca and others added 6 commits March 21, 2026 00:28
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Comment thread trl/experimental/async_grpo/async_rollout_worker.py

@qgallouedec qgallouedec left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! thanks a lot for the fast fix!

@HuggingFaceDocBuilderDev

Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec qgallouedec merged commit 5bb083c into huggingface:main Mar 21, 2026
13 checks passed
@casinca casinca deleted the qwen3.5-gen-fix branch March 21, 2026 11:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Missing \n token between completion and tool suffix in GRPOTrainer._get_tool_suffix_ids for Qwen3 and Qwen3.5

3 participants