Fix `add_response_schema` for VLM processors by qgallouedec · Pull Request #5520 · huggingface/trl

qgallouedec · 2026-04-11T01:38:12Z

PR #5323 added VLM support for parse_response in GRPOTrainer via runtime workarounds: a schema-propagation block in _generate_and_score_completions, a compound has_response_schema check in __init__, and a getattr-unwrap at the decode gate.

The root cause was in add_response_schema: when given a processor, it set response_schema on the outer processor, but parse_response is a tokenizer method that reads self.response_schema from the inner tokenizer. Setting it on the processor only made hasattr checks pass without enabling parsing; #5323 was papering over this gap.

Fix

add_response_schema now accepts PreTrainedTokenizer | ProcessorMixin and unwraps to .tokenizer before setting the schema, mirroring parse_response on the read side.
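A minimal sketch of the unwrapping described above, using stand-in classes rather than the real transformers types. `FakeTokenizer`, `FakeProcessor`, and `add_response_schema_sketch` are hypothetical names for illustration only; the actual function in trl/chat_template_utils.py also matches the chat template before picking a schema, which is omitted here.

```python
class FakeTokenizer:
    """Stand-in for a tokenizer; parse_response reads self.response_schema."""

    response_schema = None

    def parse_response(self, text):
        # The real method parses `text` with the schema; this stand-in only
        # checks that a schema is visible from the tokenizer.
        if self.response_schema is None:
            raise ValueError("no response_schema set on the tokenizer")
        return {"schema": self.response_schema, "text": text}


class FakeProcessor:
    """Stand-in for a VLM processor wrapping an inner tokenizer."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer


def add_response_schema_sketch(processing_class, schema):
    # Unwrap processors so the schema lands where parse_response reads it.
    if isinstance(processing_class, FakeProcessor):
        tokenizer = processing_class.tokenizer
    else:
        tokenizer = processing_class
    tokenizer.response_schema = schema
    return processing_class


processor = add_response_schema_sketch(FakeProcessor(FakeTokenizer()), {"type": "object"})
print(processor.tokenizer.response_schema)
```

In this sketch the outer processor never receives a response_schema attribute, so a hasattr check on it cannot pass by accident.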

With the schema set where it's needed, the workarounds from #5323 collapse:

_generate_and_score_completions: propagation block removed, decode gate simplifies to a single tokenizer check
__init__: has_response_schema becomes a single getattr on the unwrapped tokenizer
isinstance(..., PreTrainedTokenizerBase) exclusion at the decode gate removed — works uniformly for tokenizers and processors

Same simplification applied to experimental DPPOTrainer for consistency.
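The simplified check can be sketched as follows. This is an illustration under assumptions, not the trainer code: `SimpleNamespace` objects stand in for the tokenizer and processor, and `has_response_schema` is written here as a hypothetical helper function.

```python
from types import SimpleNamespace


def has_response_schema(processing_class):
    # Unwrap a processor to its inner tokenizer; a plain tokenizer passes through.
    tokenizer = getattr(processing_class, "tokenizer", processing_class)
    # Single check on the unwrapped tokenizer, for tokenizers and processors alike.
    return getattr(tokenizer, "response_schema", None) is not None


tokenizer = SimpleNamespace(response_schema={"type": "object"})  # stand-in tokenizer
processor = SimpleNamespace(tokenizer=tokenizer)  # stand-in VLM processor
bare = SimpleNamespace()  # stand-in tokenizer without a schema

print(has_response_schema(tokenizer), has_response_schema(processor), has_response_schema(bare))
```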

Tests

TestAddResponseSchema split into test_add_response_schema (LLMs, AutoTokenizer) and test_add_response_schema_vlm (VLMs, AutoProcessor). The VLM test asserts the schema lands on the inner tokenizer.
TestParseResponse gains a _load helper that dispatches to AutoTokenizer / AutoProcessor based on model name (*ForCausalLM vs *ForConditionalGeneration) and sets self.is_vlm. VLM tokenization goes through prepare_multimodal_messages to structure content, and all apply_chat_template calls are normalized to tokenize=True, return_dict=True.
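The dispatch in _load could look roughly like the sketch below. It assumes the model name ends with the architecture suffix, as the description suggests. `pick_loader` is a hypothetical helper that returns the loader's class name as a string so the example runs without downloading a model, and the model names in the usage lines are made up.

```python
def pick_loader(model_name):
    """Return (loader_name, is_vlm) for a test model name."""
    if model_name.endswith("ForConditionalGeneration"):
        # VLM checkpoints are loaded through a processor.
        return "AutoProcessor", True
    if model_name.endswith("ForCausalLM"):
        # Text-only checkpoints are loaded through a tokenizer.
        return "AutoTokenizer", False
    raise ValueError(f"unrecognized model name: {model_name}")


print(pick_loader("org/tiny-ExampleForCausalLM"))
print(pick_loader("org/tiny-ExampleVLForConditionalGeneration"))
```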

Note

Medium Risk
Touches response parsing and tool-calling decode paths used during GRPO/DPPO rollouts; mistakes could break tool-call extraction or completion decoding for multimodal processors.

Overview
Fixes add_response_schema to accept either a tokenizer or a VLM ProcessorMixin and always set response_schema on the underlying tokenizer (while matching templates against the processor’s top-level chat_template). parse_response/call sites are updated to treat processors and tokenizers uniformly.

Simplifies GRPO/DPPO completion decoding gates by checking response_schema only on the unwrapped tokenizer (removing prior processor-specific workarounds), and expands tests to cover VLM processors via AutoProcessor, including multimodal message preparation and consistent apply_chat_template(tokenize=True, return_dict=True) usage.

Reviewed by Cursor Bugbot for commit 90153c2. Bugbot is set up for automated code reviews on this repo.

HuggingFaceDocBuilderDev · 2026-04-11T01:40:53Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7561aecd98

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 9f41214.

cursor · 2026-04-15T13:26:31Z

    """
    # VLM processors don't have parse_response directly; use the inner tokenizer
-    tokenizer = getattr(tokenizer_or_processor, "tokenizer", tokenizer_or_processor)
+    tokenizer = getattr(processing_class, "tokenizer", processing_class)


Inconsistent getattr dispatch vs isinstance in same file

Low Severity

parse_response uses getattr(processing_class, "tokenizer", processing_class) for processor-vs-tokenizer dispatch, while add_response_schema in the same file now uses the cleaner isinstance(processing_class, ProcessorMixin) check for the exact same pattern. The project rules say to avoid hasattr and getattr when a cleaner alternative exists — and this PR already demonstrates that alternative a few hundred lines above.

Additional Locations (1)

trl/chat_template_utils.py#L326-L330

Triggered by project rule: ../.ai/AGENTS.md

albertvillanova

Thanks.

albertvillanova · 2026-04-16T10:49:40Z

+    # For VLM processors, set the schema on the inner tokenizer (where `parse_response` reads it from).
+    # Match against the top-level chat_template, since that's what was used historically and processors
+    # may carry their own VLM-specific template separate from the inner tokenizer's.
+    chat_template = processing_class.chat_template


I think I understand now! This is the fixing change!

qgallouedec added 2 commits April 11, 2026 01:37

Fix add_response_schema for VLM processors

7561aec

rm llama

be9e232

chatgpt-codex-connector Bot reviewed Apr 11, 2026

Comment thread trl/chat_template_utils.py Outdated

handle empty content for vlm

778432e

cursor Bot reviewed Apr 11, 2026

Comment thread trl/experimental/dppo/dppo_trainer.py Outdated

qgallouedec added 2 commits April 11, 2026 01:57

bikeshedding

4c62f60

type hint

ecc41e7

cursor Bot reviewed Apr 11, 2026

Comment thread trl/experimental/dppo/dppo_trainer.py Outdated

qgallouedec and others added 2 commits April 11, 2026 02:05

alignement

8cb5f58

Merge branch 'main' into add_response_schema_vlm

21dc0fe

qgallouedec requested review from AmineDiro, albertvillanova and kashif April 14, 2026 02:34

Merge branch 'main' into add_response_schema_vlm

e311e06

cursor Bot reviewed Apr 14, 2026

Comment thread trl/trainer/grpo_trainer.py

qgallouedec and others added 3 commits April 14, 2026 16:29

Merge branch 'main' into add_response_schema_vlm

fece07f

fix merge main

831d662

Merge branch 'main' into add_response_schema_vlm

9f41214

cursor Bot reviewed Apr 15, 2026

Merge branch 'main' into add_response_schema_vlm

90153c2

qgallouedec mentioned this pull request Apr 15, 2026

Set _tokenizer as trainer attribute #5489

Merged

albertvillanova approved these changes Apr 16, 2026

qgallouedec merged commit 811ff6f into main Apr 16, 2026
15 checks passed

qgallouedec deleted the add_response_schema_vlm branch April 16, 2026 12:39

albertvillanova mentioned this pull request Apr 17, 2026

[docs] Add chat templates page to web docs #5581

Merged

8 tasks

qgallouedec merged 12 commits into main from add_response_schema_vlm
