
Fix add_response_schema for VLM processors#5520

Merged
qgallouedec merged 12 commits into
mainfrom
add_response_schema_vlm
Apr 16, 2026

@qgallouedec qgallouedec commented Apr 11, 2026

Member

PR #5323 added VLM support for parse_response in GRPOTrainer via runtime workarounds: a schema-propagation block in _generate_and_score_completions, a compound has_response_schema check in __init__, and a getattr-unwrap at the decode gate.

The root cause was in add_response_schema: when given a processor, it set response_schema on the outer processor, but parse_response is a tokenizer method that reads self.response_schema from the inner tokenizer. Setting the schema on the processor made hasattr checks pass without actually enabling parsing, and the workarounds in #5323 papered over that mismatch.
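A minimal sketch of the mismatch described above, using stand-in classes rather than the real transformers types. StubTokenizer, StubProcessor, and the ValueError raised when no schema is present are assumptions made for illustration only.

```python
class StubTokenizer:
    """Stand-in for a tokenizer; not the transformers class."""

    def parse_response(self, text):
        # Reads the schema from the tokenizer itself, never from a wrapper.
        schema = getattr(self, "response_schema", None)
        if schema is None:
            # Assumed failure mode, chosen only to make the gap visible.
            raise ValueError("no response_schema set on the tokenizer")
        return {"schema": schema, "text": text}


class StubProcessor:
    """Stand-in for a VLM processor that wraps a tokenizer."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer


processor = StubProcessor(StubTokenizer())

# Old behaviour: the schema lands on the outer processor only.
processor.response_schema = {"type": "object"}

schema_visible_on_processor = hasattr(processor, "response_schema")
schema_visible_on_tokenizer = hasattr(processor.tokenizer, "response_schema")

try:
    processor.tokenizer.parse_response("hello")
    parse_succeeded = True
except ValueError:
    parse_succeeded = False
```

In this sketch the hasattr check on the processor passes while parsing through the inner tokenizer still fails, which is the situation the runtime workarounds were compensating for.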

Fix

add_response_schema now accepts PreTrainedTokenizer | ProcessorMixin and unwraps to .tokenizer before setting the schema, mirroring parse_response on the read side.
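The unwrap step can be sketched roughly as follows. This is not the TRL implementation: set_schema, its explicit schema argument, and the stub classes standing in for PreTrainedTokenizer and ProcessorMixin are all hypothetical.

```python
class StubTokenizer:
    """Stand-in for PreTrainedTokenizer."""


class StubProcessor:
    """Stand-in for ProcessorMixin; wraps a tokenizer."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer


def set_schema(processing_class, schema):
    # Unwrap processors to their inner tokenizer; leave tokenizers as they are.
    if isinstance(processing_class, StubProcessor):
        tokenizer = processing_class.tokenizer
    else:
        tokenizer = processing_class
    # Write the schema where the parsing side will read it.
    tokenizer.response_schema = schema
    return processing_class


schema = {"type": "object"}
llm = set_schema(StubTokenizer(), schema)
vlm = set_schema(StubProcessor(StubTokenizer()), schema)
```

With this shape, the write side and the read side agree on a single location for the schema, so callers no longer need to know which kind of object they hold.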

With the schema set where it's needed, the workarounds from #5323 collapse:

  • _generate_and_score_completions: propagation block removed, decode gate simplifies to a single tokenizer check
  • __init__: has_response_schema becomes a single getattr on the unwrapped tokenizer
  • isinstance(..., PreTrainedTokenizerBase) exclusion at the decode gate removed; the gate now works uniformly for tokenizers and processors
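As a rough illustration of the simplified check listed above, the sketch below unwraps once and then does a single getattr. The function body is an assumption about the shape of the check, not a copy of the trainer code, and SimpleNamespace objects stand in for real tokenizers and processors.

```python
from types import SimpleNamespace


def has_response_schema(processing_class):
    # Unwrap a processor to its inner tokenizer; a tokenizer maps to itself.
    tokenizer = getattr(processing_class, "tokenizer", processing_class)
    # Single check on the object that parsing actually reads from.
    return getattr(tokenizer, "response_schema", None) is not None


plain_tokenizer = SimpleNamespace(response_schema={"type": "object"})
vlm_processor = SimpleNamespace(
    tokenizer=SimpleNamespace(response_schema={"type": "object"})
)
processor_without_schema = SimpleNamespace(tokenizer=SimpleNamespace())
```

The same predicate covers tokenizers and processors, so no processor-specific branch is needed at the gate.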

Same simplification applied to experimental DPPOTrainer for consistency.

Tests

  • TestAddResponseSchema split into test_add_response_schema (LLMs, AutoTokenizer) and test_add_response_schema_vlm (VLMs, AutoProcessor). The VLM test asserts the schema lands on the inner tokenizer.
  • TestParseResponse gains a _load helper that dispatches to AutoTokenizer / AutoProcessor based on model name (*ForCausalLM vs *ForConditionalGeneration) and sets self.is_vlm. VLM tokenization goes through prepare_multimodal_messages to structure content, and all apply_chat_template calls are normalized to tokenize=True, return_dict=True.
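The name-based dispatch in the _load helper might look something like the sketch below. pick_loader, its return shape, and the example model names are hypothetical; the real helper loads the objects and sets self.is_vlm, whereas this version only returns the decision.

```python
def pick_loader(model_id: str) -> tuple[str, bool]:
    """Return (loader class name, is_vlm) based on the model name suffix."""
    if model_id.endswith("ForConditionalGeneration"):
        # VLM checkpoints go through the processor path.
        return "AutoProcessor", True
    if model_id.endswith("ForCausalLM"):
        # Text-only checkpoints go through the tokenizer path.
        return "AutoTokenizer", False
    raise ValueError(f"cannot infer loader from model name: {model_id}")


# Hypothetical model names, used only to exercise both branches.
llm_choice = pick_loader("example-org/tiny-ExampleForCausalLM")
vlm_choice = pick_loader("example-org/tiny-ExampleForConditionalGeneration")
```

Raising on an unrecognized suffix is a design choice in this sketch: it makes a misnamed test model fail loudly instead of silently taking the text-only path.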

Note

Medium Risk
Touches response parsing and tool-calling decode paths used during GRPO/DPPO rollouts; mistakes could break tool-call extraction or completion decoding for multimodal processors.

Overview
Fixes add_response_schema to accept either a tokenizer or a VLM ProcessorMixin and always set response_schema on the underlying tokenizer (while matching templates against the processor’s top-level chat_template). parse_response and its call sites are updated to treat processors and tokenizers uniformly.
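A sketch of why the template lookup and the schema write target different objects, assuming a simple template-to-schema table. The template strings, SCHEMAS_BY_TEMPLATE, and resolve_and_set_schema are hypothetical and only illustrate the split.

```python
from types import SimpleNamespace

# Hypothetical template strings and schema table, for illustration only.
VLM_TEMPLATE = "{# vlm template #}"
TEXT_TEMPLATE = "{# text-only template #}"
SCHEMAS_BY_TEMPLATE = {VLM_TEMPLATE: {"type": "object", "title": "vlm"}}


def resolve_and_set_schema(processing_class):
    # Match against the top-level template (the processor's, for VLMs).
    chat_template = processing_class.chat_template
    # Write to the inner tokenizer, where parsing reads the schema.
    tokenizer = getattr(processing_class, "tokenizer", processing_class)
    if chat_template not in SCHEMAS_BY_TEMPLATE:
        raise ValueError("unrecognized chat template")
    tokenizer.response_schema = SCHEMAS_BY_TEMPLATE[chat_template]
    return processing_class


# The inner tokenizer carries a different template than the processor.
inner = SimpleNamespace(chat_template=TEXT_TEMPLATE)
processor = SimpleNamespace(chat_template=VLM_TEMPLATE, tokenizer=inner)
resolve_and_set_schema(processor)
```

Here a lookup keyed on the inner tokenizer's template would find nothing, while the processor's template resolves to a schema that is then stored on the inner tokenizer.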

Simplifies GRPO/DPPO completion decoding gates by checking response_schema only on the unwrapped tokenizer (removing prior processor-specific workarounds), and expands tests to cover VLM processors via AutoProcessor, including multimodal message preparation and consistent apply_chat_template(tokenize=True, return_dict=True) usage.

Reviewed by Cursor Bugbot for commit 90153c2.

@HuggingFaceDocBuilderDev


The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7561aecd98


Comment thread trl/chat_template_utils.py Outdated
Comment thread trl/experimental/dppo/dppo_trainer.py Outdated
Comment thread trl/experimental/dppo/dppo_trainer.py Outdated
Comment thread trl/trainer/grpo_trainer.py

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 9f41214.

"""
# VLM processors don't have parse_response directly; use the inner tokenizer
- tokenizer = getattr(tokenizer_or_processor, "tokenizer", tokenizer_or_processor)
+ tokenizer = getattr(processing_class, "tokenizer", processing_class)


Inconsistent getattr dispatch vs isinstance in same file

Low Severity

parse_response uses getattr(processing_class, "tokenizer", processing_class) for processor-vs-tokenizer dispatch, while add_response_schema in the same file now uses the cleaner isinstance(processing_class, ProcessorMixin) check for the exact same pattern. The project rules say to avoid hasattr and getattr when a cleaner alternative exists — and this PR already demonstrates that alternative a few hundred lines above.

Additional Locations (1)

Triggered by project rule: ../.ai/AGENTS.md
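The two dispatch styles contrasted in this comment can be compared side by side. The sketch below uses stand-in classes; StubProcessorBase plays the role of ProcessorMixin, and Lookalike is a hypothetical object that merely happens to expose a tokenizer attribute.

```python
class StubProcessorBase:
    """Stand-in for ProcessorMixin."""


class StubProcessor(StubProcessorBase):
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer


class Lookalike:
    """Not a processor, but exposes a `tokenizer` attribute anyway."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer


def unwrap_getattr(obj):
    # Duck-typed: anything with a `tokenizer` attribute gets unwrapped.
    return getattr(obj, "tokenizer", obj)


def unwrap_isinstance(obj):
    # Type-based: only genuine processors get unwrapped.
    return obj.tokenizer if isinstance(obj, StubProcessorBase) else obj


inner = object()
processor = StubProcessor(inner)
lookalike = Lookalike(inner)
```

For real processors both styles agree; they diverge only for objects that carry a tokenizer attribute without being processors, which is where the explicit type check states the intent more precisely.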


@albertvillanova albertvillanova left a comment

Member


Thanks.

# For VLM processors, set the schema on the inner tokenizer (where `parse_response` reads it from).
# Match against the top-level chat_template, since that's what was used historically and processors
# may carry their own VLM-specific template separate from the inner tokenizer's.
chat_template = processing_class.chat_template

Member


I think I understand now! This is the fixing change!

@qgallouedec qgallouedec merged commit 811ff6f into main Apr 16, 2026
15 checks passed
@qgallouedec qgallouedec deleted the add_response_schema_vlm branch April 16, 2026 12:39

3 participants