Fix add_response_schema for VLM processors#5520
Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7561aecd98
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 9f41214. Configure here.
| """ | ||
| # VLM processors don't have parse_response directly; use the inner tokenizer | ||
| tokenizer = getattr(tokenizer_or_processor, "tokenizer", tokenizer_or_processor) | ||
| tokenizer = getattr(processing_class, "tokenizer", processing_class) |
There was a problem hiding this comment.
Inconsistent getattr dispatch vs isinstance in same file
Low Severity
parse_response uses getattr(processing_class, "tokenizer", processing_class) for processor-vs-tokenizer dispatch, while add_response_schema in the same file now uses the cleaner isinstance(processing_class, ProcessorMixin) check for the exact same pattern. The project rules say to avoid hasattr and getattr when a cleaner alternative exists — and this PR already demonstrates that alternative a few hundred lines above.
Additional Locations (1)
Triggered by project rule: ../.ai/AGENTS.md
Reviewed by Cursor Bugbot for commit 9f41214. Configure here.
| # For VLM processors, set the schema on the inner tokenizer (where `parse_response` reads it from). | ||
| # Match against the top-level chat_template, since that's what was used historically and processors | ||
| # may carry their own VLM-specific template separate from the inner tokenizer's. | ||
| chat_template = processing_class.chat_template |
There was a problem hiding this comment.
I think I understand now! This is the fixing change!


PR #5323 added VLM support for
parse_responseinGRPOTrainervia runtime workarounds: a schema-propagation block in_generate_and_score_completions, a compoundhas_response_schemacheck in__init__, and agetattr-unwrap at the decode gate.The root cause was in
add_response_schema: when given a processor, it setresponse_schemaon the outer processor, butparse_responseis a tokenizer method that readsself.response_schemafrom the inner tokenizer. Setting it on the processor only madehasattrchecks pass without enabling parsing: #5323 was plastering over this leak.Fix
add_response_schemanow acceptsPreTrainedTokenizer | ProcessorMixinand unwraps to.tokenizerbefore setting the schema, mirroringparse_responseon the read side.With the schema set where it's needed, the workarounds from #5323 collapse:
_generate_and_score_completions: propagation block removed, decode gate simplifies to a single tokenizer check__init__:has_response_schemabecomes a singlegetattron the unwrapped tokenizerisinstance(..., PreTrainedTokenizerBase)exclusion at the decode gate removed — works uniformly for tokenizers and processorsSame simplification applied to experimental
DPPOTrainerfor consistency.Tests
TestAddResponseSchemasplit intotest_add_response_schema(LLMs,AutoTokenizer) andtest_add_response_schema_vlm(VLMs,AutoProcessor). The VLM test asserts the schema lands on the inner tokenizer.TestParseResponsegains a_loadhelper that dispatches toAutoTokenizer/AutoProcessorbased on model name (*ForCausalLMvs*ForConditionalGeneration) and setsself.is_vlm. VLM tokenization goes throughprepare_multimodal_messagesto structure content, and allapply_chat_templatecalls are normalized totokenize=True, return_dict=True.Note
Medium Risk
Touches response parsing and tool-calling decode paths used during GRPO/DPPO rollouts; mistakes could break tool-call extraction or completion decoding for multimodal processors.
Overview
Fixes
add_response_schemato accept either a tokenizer or a VLMProcessorMixinand always setresponse_schemaon the underlying tokenizer (while matching templates against the processor’s top-levelchat_template).parse_response/call sites are updated to treat processors and tokenizers uniformly.Simplifies GRPO/DPPO completion decoding gates by checking
response_schemaonly on the unwrapped tokenizer (removing prior processor-specific workarounds), and expands tests to cover VLM processors viaAutoProcessor, including multimodal message preparation and consistentapply_chat_template(tokenize=True, return_dict=True)usage.Reviewed by Cursor Bugbot for commit 90153c2. Bugbot is set up for automated code reviews on this repo. Configure here.