Skip to content

Support VLM processors in is_chat_template_prefix_preserving#5558

Merged
qgallouedec merged 3 commits into
mainfrom
vlm-is-chat-template-prefix-preserving
Apr 17, 2026
Merged

Support VLM processors in is_chat_template_prefix_preserving#5558
qgallouedec merged 3 commits into
mainfrom
vlm-is-chat-template-prefix-preserving

Conversation

@qgallouedec

@qgallouedec qgallouedec commented Apr 15, 2026

Copy link
Copy Markdown
Member

Extend is_chat_template_prefix_preserving to accept VLM processors (not only tokenizers), and update its type hint accordingly.

Context: https://github.com/huggingface/trl/pull/5489/changes#r3087655676

Why

For VLMs, processor.apply_chat_template is not just an alias for processor.tokenizer.apply_chat_template. Checking prefix-preservation on the inner tokenizer can therefore diverge from what actually happens at training time. We want to call the check on the processor whenever one is available.


Note

Low Risk
Small, well-scoped change to a utility check and its tests; main risk is introducing an extra PIL/image dependency path when running the prefix-preservation check for processors.

Overview
is_chat_template_prefix_preserving now accepts either a PreTrainedTokenizer or a VLM ProcessorMixin and runs the prefix check via processing_class.apply_chat_template.

For processors, the check now builds multimodal messages (including a dummy image) via prepare_multimodal_messages so image-token expansion is exercised, and tests add a new require_vision case validating prefix-preservation on a processor template.

Reviewed by Cursor Bugbot for commit fee8f7e. Bugbot is set up for automated code reviews on this repo. Configure here.

@HuggingFaceDocBuilderDev

Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 355700a58c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tests/test_chat_template_utils.py

@albertvillanova albertvillanova left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@qgallouedec qgallouedec merged commit 4595347 into main Apr 17, 2026
12 of 13 checks passed
@qgallouedec qgallouedec deleted the vlm-is-chat-template-prefix-preserving branch April 17, 2026 13:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants