Restrict VLM padding workaround to transformers 5.3.0 #5503

Merged
albertvillanova merged 1 commit into huggingface:main from albertvillanova:fix-5502-1
Apr 10, 2026

@albertvillanova albertvillanova commented Apr 10, 2026

Member

Restrict VLM padding workaround to transformers 5.3.0.

This PR updates the prompt tokenization logic in several trainer classes to handle a specific bug in the transformers library more robustly. The main change is to apply a padding workaround only for affected transformers versions (5.3.x), and to simplify the logic for handling padded and unpadded input IDs. This avoids unnecessary padding for unaffected versions and ensures compatibility with future releases.

Additionally, this PR corrects the original code comment, which incorrectly attributed the bug to transformers 5.2.0.

Fix #5502.

Changes

Bug workaround and version handling:

  • In the GRPO, RLOO, and experimental DPPO trainers, added a conditional check on the transformers library version (>=5.3.0 and <5.4.0) that decides whether the padding workaround is applied, instead of always applying padding.
  • Updated comments to clarify that the bug is present in transformers 5.3.0 and fixed in 5.4.0, with references to the relevant GitHub issues and PRs.
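
As a minimal sketch of the version gate described above: the helper name `needs_padding_workaround` as a function is hypothetical (the PR computes a boolean inline), and `Version` is assumed here to be `packaging.version.Version`, which the excerpt does not state explicitly.

```python
from packaging.version import Version  # assumption: the Version class used by the trainers


def needs_padding_workaround(transformers_version: str) -> bool:
    """Return True only inside the affected range: >=5.3.0 and <5.4.0."""
    return Version("5.3.0") <= Version(transformers_version) < Version("5.4.0")


if __name__ == "__main__":
    # Illustrative version strings, not an exhaustive list of releases.
    for candidate in ["5.2.0", "5.3.0", "5.3.1", "5.4.0"]:
        print(candidate, needs_padding_workaround(candidate))
```

Under PEP 440 ordering, a development build such as `5.4.0.dev0` sorts before `5.4.0`, so a check of this shape would still treat it as affected.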

Prompt ID extraction logic:

  • Unpads input_ids using the attention_mask only when the workaround is needed; otherwise, uses the tokenized input_ids directly, which simplifies the code path for unaffected versions.
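
The two code paths above can be sketched with plain Python lists. This is an illustration only, not the trainers' actual code: the function name `extract_prompt_ids`, the dict layout, and the pad token id `0` are assumptions.

```python
def extract_prompt_ids(tokenized, needs_padding_workaround):
    """Return one list of prompt token ids per sample.

    tokenized: dict with "input_ids" and, when padded, "attention_mask"
               (both as nested lists in this sketch).
    """
    input_ids = tokenized["input_ids"]
    if not needs_padding_workaround:
        # Unaffected versions: nothing was padded, so use the ids directly.
        return input_ids
    # Affected versions: padding was forced, so keep only the positions
    # where the attention mask is 1. This works for left or right padding.
    return [
        [token for token, keep in zip(ids, mask) if keep]
        for ids, mask in zip(input_ids, tokenized["attention_mask"])
    ]


# Hypothetical left-padded batch; 0 stands in for the pad token id.
padded = {
    "input_ids": [[0, 0, 11, 12], [21, 22, 23, 24]],
    "attention_mask": [[0, 0, 1, 1], [1, 1, 1, 1]],
}
unpadded = {"input_ids": [[11, 12], [21, 22, 23, 24]]}

with_workaround = extract_prompt_ids(padded, needs_padding_workaround=True)
without_workaround = extract_prompt_ids(unpadded, needs_padding_workaround=False)
```

Both calls yield the same per-sample ids, which is the property the version gate relies on: the workaround changes how the batch is tokenized, not which prompt tokens reach the trainer.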

Note

Medium Risk
Touches prompt tokenization in multiple trainers; version-gated padding/unpadding could change input IDs for some transformers versions and affect multimodal generation edge cases.

Overview
Restricts the VLM apply_chat_template padding workaround to transformers versions >=5.3.0 and <5.4.0, instead of always forcing padding=True.

When the workaround is inactive, the trainers now skip the unpadding step and use tokenized["input_ids"] directly; comments were updated to reference the correct upstream bug/fix (transformers#44514/#44563).

Reviewed by Cursor Bugbot for commit cf3dbe2.

# Workaround for a bug in transformers 5.3.0 where some processors (e.g. Qwen2.5-VL) crash on
# batched unpadded input (transformers#44514).
# Fixed in transformers 5.4.0 (transformers#44563).
needs_padding_workaround = Version("5.3.0") <= Version(transformers.__version__) < Version("5.4.0")

So we were wrong, it was introduced in 5.3 not 5.2?

Yes, I tested locally.

The original comment also incorrectly attributed the bug to transformers 5.2.0.

@albertvillanova albertvillanova merged commit ea283c6 into huggingface:main Apr 10, 2026
12 of 15 checks passed
Development

Successfully merging this pull request may close these issues.

Warning log: Kwargs passed to processor.__call__ have to be in processor_kwargs dict, not in **kwargs