Fix/pe audio video bugs #45886
Merged
zucchini-nlp merged 6 commits on May 12, 2026
Conversation
Comment on lines +18 to +19
    def __init__(self, feature_extractor=None, video_processor=None, tokenizer=None, **kwargs):
        super().__init__(feature_extractor, video_processor, tokenizer, **kwargs)
Member
run-slow: pe_audio_video
Contributor
This comment contains models: ["models/pe_audio_video"]
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
stevhliu reviewed May 11, 2026
stevhliu left a comment
Member
nice, the docs will indeed be handled in the other PR! good to merge once that's dropped here :)
Contributor
Author
ok, I'll revert the doc change then! thanks
This reverts commit 9c81780.
Contributor
[For maintainers] Suggested jobs to run (before merge)
run-slow: pe_audio_video
jp1924 pushed a commit to jp1924/transformers that referenced this pull request May 18, 2026
* Register correct mapping
* set output_hidden_states in get_text_*_embeds helpers
* point to get_*_embeds in forward error message
* Populate PE AV documentation
* Revert "Populate PE AV documentation"
  This reverts commit 9c81780.
khushali9 pushed a commit to khushali9/transformers that referenced this pull request Jun 8, 2026
* Register correct mapping
* set output_hidden_states in get_text_*_embeds helpers
* point to get_*_embeds in forward error message
* Populate PE AV documentation
* Revert "Populate PE AV documentation"
  This reverts commit 9c81780.
What does this PR do?
1. Migrate PE-AV processor to the v5 sub-processor API
`PeAudioVideoProcessor` still uses the legacy `feature_extractor_class` and `video_processor_class` that #41633 deprecated, so every checkpoint load prints two deprecation warnings. The Auto mappings are already registered, so we can drop the legacy attrs and add an explicit `__init__`.
2. Fix `get_*_embeds` crash
The `get_text_*_embeds` helpers call the text model without `output_hidden_states=True` and then access `text_outputs.hidden_states[-1]`, which is `None`, and so they crash with `TypeError`.
The fix is one extra kwarg per helper, mirroring the `forward` behaviour.
3. Friendlier `forward` error for single-modality inputs
`forward` requires ≥2 modalities; single modalities are handled by the `get_*_embeds` helpers.
Extended the `ValueError` to mention the existence of those helpers, so users reading the "you can omit any of the modalities, and use the same forward method" in the model card aren't stuck.
4. Fill in `docs/source/en/model_doc/pe_audio_video.md`
Replaced with a short overview, the clean architecture figure (upload pending here https://huggingface.co/datasets/huggingface/documentation-images/discussions/614), and a link to the well-documented PE-AV collection for checkpoints and end-to-end usage.
I noticed #45612 already proposes a doc fill-in for this page. Happy to drop/adapt this commit if the existing PR is preferred, but the other three fixes are independent of the doc change.
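For readers who want to see fixes 2 and 3 in isolation, here is a rough sketch using toy stand-ins. It is not the actual transformers code: `ToyTextModel`, `ToyTextOutput`, `toy_forward`, the helper names, and the `input_values` / `pixel_values_videos` parameter names are all hypothetical and only mimic the behaviour described above (hidden states returned only on request, and a `ValueError` that points at the `get_*_embeds` helpers).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ToyTextOutput:
    """Hypothetical stand-in for a text model output object."""
    last_hidden_state: list
    hidden_states: Optional[Tuple[list, ...]] = None


class ToyTextModel:
    """Hypothetical text model: per-layer states are returned only on request."""

    def __call__(self, input_ids, output_hidden_states=False):
        layer0 = [float(i) for i in input_ids]
        layer1 = [2.0 * x for x in layer0]
        return ToyTextOutput(
            last_hidden_state=layer1,
            hidden_states=(layer0, layer1) if output_hidden_states else None,
        )


def get_text_embeds_buggy(model, input_ids):
    # Mirrors the bug in item 2: hidden states are never requested, so
    # `hidden_states` is None and indexing it raises TypeError.
    text_outputs = model(input_ids)
    return text_outputs.hidden_states[-1]


def get_text_embeds_fixed(model, input_ids):
    # The one-kwarg fix: request hidden states explicitly.
    text_outputs = model(input_ids, output_hidden_states=True)
    return text_outputs.hidden_states[-1]


def toy_forward(input_ids=None, input_values=None, pixel_values_videos=None):
    # Mirrors item 3: require at least two modalities and, otherwise,
    # point the user at the single-modality helpers in the error message.
    provided = [x for x in (input_ids, input_values, pixel_values_videos) if x is not None]
    if len(provided) < 2:
        raise ValueError(
            "At least two modalities are required; for a single modality, "
            "use the get_*_embeds helpers instead."
        )
    return len(provided)


if __name__ == "__main__":
    model = ToyTextModel()
    try:
        get_text_embeds_buggy(model, [1, 2, 3])
    except TypeError as err:
        print("buggy helper raised TypeError:", err)
    print("fixed helper:", get_text_embeds_fixed(model, [1, 2, 3]))
    try:
        toy_forward(input_ids=[1, 2, 3])
    except ValueError as err:
        print("single-modality forward:", err)
```

The wording of the real error message and the real helper signatures may differ; the sketch only shows the shape of the two changes.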
Testing
Run `facebook/pe-av-small` with the minimal code below.
Code Agent Policy
Who can review?
First time so don't hate me if I tag the wrong people @zucchini-nlp @stevhliu XD