🚨 Delete deprecations with end-cycle in v4.xx and v5.0 #41681
zucchini-nlp merged 19 commits into huggingface:main
Conversation
yonigozlan
left a comment
Love this so much 🥲, thanks for taking the time to do this @zucchini-nlp !
My only reservation is for `max_size`; I'll try to scan the hub to see how bad the damage would be right now, and open PRs where necessary.
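For reference, a hypothetical sketch of such a hub scan (the repo selection, file name, and error handling here are assumptions, not what was actually run), using the `huggingface_hub` API:

```python
import json

from huggingface_hub import HfApi, hf_hub_download

api = HfApi()
# Hypothetical scan: look at a sample of transformers repos and flag any
# preprocessor config that still carries the deprecated `max_size` key.
for model in api.list_models(library="transformers", limit=200):
    try:
        path = hf_hub_download(repo_id=model.id, filename="preprocessor_config.json")
    except Exception:
        continue  # repo has no image-processor config
    with open(path) as f:
        config = json.load(f)
    size = config.get("size")
    if "max_size" in config or (isinstance(size, dict) and "max_size" in size):
        print(f"{model.id} still uses `max_size`")
```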
Review threads (resolved):
- src/transformers/models/conditional_detr/image_processing_conditional_detr.py
- src/transformers/models/deformable_detr/image_processing_deformable_detr.py
- src/transformers/models/maskformer/image_processing_maskformer.py (outdated)
Thanks a lot for taking care of that!
General comment:
For every model that had something like

```python
logger.warning_once(
    "MimiModel is using MimiSdpaAttention, but `torch.nn.functional.scaled_dot_product_attention` does not support `output_attentions=True`. Falling back to the manual attention implementation, "
    'but specifying the manual implementation will be required from Transformers version v5.0.0 onwards. This warning can be removed using the argument `attn_implementation="eager"` when loading the model.'
)
```

because of being copied from Gemma, I think we should keep a warning. I've seen your answer above about that, but this is only when output_attentions is set in the config, not in the forward, no? Therefore, until those implementations rely on ALL_ATTENTION_FUNCTIONS, we still have to handle the warning here to ensure we don't silently skip outputting attentions.
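As a minimal sketch of the pattern being discussed (not the actual transformers source; the helper name and shapes are illustrative), the legacy SDPA classes warn and fall back to the manual implementation so attention weights are not silently dropped:

```python
import math

import torch
import torch.nn.functional as F
from transformers.utils import logging

logger = logging.get_logger(__name__)


def sdpa_attention_with_fallback(query, key, value, output_attentions=False):
    """Hypothetical helper: use SDPA, but warn and fall back to the manual
    implementation when attention weights are requested."""
    if not output_attentions:
        return F.scaled_dot_product_attention(query, key, value), None

    # SDPA cannot return attention weights, so warn once and compute them
    # manually instead of silently skipping them.
    logger.warning_once(
        "SDPA does not support `output_attentions=True`. Falling back to the manual "
        'attention implementation; load the model with `attn_implementation="eager"` '
        "to silence this warning."
    )
    attn_weights = torch.softmax(
        query @ key.transpose(-1, -2) / math.sqrt(query.size(-1)), dim=-1
    )
    return attn_weights @ value, attn_weights
```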
Review from my side:
core models (meaning ones that do not rely on modular)
- moshi
- mimi
- clap
- seamless_m4t
- unispeech
- unispeech_sat
- wav2vec2
- wav2vec2_bert
- whisper
the rest are impacted by modular directly:
- hubert (wav2vec2)
- kyutai_speech_to_text (moshi)
- sew (wav2vec2)
- sew_d (wav2vec2)
- wavlm (wav2vec2)
- data2vec_audio (wav2vec2)
Ah, after looking I see that it's not going to raise any warning unless we use … Also it will nudge us to incorporate …

Yep, totally agree!
```python
self,
input_ids: Optional[torch.LongTensor] = None,
attention_mask: Optional[torch.FloatTensor] = None,
position_ids: Optional[torch.LongTensor] = None,
```
The model doesn't use positions; I believe `position_ids` was simply copied from other models without checking.
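If so, the signature could presumably be trimmed to drop the unused argument. A hypothetical sketch (the class name is a placeholder, not a model from this PR):

```python
from typing import Optional

import torch
from torch import nn


class ExampleModel(nn.Module):  # hypothetical placeholder class
    def forward(
        self,
        input_ids: Optional[torch.LongTensor] = None,
        attention_mask: Optional[torch.FloatTensor] = None,
        # position_ids removed: the model never consumed it, so keeping it in the
        # signature only hid user errors (a passed value was silently ignored).
    ):
        ...
```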
Hub down? 😢
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Okay, now it is done I think. @yonigozlan @eustlb if you want to take another look, I will also request the core maintainer's review.
Cyrilvallez
left a comment
Amazing cleanup! Thanks a lot! Just a few conflicts to fix and let's go!
Review thread (resolved, outdated):
- src/transformers/models/rt_detr/image_processing_rt_detr_fast.py
[For maintainers] Suggested jobs to run (before merge): run-slow: aimv2, altclip, beit, blip_2, bloom, bridgetower, clap, conditional_detr, data2vec, deepseek_v2, deformable_detr, detr, grounding_dino
🚨 Delete deprecations with end-cycle in v4.xx and v5.0 (#41681)

* remove deprecations from v4
* delete those for v5
* delete these also
* fix tests
* add dummy test config
* fix copies
* SDPA raises warning but doesn't automatically change to eager
* max size can't be deleted, sadly
* oke, this should allow loading from-pretrained, but delete everything else
* style
* fix popping from kwargs
* audios rename
* padding defaults to self
* modular fix
* address comment
* style
What does this PR do?
Deletes deprecations for the v5 release. cc @yonigozlan and @eustlb to verify that we can really delete them, since it is mostly deprecated stuff from processors.
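For context, an illustrative sketch of the kind of processor-side deprecation shim being deleted (the helper name and the exact replacement key are assumptions, not copied from this PR): accept the deprecated `max_size` kwarg, warn, and fold it into the new `size` argument.

```python
import warnings


def resolve_size(size=None, max_size=None):
    """Illustrative shim of the kind this PR removes."""
    if max_size is not None:
        warnings.warn(
            "`max_size` is deprecated and will be removed in v5; "
            "use `size={'longest_edge': ...}` instead.",
            FutureWarning,
        )
        # Fold the deprecated value into the new-style size dict.
        size = dict(size or {}, longest_edge=max_size)
    return size
```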
For core maintainers:
`sdpa_attention`, but for old models we have to manually duplicate the warning message
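A rough sketch of the new behavior described in the commit list ("SDPA raises warning but doesn't automatically change to eager"); this is not the actual transformers source, and the function signature is illustrative: the shared SDPA path only warns that attention weights cannot be returned, instead of transparently switching to the eager implementation.

```python
import torch.nn.functional as F
from transformers.utils import logging

logger = logging.get_logger(__name__)


def sdpa_attention_forward(query, key, value, attention_mask=None, output_attentions=False):
    # Illustrative only: warn that SDPA cannot return weights, but keep using SDPA
    # rather than silently falling back to the manual implementation.
    if output_attentions:
        logger.warning_once(
            "`torch.nn.functional.scaled_dot_product_attention` cannot return attention "
            'weights; load the model with `attn_implementation="eager"` if you need '
            "`output_attentions=True`."
        )
    attn_output = F.scaled_dot_product_attention(query, key, value, attn_mask=attention_mask)
    return attn_output, None
```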