Conversation

@molbap (Contributor) commented Jul 10, 2025

What does this PR do?

TL;DR

PreTrainedModel now mixes in EmbeddingAccessMixin, providing default
get_input_embeddings / set_input_embeddings / get_output_embeddings / set_output_embeddings for all models. Per-model implementations of these methods should be removed from the codebase unless the model is exceptionally weird.

Details

Uses a new attribute __input_embed_layer = "embed_tokens" by default; a model can change it to point the getters/setters at a different layer. Then, assuming embed_tokens is that layer, the resolution order is:

  • self.model.embed_tokens
  • self.embed_tokens
  • delegate once via base_model_prefix
  • else raise (model must override)

get_output_embeddings now auto-returns lm_head only if input embeddings
resolve (so pure audio/vision backbones still return None).
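
For illustration, here is a minimal sketch of what that resolution could look like. Everything beyond the names mentioned above (embed_tokens, base_model_prefix, lm_head) is an assumption, including the exact method bodies and the use of NotImplementedError; the actual mixin in the PR may differ:

class EmbeddingAccessMixin:
    # Default layer name; a model can override this class attribute to
    # point the default getters/setters at a different layer. (The PR
    # uses the name-mangled form `__input_embed_layer`.)
    _input_embed_layer = "embed_tokens"

    def get_input_embeddings(self):
        name = self._input_embed_layer
        # 1. try self.model.embed_tokens
        if (model := getattr(self, "model", None)) is not None and hasattr(model, name):
            return getattr(model, name)
        # 2. try self.embed_tokens
        if hasattr(self, name):
            return getattr(self, name)
        # 3. delegate once via base_model_prefix (guard against self-reference)
        base = getattr(self, self.base_model_prefix, None)
        if base is not None and base is not self:
            return base.get_input_embeddings()
        # 4. give up: the model must override
        raise NotImplementedError(f"{type(self).__name__} must override `get_input_embeddings`")

    def get_output_embeddings(self):
        # Auto-return `lm_head` only if input embeddings resolve, so pure
        # audio/vision backbones without token embeddings still return None.
        try:
            self.get_input_embeddings()
        except NotImplementedError:
            return None
        return getattr(self, "lm_head", None)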

What you usually have to do: nothing.

Override only if:

  • embeddings live under a different attribute
  • multiple embedding tables
  • you need to hide an lm_head (override get_output_embeddings to return None)
  • the model has a helper head without token embeddings but still needs tying (override and return the head); see the sketch after this list
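
For example, the overrides from the last two bullets could look like the following sketch (the class and attribute names are hypothetical):

class MyVisionBackboneForClassification(PreTrainedModel):
    # Hide the head: opt out of output embeddings (and thus weight tying).
    def get_output_embeddings(self):
        return None

class MyModelWithHelperHead(PreTrainedModel):
    # No token embeddings of its own, but the helper head should still
    # participate in tying: override and return the head.
    def get_output_embeddings(self):
        return self.helper_head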

Potential breakages:

  • Exotic layouts with custom embedding attr names (must override)

cc @vasqu @zucchini-nlp, minor but to remember for composite models too

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker (Collaborator) left a comment

Boom! unbloating at its best! let's get this merged!

@molbap (Contributor, Author) commented Jul 17, 2025

Down to a few tests failing, mostly because of audio embedding exceptions; solving those, then adding set_ and get_ encoder methods and merging this

@molbap molbap changed the title from "[WIP] Refacto" to "Refactor embedding input/output getter/setter" on Jul 21, 2025
@ArthurZucker (Collaborator) left a comment

small nit but good to go otherwise!

Comment on lines 891 to 893
def get_output_embeddings(self):
    return None

Collaborator

self.lm_head is the output embedding, no?

Contributor Author

Well, the answer is annoying 😬: the input embeddings are defined as the conv2d patch layer

def get_input_embeddings(self):
    return self.embeddings.patch_embeddings

so if get_output_embeddings stays defined, then, since this param defaults to True in the default config:

tie_word_embeddings (`bool`, *optional*, defaults to `True`):

an attempt at tying the weights is always made at init:

        if getattr(self.config.get_text_config(decoder=True), "tie_word_embeddings", True):
            output_embeddings = self.get_output_embeddings()
            if output_embeddings is not None:
                self._tie_or_clone_weights(output_embeddings, self.get_input_embeddings())

which will fail because the conv2d patch-embedding module has no weight attribute. WDYT of reworking/removing this part of tie_weights?
The simpler solution is enforcing return None, which is what I have done here (and same for the other comment)
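
For context, reworking that part of tie_weights could look something like this hedged sketch, guarding on a weight attribute; this is an assumption about one possible alternative, not what the PR does (the PR enforces return None instead):

if getattr(self.config.get_text_config(decoder=True), "tie_word_embeddings", True):
    output_embeddings = self.get_output_embeddings()
    input_embeddings = self.get_input_embeddings()
    # Only tie when the input side is a real embedding table carrying a
    # `weight`; conv2d patch-embedding modules would be skipped here.
    if output_embeddings is not None and hasattr(input_embeddings, "weight"):
        self._tie_or_clone_weights(output_embeddings, input_embeddings)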

Collaborator

None works, needs a comment!
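
Something like this sketch of the commented override would do (the wording of the comment is illustrative):

def get_output_embeddings(self):
    # Explicitly return None: the input embeddings are the conv2d patch
    # layer, which has no tieable weight, and tie_word_embeddings defaults
    # to True, so we must opt out of weight tying at init.
    return None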

@github-actions

[For maintainers] Suggested jobs to run (before merge)

run-slow: arcee, aria, aya_vision, bamba, bark, bart, beit, bigbird_pegasus, biogpt, bitnet, blenderbot, blenderbot_small, bloom, chameleon, clvp, codegen

@molbap (Contributor, Author) commented Jul 21, 2025

run-slow: beit, bark, bart, clvp, glm4, glm4_moe, zamba2

@molbap molbap added the Core: Modeling (Internals of the library; Models.) label on Jul 21, 2025
@github-actions

This comment contains run-slow, running the specified jobs:

models: ['models/bark', 'models/bart', 'models/beit', 'models/clvp', 'models/glm4', 'models/glm4_moe', 'models/zamba2']
quantizations: [] ...

@molbap (Contributor, Author) commented Jul 21, 2025

Tested on a few slow models; failures are identical to main. Merging and keeping an eye out.

@molbap molbap merged commit 69b1582 into main Jul 21, 2025
27 of 28 checks passed
@molbap molbap deleted the draft_VLM_apis branch July 21, 2025 16:18
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Jul 22, 2025
* simplify common get/set

* remove some noise

* change some 5 years old modeling utils

* update examples

* fix copies

* revert some changes

* fixes, gah

* format

* move to Mixin

* remove smolvlm specific require grad

* skip

* force defaults

* remodularise some stuff

* remodularise more stuff

* add safety for audio models

* style

* have a correct fallback, you daft donkey

* remove this argh

* change heuristic for audio models

* fixup

* revert

* this works

* revert again

* 🧠

* aaah ESM has two modelings aaah

* add informative but short comment

* add `input_embed_layer` mixin attribute

* style

* walrus has low precedence

* modular fix

* this was breaking parser
zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025 (same commit message as above)