
Fix weight tying logic between _tied_weights_keys and tie_word_embeddings #42385

Open
molbap wants to merge 26 commits into main from fix_weight_tying

Conversation

@molbap
Contributor

@molbap molbap commented Nov 25, 2025

What does this PR do?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@molbap
Contributor Author

molbap commented Nov 25, 2025

run-slow: mistral

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/mistral"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

@molbap
Contributor Author

molbap commented Nov 25, 2025

run-slow: mistral3

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/mistral3"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Model CI Report

❌ Failed tests

  • mistral3:
    tests/models/mistral3/test_modeling_mistral3.py::Mistral3ModelTest::test_config

@molbap
Contributor Author

molbap commented Nov 25, 2025

run-slow: mistral, mistral3

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/mistral", "models/mistral3"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Model CI Report

❌ Failed tests

  • mistral3:
    tests/models/mistral3/test_modeling_mistral3.py::Mistral3ModelTest::test_config

@molbap
Contributor Author

molbap commented Nov 25, 2025

run-slow: olmoe

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/olmoe"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

@molbap
Contributor Author

molbap commented Nov 25, 2025

run-slow: olmoe, mistral, mistral3

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/mistral", "models/mistral3", "models/olmoe"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

@molbap
Contributor Author

molbap commented Nov 25, 2025

run-slow: olmoe, mistral, mistral3, phi3, starcoder

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/mistral", "models/mistral3", "models/olmoe", "models/phi3"]
quantizations: []

@molbap
Contributor Author

molbap commented Nov 25, 2025

run-slow: olmoe, mistral, mistral3, phi3, starcoder, bigbird, bigbird_pegasus

@github-actions
Contributor

CI Results

Workflow Run ⚙️

⚠️ No test being reported (jobs are skipped or cancelled)!

@molbap
Contributor Author

molbap commented Nov 26, 2025

run-slow: olmoe, mistral, mistral3, phi3, starcoder, bigbird, bigbird_pegasus, whisper, internvl, llava, llava_next, llava_next_video, qwen, fsmt, video_llava, deepseek_v3, qwen3_vl_moe

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/bigbird_pegasus", "models/deepseek_v3", "models/fsmt", "models/internvl", "models/llava", "models/llava_next", "models/llava_next_video", "models/mistral", "models/mistral3", "models/olmoe", "models/phi3", "models/qwen3_vl_moe", "models/video_llava", "models/whisper"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

@molbap molbap mentioned this pull request Nov 27, 2025
@molbap molbap changed the title from "Various fixes" to "Fix weight tying logic between _tied_weights_keys and tie_word_embeddings" Nov 27, 2025
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: fsmt

@molbap
Contributor Author

molbap commented Nov 27, 2025

I removed the fallback to the parent config in case tie_word_embeddings is not found in the text config. I also attempted to add a test; might be a bit overkill, let's see

@molbap
Contributor Author

molbap commented Nov 27, 2025

A remaining headscratcher is tie_encoder_decoder, which is now redundant. I did a fallback that looks like

should_tie = tie_encoder_decoder if tie_word_embeddings is None else tie_word_embeddings

and it seems to work and keeps the key as the source of authority. LMK!
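A minimal sketch of that fallback (plain Python with hypothetical names, not the actual modeling_utils code): the per-model tie_word_embeddings flag is treated as the source of authority, and tie_encoder_decoder is only consulted when the flag is genuinely absent.

```python
def resolve_should_tie(tie_word_embeddings, tie_encoder_decoder):
    """Decide whether to tie weights from the two config flags.

    tie_word_embeddings wins whenever it is explicitly set (True or False);
    tie_encoder_decoder only acts as a fallback when it is None.
    """
    return tie_encoder_decoder if tie_word_embeddings is None else tie_word_embeddings

# The explicit flag always wins when set...
assert resolve_should_tie(False, True) is False
# ...and the encoder-decoder flag is only a fallback.
assert resolve_should_tie(None, True) is True
```

Note that this only behaves as a real fallback if tie_word_embeddings can actually be None, which is exactly the concern raised in the review below about the configuration_utils change.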

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: fsmt, kyutai_speech_to_text, musicgen, musicgen_melody

@zucchini-nlp
Member

zucchini-nlp commented Nov 28, 2025

A remaining headscratcher is tie_encoder_decoder which is now redundant. I did a fallback that looks like

CMIIW, tie_encoder_decoder means tying all weights from the encoder to the decoder, i.e. attention and such, doesn't it? In which case

tie_encoder_decoder = config.tie_encoder_decoder if config.is_encoder_decoder else False?

Member

@Cyrilvallez Cyrilvallez left a comment

A few comments/answers! Sorry for the delay!

Comment on lines 506 to -507
mismatch_keys.add((target_name, param_value.shape, ref.shape))
module_obj.param_name._is_hf_initialized = False # Needs to be initialized
Member

Yes, it was completely masked by log_to_misc, along with other issues we currently have on some models. We will need to update the logger to avoid silencing important issues!!

Comment on lines +2257 to +2263
text_config = self.config.get_text_config(decoder=True)
if not hasattr(text_config, "tie_word_embeddings"):
    logger.warning(
        f"Text config {text_config.__class__.__name__} does not have 'tie_word_embeddings' attribute. "
        "This may cause issues with weight tying."
    )
tie_word_embeddings = getattr(text_config, "tie_word_embeddings", None)
Member

I don't think all multimodals rely on the text_config here unfortunately, do they? I.e. what we want is for each model to decide on its own whether it should tie its own weights. So we don't really want to check the text_config...

Contributor Author

hmm, it's the opposite of what we discussed above. In that case, what do you suggest?

Member

Well the issue is that if you have a submodel such as self.text_model = AutoModel._from_config(text_config), you cannot know its weights in advance, as it can be any model. So that model is the only one responsible for its own weights.
We cannot delegate to the top-most model, as this makes it way too hard and not scalable in general
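To illustrate the point above with a toy sketch (hypothetical classes, not the transformers API): the wrapper only instantiates the submodel, and the submodel ties its own weights from its own config, so the wrapper never needs to inspect text_config.

```python
class ToyTextModel:
    """Stand-in for an arbitrary text model: it decides its own tying."""
    def __init__(self, config):
        self.config = config
        self.embed = [0.0] * 4    # stand-in for the input embedding weight
        self.lm_head = [0.0] * 4  # stand-in for the output projection weight
        if config.get("tie_word_embeddings", False):
            self.lm_head = self.embed  # tie: share the same object

class ToyMultimodal:
    """Stand-in for a multimodal wrapper: it only instantiates the submodel,
    mirroring self.text_model = AutoModel._from_config(text_config)."""
    def __init__(self, text_config):
        self.text_model = ToyTextModel(text_config)

tied = ToyMultimodal({"tie_word_embeddings": True})
assert tied.text_model.lm_head is tied.text_model.embed
```

The design choice being argued for is exactly this: tying is local to each module, so composition stays scalable no matter which submodel the config resolves to.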

)
tie_word_embeddings = getattr(text_config, "tie_word_embeddings", None)
tie_encoder_decoder = getattr(self.config, "tie_encoder_decoder", False)
should_tie = tie_encoder_decoder if tie_word_embeddings is None else tie_word_embeddings
Member

So we basically disregard tie_encoder_decoder completely? Because with your change in configuration_utils, tie_word_embeddings will NEVER be None anymore, if I understand correctly

Contributor Author

yeah, I'm reverting this unfortunately, I would indeed like to get rid of tie_encoder_decoder

Comment on lines +3179 to +3183
tied_keys_attr = getattr(self, "all_tied_weights_keys", None)
if tied_keys_attr is not None:
    _tied_weights_keys = set(tied_keys_attr.keys())
else:
    _tied_weights_keys = set(_get_tied_weight_keys(self))
Member

The fact here is that I think some older models were tying using _tie_weights (the one with the underscore), which was ALWAYS run, independently of the configs. So they could have tied weights even though the config technically says not to tie. In save_pretrained, we check the pointers of the weight tensors, so we know the tied weights from there and cannot be wrong about it. But then if we use the good-citizen all_tied_weights_keys, which correctly looks at the config, some of those tied weights will not find their source, as they are not SUPPOSED to be tied (but they were anyway).

So technically, looking at all potential _tied_weights_keys here is not wrong, as we check pointers anyway, and this avoids that kind of issue.
But indeed, in the future we want to always know which weights are tied and simply rely on the internal list (or even better, recompute it with get_expanded_tied_weights_keys(all_submodels=True))
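The pointer check described above can be sketched in plain Python (no torch here; `id()` stands in for comparing tensor storage pointers, and the names are illustrative, not the save_pretrained internals): two parameter entries backed by the same object are tied, regardless of what the config claims.

```python
def find_tied_groups(state_dict):
    """Group parameter names that share the same underlying object,
    i.e. weights that are effectively tied no matter what the config says."""
    by_storage = {}
    for name, tensor in state_dict.items():
        by_storage.setdefault(id(tensor), []).append(name)
    return [names for names in by_storage.values() if len(names) > 1]

shared = [0.1, 0.2, 0.3]  # stand-in for a shared weight tensor
state = {
    "model.embed_tokens.weight": shared,  # same object as lm_head.weight
    "lm_head.weight": shared,
    "model.norm.weight": [1.0],           # not shared with anything
}
assert find_tied_groups(state) == [["model.embed_tokens.weight", "lm_head.weight"]]
```

This is why the pointer-based view is authoritative at save time: it catches weights tied by legacy _tie_weights hooks that the config-driven all_tied_weights_keys would miss.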

Comment on lines +4413 to +4428
for key in missing_keys - self.all_tied_weights_keys.keys():
tied_keys_attr = getattr(self, "all_tied_weights_keys", {}) or {}
tied_keys = set(tied_keys_attr.keys())
for key in missing_keys - tied_keys:
Member

Why do we have this? all_tied_weights_keys is guaranteed to be a dict already at this point from post_init

Contributor Author

ah yes indeed, could be just

tied_keys = set(self.all_tied_weights_keys.keys())
for key in missing_keys - tied_keys:
    ...  # do stuff

will update

Member

No need to cast it again as a set, the keys are already a subclass of set!

Comment on lines +2116 to +2117
if not hasattr(model_tied, "_tied_weights_keys") or not model_tied._tied_weights_keys:
    continue
Contributor Author

@molbap molbap Nov 28, 2025

note from huddle: should recurse through all submodels here to get the tied keys
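The recursion mentioned in the note could look like the following sketch (a hypothetical helper plus a minimal mock module, not the actual transformers code): walk all submodules and union their _tied_weights_keys, prefixing each key with its submodule path.

```python
class MockModule:
    """Minimal stand-in for nn.Module exposing named_children()."""
    def __init__(self, tied=(), **children):
        self._tied_weights_keys = list(tied)
        self._children = children

    def named_children(self):
        return self._children.items()

def collect_tied_keys(module, prefix=""):
    """Union the _tied_weights_keys of a module and all its submodules,
    prefixing each key with the dotted path of the submodule it came from."""
    keys = {prefix + k for k in (module._tied_weights_keys or [])}
    for name, child in module.named_children():
        keys |= collect_tied_keys(child, prefix + name + ".")
    return keys

text = MockModule(tied=["lm_head.weight"])
model = MockModule(text_model=text)
assert collect_tied_keys(model) == {"text_model.lm_head.weight"}
```

With the real nn.Module the same traversal would use module.named_children() directly, so the test in question could compare missing keys against the full recursive set rather than only the top-level model's keys.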

@github-actions
Contributor

github-actions bot commented Dec 1, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: fsmt, kyutai_speech_to_text, musicgen, musicgen_melody

@github-actions
Contributor

github-actions bot commented Dec 1, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: fsmt, kyutai_speech_to_text, llava_onevision, musicgen, musicgen_melody

@github-actions
Contributor

github-actions bot commented Dec 1, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: fsmt, kyutai_speech_to_text, llava_next_video, llava_onevision, musicgen, musicgen_melody

5 participants