
Fix loading logic issue#44095

Merged
Cyrilvallez merged 10 commits into main from fix-logic-loading
Feb 17, 2026

Conversation

Member

@Cyrilvallez Cyrilvallez commented Feb 17, 2026

What does this PR do?

As per the title: the check added in #43768 is wrong, as a missing weight would NOT be reinitialized in some cases!

As for the pointer check, it is not needed at all: when modules (modules, not parameters) are shared (tied), loading only one param from the checkpoint correctly loads both weights, because setattr is called on the module object that is already shared. So we can simply remove tied params from the missing keys instead of running an expensive (and sometimes wrong) pointer check.
Fixes #44060
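To see why loading one checkpoint entry is enough for tied modules, here is a minimal plain-Python sketch (toy stand-ins, not the actual transformers or torch classes): when two attributes reference the same module object, an in-place write through one alias is visible through the other.

```python
class Linear:
    """Toy stand-in for nn.Linear; `weight` is a mutable list standing in for a tensor."""
    def __init__(self):
        self.weight = [0.0, 0.0]

class Model:
    def __init__(self):
        self.embed = Linear()
        self.lm_head = self.embed  # tied: the SAME module object, not a copy

model = Model()
# Loading one checkpoint key writes the shared module in place...
model.embed.weight[:] = [1.0, 2.0]
# ...so the tied alias sees the same data, and its key can simply be
# dropped from the missing-keys list instead of comparing data pointers.
print(model.lm_head.weight)           # [1.0, 2.0]
print(model.embed is model.lm_head)   # True
```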



@zucchini-nlp zucchini-nlp left a comment


Thanks, that works much better tbh

Comment on lines +2332 to +2338
# This check is for remote code whose `_init_weights` uses neither `torch.init` nor
# `transformers.initialization`, which would allow checking the flag directly on the param.
# Since such code instead writes the params in-place, the params would otherwise be reinitialized
if all(getattr(param, "_is_hf_initialized", False) for param in module.parameters(recurse=False)) and all(
getattr(buffer, "_is_hf_initialized", False)
for buffer in module.buffers(recurse=False)
if buffer is not None
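The condition above can be isolated as a small helper to make its behavior concrete. This is a hedged sketch using toy `Param` objects (the real check runs on `torch.nn.Module` parameters and buffers): re-initialization is skipped only when every parameter AND every non-None buffer already carries the `_is_hf_initialized` flag.

```python
class Param:
    """Toy stand-in for a torch parameter/buffer carrying the init flag."""
    def __init__(self, initialized=False):
        self._is_hf_initialized = initialized

def should_skip_init(params, buffers):
    # Mirrors the merged check: the flag must be set on all params and
    # on all non-None buffers; a missing attribute counts as False.
    return all(getattr(p, "_is_hf_initialized", False) for p in params) and all(
        getattr(b, "_is_hf_initialized", False) for b in buffers if b is not None
    )

print(should_skip_init([Param(True)], [Param(True), None]))   # True
print(should_skip_init([Param(True), Param(False)], []))      # False
```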

nice, forgot that we can get bigger modules

@Cyrilvallez Cyrilvallez merged commit a64996e into main Feb 17, 2026
26 checks passed
@Cyrilvallez Cyrilvallez deleted the fix-logic-loading branch February 17, 2026 19:03


Development

Successfully merging this pull request may close these issues.

Qwen3-Next: Incorrect tied weights warning ties embed_tokens.weight to linear_attn.dt_bias across all layers
