
Fix init weights in remote code#43768

Merged
zucchini-nlp merged 11 commits into huggingface:main from zucchini-nlp:vllm-v5-bump
Feb 10, 2026

Conversation

@zucchini-nlp
Member

@zucchini-nlp zucchini-nlp commented Feb 5, 2026

What does this PR do?

Helps vLLM to bump to v5

Comment on lines 2312 to +2317
if getattr(module, "_is_hf_initialized", False):
return

if (weight := getattr(module, "weight", None)) is not None and getattr(weight, "_is_hf_initialized", False):
return

Member Author

@zucchini-nlp zucchini-nlp Feb 5, 2026


Modules never have an `_is_hf_initialized` attr; I guess this is a typo? Otherwise it causes the whole model to be randomly initialized when remote code has an old-format `_init_weights` defined, and that takes ages for big models

Collaborator


You are right, they never do now; only the tensors have them.
The check should only run for remote code, I think, no?

Member Author


Yes, for local models it will not have much effect, because we check `weight._is_hf_initialized` later in `initialization.py`. So we never randomly init weights for already-loaded params
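To make the distinction concrete, here is a minimal, self-contained sketch (hypothetical `Param`/`Linear`/`load_checkpoint` stand-ins, not the actual transformers code) of why the guard must look at the parameter's flag rather than the module's: loading tags the tensor, never the module, so a module-level check alone would never fire and re-init would proceed.

```python
# Hypothetical stand-ins for an nn.Parameter and an nn.Linear-like module.
class Param:
    def __init__(self, value):
        self.value = value

class Linear:
    def __init__(self):
        self.weight = Param(0.0)

def load_checkpoint(module, ckpt_value):
    # Loading tags the *parameter*, mirroring how loaded tensors are flagged.
    module.weight.value = ckpt_value
    module.weight._is_hf_initialized = True

def init_weights(module):
    # Module-level guard: modules never carry the flag, so this never returns early.
    if getattr(module, "_is_hf_initialized", False):
        return
    # Parameter-level guard: skips re-init when the weight was already loaded.
    weight = getattr(module, "weight", None)
    if weight is not None and getattr(weight, "_is_hf_initialized", False):
        return
    module.weight.value = 123.0  # stand-in for random re-initialization

layer = Linear()
load_checkpoint(layer, 42.0)
init_weights(layer)
print(layer.weight.value)  # 42.0 — the loaded value is preserved
```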

Member


Wait, what is this check? Any module with both a weight and something else would be skipped if the "something else" is a missing param!

Member


Modules WILL have the flag when weight init is not called from `from_pretrained`. Imagine doing raw instantiation of a composite model, such as `model = ModelArch(config)`: every submodel will run `post_init` and initialize its weights, setting the flag so that the next `post_init` does not run it again!
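The raw-instantiation case described above can be sketched as follows (hypothetical `Submodel`/`CompositeModel` classes with a toy `post_init`, assumed names only): the inner model initializes itself and sets the flag on the module, and the outer `post_init` skips it.

```python
class Submodel:
    def __init__(self):
        self.weight = 0.0
        self.post_init()

    def post_init(self):
        for m in [self]:
            if getattr(m, "_is_hf_initialized", False):
                continue  # already initialized, do not touch
            m.weight = 0.02  # deterministic init stand-in
            m._is_hf_initialized = True  # flag set on the *module* here

class CompositeModel:
    def __init__(self):
        self.sub = Submodel()  # submodel already ran its own post_init
        self.post_init()

    def post_init(self):
        for m in [self.sub]:
            if getattr(m, "_is_hf_initialized", False):
                continue  # the module-level flag prevents double init
            m.weight = 999.0

model = CompositeModel()
print(model.sub.weight)  # 0.02 — not re-initialized by the outer post_init
```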

Member Author


It was due to remote code models re-initializing weights randomly. For example, this one has a custom `_init_weights` and will randomly init without checking whether the weights were loaded from a ckpt or not. Or do we intentionally not support this type of model in v5, in which case we can revert and skip these models in vLLM?

https://huggingface.co/TIGER-Lab/VLM2Vec-Full/blob/main/modeling_phi3_v.py#L1237-L1247

Member


v5 broke most of the loading of remote code. For this model in particular, note how e.g. the rope module would have random weights anyway, as the non-persistent buffer is not reinitialized explicitly and so would lose its value

Member


Oh, actually they initialize it as None and then update it, so it would be fine in this case

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines +434 to +436
# 5. Special tokens mask configuration
# Patterns: "none", "cls_sep", "eos", "bos", "bos_eos", "cls_double_sep", "prefix_suffix"
-        self.special_tokens_pattern = kwargs.pop("special_tokens_pattern", "cls_sep")
+        self.special_tokens_pattern = kwargs.pop("special_tokens_pattern", "bos_eos")
Member Author

@zucchini-nlp zucchini-nlp Feb 6, 2026


cc @itazap @ArthurZucker, I want to clarify this part. Should we default to None, because cls/sep ids aren't always available for all tokenizers? As it is, we are getting `[None, 1, 18001, 468, None]` as token ids for those models

Ignore the current change to bos_eos
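For illustration, a toy sketch (hypothetical `apply_pattern` helper, not the actual tokenizers code) of what the patterns would produce, and why a tokenizer that lacks cls/sep ids ends up with `None` entries under the `"cls_sep"` default:

```python
def apply_pattern(ids, pattern, cls_id=None, sep_id=None, bos_id=None, eos_id=None):
    # Wrap token ids with special tokens according to the configured pattern.
    if pattern == "cls_sep":
        return [cls_id] + ids + [sep_id]  # None sentinels if cls/sep are undefined
    if pattern == "bos_eos":
        return [bos_id] + ids + [eos_id]
    if pattern == "none":
        return ids
    raise ValueError(f"unknown pattern: {pattern}")

# A tokenizer with only bos/eos defined: "cls_sep" leaks None into the ids.
print(apply_pattern([1, 18001, 468], "cls_sep"))
# [None, 1, 18001, 468, None]
print(apply_pattern([1, 18001, 468], "bos_eos", bos_id=0, eos_id=2))
# [0, 1, 18001, 468, 2]
```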

Collaborator

@ArthurZucker ArthurZucker left a comment


missing a few tests IMO (especially for the tokenizer fix)


@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen2_5_omni

@zucchini-nlp
Member Author

run-slow: llama, mixtral, whisper, bart, mamba, gemma3n, qwen2_vl, llava

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/bart", "models/gemma3n", "models/llama", "models/llava", "models/mamba", "models/qwen2_vl", "models/whisper"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context | Commit   | Description
RUN     | 255c6bc4 | merge commit
PR      | dd6296c2 | branch commit
main    | b7b9d252 | base commit

✅ No failing test specific to this PR 🎉 👏 !

@zucchini-nlp zucchini-nlp enabled auto-merge (squash) February 10, 2026 10:48
@zucchini-nlp zucchini-nlp merged commit 884749a into huggingface:main Feb 10, 2026
26 checks passed
jiosephlee pushed a commit to jiosephlee/transformers_latest that referenced this pull request Feb 11, 2026
* init or tie weight in remote code

* processing

* config attr

* maybe? the special token logic is breaking many tests

* updates

* oh c'mon

* omg

* try None and see if tests fail

* oops
