Fix init weights in remote code #43768
Conversation
    if getattr(module, "_is_hf_initialized", False):
        return

    if (weight := getattr(module, "weight", None)) is not None and getattr(weight, "_is_hf_initialized", False):
        return
Modules never have an `_is_hf_initialized` attr, I guess this is a typo? Otherwise it causes the whole model to be randomly initialized when remote code has an old-format `_init_weights` defined, and it takes ages for big models.
You are right, they never do now; only the tensors have them.
The check should only run for remote code, I think, no?
Yes, for local models it will not have much effect, because we check `weight._is_hf_initialized` later in `initialization.py`. So we never randomly init weights for already-loaded params.
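The tensor-level guard described above can be sketched roughly like this (a plain-Python stand-in, not the actual transformers code — `FakeParam` and `init_weight` are illustrative names):

```python
# Hypothetical sketch: the per-tensor "_is_hf_initialized" flag means params
# already loaded from a checkpoint are skipped by the init pass, so a missing
# module-level check does not cause loaded weights to be overwritten.

class FakeParam:
    def __init__(self, value, loaded=False):
        self.value = value
        # transformers sets this flag on tensors filled from the checkpoint
        self._is_hf_initialized = loaded

def init_weight(param, init_value=0.0):
    """Skip params already loaded from a checkpoint; init the rest."""
    if getattr(param, "_is_hf_initialized", False):
        return param  # already loaded, leave untouched
    param.value = init_value  # stand-in for a random init
    param._is_hf_initialized = True
    return param

loaded = FakeParam(3.14, loaded=True)
fresh = FakeParam(99.0, loaded=False)

init_weight(loaded)
init_weight(fresh)
print(loaded.value)  # 3.14: checkpoint value kept
print(fresh.value)   # 0.0: re-initialized
```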
Wait, what is this check? Any module with both a weight and something else would be skipped if the "something else" is a missing param!
Modules WILL have the flag when weight init is not called from `from_pretrained`. Imagine doing raw instantiation with a composite model, such as `model = ModelArch(config)`: then every submodel will run `post_init` and initialize its weights, and set the flag, so that the next `post_init` does not run it again!
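A minimal sketch of that scenario (class names are illustrative, not the transformers API): each submodel's `post_init` sets a module-level flag so a later `post_init` call does not re-run initialization.

```python
# Sketch: composite-model instantiation where the flag prevents double init.

class SubModel:
    def __init__(self):
        self.init_count = 0

    def post_init(self):
        if getattr(self, "_is_hf_initialized", False):
            return  # weights already initialized by an earlier post_init
        self.init_count += 1  # stand-in for actual weight initialization
        self._is_hf_initialized = True

class CompositeModel:
    def __init__(self):
        # Raw instantiation: the submodel runs its own post_init first...
        self.sub = SubModel()
        self.sub.post_init()
        # ...and the outer model's post_init then runs over all submodules.
        self.post_init()

    def post_init(self):
        self.sub.post_init()

model = CompositeModel()
print(model.sub.init_count)  # 1, not 2: the flag short-circuits the second pass
```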
It was due to remote code models re-initializing weights randomly. For example, this one has a custom `_init_weights` and will randomly init without checking whether weights were loaded from the checkpoint or not. Or do we not support this type of model on purpose in v5, in which case we can revert and skip these models in vLLM?
https://huggingface.co/TIGER-Lab/VLM2Vec-Full/blob/main/modeling_phi3_v.py#L1237-L1247
v5 broke most of the loading of remote code. For this model in particular, note how e.g. the rope module would have random weights anyway, as the non-persistent buffer is not reinitialized explicitly and so would lose its value.
Oh, actually they initialize it as None and then update it, so it would be fine in this case.
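The lazy-buffer pattern being discussed can be sketched like this (a plain-Python stand-in, not the actual `modeling_phi3_v.py` code): the non-persistent buffer starts as None and is computed deterministically on first use, so skipping re-initialization at load time never leaves stale random values.

```python
# Sketch: a rope-style buffer initialized as None and filled lazily.

class RotaryEmbedding:
    def __init__(self, dim):
        self.dim = dim
        self.inv_freq = None  # non-persistent "buffer", never in the checkpoint

    def forward(self, seq_len):
        if self.inv_freq is None:
            # Recomputed from config on first forward, so it is always valid
            # regardless of what happened (or not) during weight init.
            self.inv_freq = [1.0 / (10000 ** (i / self.dim)) for i in range(0, self.dim, 2)]
        return [f * seq_len for f in self.inv_freq]

rope = RotaryEmbedding(dim=4)
out = rope.forward(seq_len=2)
print(rope.inv_freq)  # filled on first forward, not at load time
```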
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
    # 5. Special tokens mask configuration
    # Patterns: "none", "cls_sep", "eos", "bos", "bos_eos", "cls_double_sep", "prefix_suffix"
-   self.special_tokens_pattern = kwargs.pop("special_tokens_pattern", "cls_sep")
+   self.special_tokens_pattern = kwargs.pop("special_tokens_pattern", "bos_eos")
cc @itazap @ArthurZucker, I want to clarify this part. Should we default to None, because cls/sep ids aren't always available for all tokenizers? Thus we are getting `[None, 1, 18001, 468, None]` as token ids for those models.
Ignore the current change to bos_eos.
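The problem can be illustrated with a small sketch (an illustrative stand-in, not the actual tokenizer code — the function name and signature are hypothetical): when a tokenizer has no cls/sep token ids, a "cls_sep" pattern inserts None placeholders around the sequence.

```python
# Sketch: how a missing cls/sep id leaks None into the token id list.

def build_with_special_tokens(ids, pattern, cls_id=None, sep_id=None,
                              bos_id=None, eos_id=None):
    if pattern == "cls_sep":
        return [cls_id] + ids + [sep_id]
    if pattern == "bos_eos":
        return [bos_id] + ids + [eos_id]
    return ids  # pattern "none": no special tokens added

# Tokenizer without cls/sep ids: None leaks into the output.
print(build_with_special_tokens([1, 18001, 468], "cls_sep"))
# -> [None, 1, 18001, 468, None]

# Defaulting the pattern to "none" avoids the None placeholders.
print(build_with_special_tokens([1, 18001, 468], "none"))
# -> [1, 18001, 468]
```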
ArthurZucker left a comment:
missing a few tests IMO (especially for the tokenizer fix)
[For maintainers] Suggested jobs to run (before merge): run-slow: qwen2_5_omni
run-slow: llama, mixtral, whisper, bart, mamba, gemma3n, qwen2_vl, llava
This comment contains models: ["models/bart", "models/gemma3n", "models/llama", "models/llava", "models/mamba", "models/qwen2_vl", "models/whisper"]
* init or tie weight in remote code
* processing
* config attr
* maybe? the special token logic is breaking many tests
* updates
* oh c'mon
* omg
* try None and see if tests fail
* oops
What does this PR do?
Helps vLLM bump to v5.