Fix init weights in remote code #43768
Conversation
    if getattr(module, "_is_hf_initialized", False):
        return

    if (weight := getattr(module, "weight", None)) is not None and getattr(weight, "_is_hf_initialized", False):
        return
Modules never have an `_is_hf_initialized` attr, I guess this is a typo? Otherwise it causes the whole model to be randomly initialized when remote code has an old-format `_init_weights` defined, and it takes ages for big models.
You are right, they never do now; only the tensors have them.
The check should only run for remote code, I think, no?
Yes, for local models it will not have much effect, because we check `weight._is_hf_initialized` later in `initialization.py`. So we never randomly init weights for already-loaded params.
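The tensor-level guard described above can be sketched roughly like this (a plain-Python stand-in, not the actual transformers code — `FakeParam` and `init_weight` are illustrative names):

```python
# Hypothetical sketch: the per-tensor "_is_hf_initialized" flag means params
# already loaded from a checkpoint are skipped by the init pass, so a missing
# module-level check does not cause loaded weights to be overwritten.

class FakeParam:
    def __init__(self, value, loaded=False):
        self.value = value
        # transformers sets this flag on tensors filled from the checkpoint
        self._is_hf_initialized = loaded

def init_weight(param, init_value=0.0):
    """Skip params already loaded from a checkpoint; init the rest."""
    if getattr(param, "_is_hf_initialized", False):
        return param  # already loaded, leave untouched
    param.value = init_value  # stand-in for a random init
    param._is_hf_initialized = True
    return param

loaded = FakeParam(3.14, loaded=True)
fresh = FakeParam(99.0, loaded=False)

init_weight(loaded)
init_weight(fresh)
print(loaded.value)  # 3.14: checkpoint value kept
print(fresh.value)   # 0.0: re-initialized
```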
Wait, what is this check? Any module with both a weight and something else would be skipped if the "something else" is a missing param!
Modules WILL have the flag when weight init is not called from `from_pretrained`. Imagine doing raw instantiation with a composite model, such as `model = ModelArch(config)`: then every submodel will run `post_init` and initialize its weights, and set the flag, so that the next `post_init` does not run it again!
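A minimal sketch of that scenario (class names are illustrative, not the transformers API): each submodel's `post_init` sets a module-level flag so a later `post_init` call does not re-run initialization.

```python
# Sketch: composite-model instantiation where the flag prevents double init.

class SubModel:
    def __init__(self):
        self.init_count = 0

    def post_init(self):
        if getattr(self, "_is_hf_initialized", False):
            return  # weights already initialized by an earlier post_init
        self.init_count += 1  # stand-in for actual weight initialization
        self._is_hf_initialized = True

class CompositeModel:
    def __init__(self):
        # Raw instantiation: the submodel runs its own post_init first...
        self.sub = SubModel()
        self.sub.post_init()
        # ...and the outer model's post_init then runs over all submodules.
        self.post_init()

    def post_init(self):
        self.sub.post_init()

model = CompositeModel()
print(model.sub.init_count)  # 1, not 2: the flag short-circuits the second pass
```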
It was due to remote code models re-initializing weights randomly. For example, this one has a custom `_init_weights` and will randomly init without checking whether weights were loaded from the checkpoint or not. Or do we not support this type of model on purpose in v5, in which case we can revert and skip these models in vLLM?
https://huggingface.co/TIGER-Lab/VLM2Vec-Full/blob/main/modeling_phi3_v.py#L1237-L1247
v5 broke most of the loading of remote code. For this model in particular, note how e.g. the rope module would have random weights anyway, as the non-persistent buffer is not reinitialized explicitly and so would lose its value.
Oh, actually they initialize it as None and then update it, so it would be fine in this case.
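The lazy-buffer pattern being discussed can be sketched like this (a plain-Python stand-in, not the actual `modeling_phi3_v.py` code): the non-persistent buffer starts as None and is computed deterministically on first use, so skipping re-initialization at load time never leaves stale random values.

```python
# Sketch: a rope-style buffer initialized as None and filled lazily.

class RotaryEmbedding:
    def __init__(self, dim):
        self.dim = dim
        self.inv_freq = None  # non-persistent "buffer", never in the checkpoint

    def forward(self, seq_len):
        if self.inv_freq is None:
            # Recomputed from config on first forward, so it is always valid
            # regardless of what happened (or not) during weight init.
            self.inv_freq = [1.0 / (10000 ** (i / self.dim)) for i in range(0, self.dim, 2)]
        return [f * seq_len for f in self.inv_freq]

rope = RotaryEmbedding(dim=4)
out = rope.forward(seq_len=2)
print(rope.inv_freq)  # filled on first forward, not at load time
```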
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
    # 5. Special tokens mask configuration
    # Patterns: "none", "cls_sep", "eos", "bos", "bos_eos", "cls_double_sep", "prefix_suffix"
-   self.special_tokens_pattern = kwargs.pop("special_tokens_pattern", "cls_sep")
+   self.special_tokens_pattern = kwargs.pop("special_tokens_pattern", "bos_eos")
cc @itazap @ArthurZucker, I want to clarify this part. Should we default to None, because cls/sep ids aren't always available for all tokenizers? Thus we are getting `[None, 1, 18001, 468, None]` as token ids for those models.
Ignore the current change to bos_eos.
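The problem can be illustrated with a small sketch (an illustrative stand-in, not the actual tokenizer code — the function name and signature are hypothetical): when a tokenizer has no cls/sep token ids, a "cls_sep" pattern inserts None placeholders around the sequence.

```python
# Sketch: how a missing cls/sep id leaks None into the token id list.

def build_with_special_tokens(ids, pattern, cls_id=None, sep_id=None,
                              bos_id=None, eos_id=None):
    if pattern == "cls_sep":
        return [cls_id] + ids + [sep_id]
    if pattern == "bos_eos":
        return [bos_id] + ids + [eos_id]
    return ids  # pattern "none": no special tokens added

# Tokenizer without cls/sep ids: None leaks into the output.
print(build_with_special_tokens([1, 18001, 468], "cls_sep"))
# -> [None, 1, 18001, 468, None]

# Defaulting the pattern to "none" avoids the None placeholders.
print(build_with_special_tokens([1, 18001, 468], "none"))
# -> [1, 18001, 468]
```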
ArthurZucker left a comment:
missing a few tests IMO (especially for the tokenizer fix)
[For maintainers] Suggested jobs to run (before merge): run-slow: qwen2_5_omni
run-slow: llama, mixtral, whisper, bart, mamba, gemma3n, qwen2_vl, llava
This comment contains models: ["models/bart", "models/gemma3n", "models/llama", "models/llava", "models/mamba", "models/qwen2_vl", "models/whisper"]
* init or tie weight in remote code
* processing
* config attr
* maybe? the special token logic is breaking many tests
* updates
* oh c'mon
* omg
* try None and see if tests fail
* oops
What does this PR do?
Helps vLLM bump to v5.