[Bugfix] Fix T5EncoderModel load_weights corrupting key weight names#2344
Open
sfc-gh-goliaro wants to merge 1 commit into vllm-project:main
Conversation
The bare `name.replace(weight_name, param_name)` for the `("qkv_proj",
"k", "k")` mapping entry replaces every occurrence of the substring "k"
in the parameter name — including the "k" in "block" — producing a
corrupted lookup name like `encoder.blocqkv_proj.0.layer...` that
silently fails to match any parameter. As a result, the K-projection
weights in every T5SelfAttention layer are never loaded into the fused
`qkv_proj`, leaving the K shard at its random initialization.
Fix: use dotted replacement (`.k.` → `.qkv_proj.`) so only the
full dotted component is matched, consistent with the guard check
that already uses `f".{weight_name}."`.
The `q` and `v` entries were unaffected because neither letter appears
as a substring in other path components (`encoder`, `block`, `layer`,
etc.), but the fix is applied uniformly for robustness.
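The substring-vs-component distinction can be illustrated in a few lines of standalone Python (a minimal sketch; the path string is illustrative and not copied verbatim from the vLLM loader):

```python
# Hypothetical T5 parameter name; "block" contains the letter "k".
name = "encoder.block.0.layer.0.SelfAttention.k.weight"
weight_name, param_name = "k", "qkv_proj"

# Buggy: a bare replace rewrites every "k", including the one in "block".
buggy = name.replace(weight_name, param_name)
# buggy == "encoder.blocqkv_proj.0.layer.0.SelfAttention.qkv_proj.weight"

# Fixed: dotted replacement only matches the full path component,
# mirroring the guard that already checks f".{weight_name}.".
fixed = name.replace(f".{weight_name}.", f".{param_name}.")
# fixed == "encoder.block.0.layer.0.SelfAttention.qkv_proj.weight"
```

The corrupted name in `buggy` matches no registered parameter, so the lookup fails silently and the K shard is simply skipped.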
Made-with: Cursor
lishunyang12 (Collaborator) approved these changes on Apr 2, 2026:

LGTM, good catch. The guard was already using dotted form but the replace wasn't — silent and nasty.
Purpose
`T5EncoderModel.load_weights` silently fails to load K-projection weights into the fused `qkv_proj` layer due to a `str.replace` bug that corrupts parameter names. `name.replace("k", "qkv_proj")` replaces every occurrence of `"k"` in the name, including the `"k"` in `"block"`, producing a corrupted lookup name (`encoder.blocqkv_proj.0.layer...`) that matches no parameter. As a result, K weights are never loaded.

Fix: use dotted replacement (`.k.` → `.qkv_proj.`) so only the full component is matched, consistent with the guard that already checks `f".{weight_name}."`.

Test plan
- Verified that the remapped weight names (`q`, `k`, `v`, `wi_0`, `wi_1`) now produce correct lookup names
- Compared encoder outputs against `transformers.T5EncoderModel`: cosine similarity improves from ~0.71 to ~0.999 after the fix

Test Result
Cosine similarity against `transformers.T5EncoderModel` rises from ~0.71 to ~0.999 after the fix.
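For reference, the cosine-similarity metric quoted above can be computed over flattened activations as follows (a minimal NumPy sketch; the function name and inputs are illustrative, not taken from the PR's actual test script):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two flattened activation tensors."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Identical outputs give a similarity of ~1.0; with the K shard left at
# random init, the encoder outputs diverge and the similarity drops.
sim = cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])  # approximately 1.0
```

A value of ~0.71 before the fix versus ~0.999 after is consistent with one of the three fused QKV shards being uninitialized.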