
fix(gguf): Skip lm_head mapping for models with tied word embeddings #30412

Open

kitaekatt wants to merge 2 commits into vllm-project:main from kitaekatt:fix/30405-tied-embeddings

fix(gguf): Skip lm_head mapping for models with tied word embeddings#30412
kitaekatt wants to merge 2 commits into
vllm-project:mainfrom
kitaekatt:fix/30405-tied-embeddings

Conversation

@kitaekatt
Contributor

Summary

Fixes RuntimeError: Failed to map GGUF parameters (1): ['lm_head.weight'] for models using tied word embeddings.

Changes

For models like Gemma2 that set tie_word_embeddings=True, add lm_head.weight to sideload_params so that GGUF loading succeeds.

Root Cause

When tie_word_embeddings=True, the model shares weights between the input embeddings and the output projection, so GGUF files don't contain a separate lm_head.weight tensor.
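
A minimal sketch of the idea (illustrative names, not the actual gguf_loader.py diff): when the config ties embeddings, the loader can treat lm_head.weight as a sideloaded parameter instead of trying to map it to a GGUF tensor.

```python
# Illustrative sketch: exempt lm_head.weight from GGUF tensor mapping
# when the model config ties input/output embeddings.
def get_sideload_params(hf_config) -> set[str]:
    sideload: set[str] = set()
    # Tied-embedding models (e.g. Gemma2) ship no separate lm_head tensor
    # in GGUF; the output projection reuses embed_tokens.weight instead.
    if getattr(hf_config, "tie_word_embeddings", False):
        sideload.add("lm_head.weight")
    return sideload
```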

Testing

Tested with bartowski/gemma-2-2b-it-GGUF - model loads without parameter mapping error.
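
For reference, the repro looks roughly like this (the exact GGUF filename is an assumption; vLLM loads GGUF from a local file and takes the tokenizer from the original HF repo):

```python
from vllm import LLM

# Before this fix, loading a tied-embedding GGUF model raised
# "Failed to map GGUF parameters (1): ['lm_head.weight']".
llm = LLM(model="./gemma-2-2b-it-Q4_K_M.gguf",
          tokenizer="google/gemma-2-2b-it")
print(llm.generate("Hello")[0].outputs[0].text)
```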


@gemini-code-assist Bot left a comment


Code Review

This pull request fixes loading of GGUF models with tied word embeddings, such as Gemma2, by correctly skipping the mapping of lm_head.weight. It also proactively addresses data type compatibility on newer hardware such as NVIDIA's Blackwell GPUs: bfloat16 is disallowed with GGUF quantization due to precision issues, and a compatible dtype is selected automatically when conflicts arise. The changes are well implemented and improve the robustness of GGUF model loading. The logic for handling tied embeddings and resolving dtype conflicts is sound. I have no major concerns with this pull request.
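
For the dtype handling the review mentions, a hedged sketch of the fallback logic (the function name is illustrative, not vLLM's actual API):

```python
import torch

def resolve_gguf_dtype(requested: torch.dtype) -> torch.dtype:
    """Pick a dtype compatible with GGUF dequantization."""
    # bfloat16's 8-bit mantissa loses precision when dequantizing GGUF
    # blocks, so fall back to float16, which round-trips more accurately.
    if requested == torch.bfloat16:
        return torch.float16
    return requested
```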

@mergify
Contributor

mergify Bot commented Dec 15, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @kitaekatt.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify Bot added the needs-rebase label Dec 15, 2025
@kitaekatt force-pushed the fix/30405-tied-embeddings branch from a195d52 to a215a08 on December 29, 2025 20:42
@mergify Bot removed the needs-rebase label Dec 29, 2025
@kitaekatt force-pushed the fix/30405-tied-embeddings branch from a215a08 to 7146180 on January 19, 2026 17:27
@kitaekatt
Contributor Author

Testing performed:

Tested with GGUF models on RTX 5090 (32GB, Blackwell architecture):

  • bartowski/NousResearch_Hermes-4-14B-GGUF
  • tensorblock/Qwen_Qwen3-30B-A3B-Instruct-2507-GGUF

Ran models through a local benchmark runner for HumanEval and GSM8K. Verified the semaphore leak fix by running repeated model load/unload cycles without resource exhaustion.
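
The load/unload cycling looked roughly like this sketch (the model path, tokenizer repo, and cycle count are placeholders):

```python
import gc
import torch
from vllm import LLM

for _ in range(10):
    llm = LLM(model="./Hermes-4-14B-Q4_K_M.gguf",
              tokenizer="NousResearch/Hermes-4-14B")
    del llm
    gc.collect()
    torch.cuda.empty_cache()  # release cached GPU allocations each cycle
```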

Related PRs:

This is part of a series of GGUF pipeline fixes for Blackwell GPU compatibility:

For models like Gemma2 that use tie_word_embeddings=True, the lm_head.weight
is initialized from embed_tokens weights rather than loaded separately.
Add lm_head.weight to sideload_params to allow GGUF loading to succeed
without requiring this parameter to be mapped.

Fixes: RuntimeError: Failed to map GGUF parameters (1): ['lm_head.weight']

Signed-off-by: Christina <christina@example.com>
Signed-off-by: Christina <truffle@gmail.com>
@kitaekatt
Contributor Author

Validation Results

| vLLM | transformers | Cherry-picked PRs | HumanEval | IFEval |
|------|--------------|-------------------|-----------|--------|
| HEAD | 5.x | #30410, #30411, #30412, #30413, #30424, #30434, #30699, #30702, #31464, #33846 | gem2-2b-gguf (42.1%), gemma3-1b (26.8%) | gem2-2b-gguf (65.6%) |
| HEAD | 4.x | #30410, #30411, #30412, #30413, #30424, #30434, #30699, #30702, #31464, #33846 | q3-moe-gguf (83.5%) | q3-moe-gguf (85.4%) |

Tested on RTX 5090 (Blackwell, SM 120) with all listed PRs cherry-picked together; models listed under each benchmark passed that benchmark in the given environment, while the same models crash or fail without these PRs applied.

Rebased to current upstream HEAD and re-validated on RTX 5090 (Blackwell, SM 120). Fix confirmed still necessary — gem2-2b-gguf and gemma3-1b (both with tied embeddings) crash without it.

@kitaekatt
Contributor Author

Hi @22quinn @Isotr0py, this has been sitting without review for a while. I just rebased on latest upstream/main (merge commit), so the branch should now be mergeable. Quick summary: a small fix in gguf_loader.py that skips the lm_head mapping for models with tied word embeddings, which previously failed when the mapping was attempted.

Would appreciate a look when you have cycles. Thanks!

