Skip to content

Make qwen3's set_embed_and_head idempotent#26798

Merged
fzyzcjy merged 1 commit into
mainfrom
tom/pr_chain/tom/kv_canary_revert_reversed/make-qwen3-s-set-embed-and-head-idempotent
May 31, 2026
Merged

Make qwen3's set_embed_and_head idempotent#26798
fzyzcjy merged 1 commit into
mainfrom
tom/pr_chain/tom/kv_canary_revert_reversed/make-qwen3-s-set-embed-and-head-idempotent

Conversation

@fzyzcjy
Copy link
Copy Markdown
Collaborator

@fzyzcjy fzyzcjy commented May 31, 2026

Guard the weight deletions in Qwen3ForCausalLM.set_embed_and_head with
hasattr checks so the method is idempotent: calling it when
embed_tokens.weight / lm_head.weight have already been deleted (e.g. a
second invocation, or after a prior tie/share step) no longer raises
AttributeError.


CI States

Latest PR Test (Base): ❌ Run #26700294436
Latest PR Test (Extra): ❌ Run #26700294381

Guard the weight deletions in `Qwen3ForCausalLM.set_embed_and_head` with
`hasattr` checks so the method is idempotent: calling it when
`embed_tokens.weight` / `lm_head.weight` have already been deleted (e.g. a
second invocation, or after a prior tie/share step) no longer raises
`AttributeError`.
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request modifies the set_embed_and_head method in python/sglang/srt/models/qwen3.py to check for the existence of the weight attribute before deleting it. The reviewer suggested a more robust implementation to handle pipeline parallelism and tied word embeddings, recommending that the code check if the layers are instances of PPMissingLayer and avoid redundant operations when self.lm_head is the same object as self.model.embed_tokens to prevent unnecessary memory usage.

Comment on lines +677 to 682
if hasattr(self.model.embed_tokens, "weight"):
del self.model.embed_tokens.weight
if hasattr(self.lm_head, "weight"):
del self.lm_head.weight
self.model.embed_tokens.weight = embed
self.lm_head.weight = head
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

When pipeline parallelism is enabled (pp_size > 1), ranks other than the first rank will have self.model.embed_tokens as a PPMissingLayer, and ranks other than the last rank will have self.lm_head as a PPMissingLayer.

Additionally, when tie_word_embeddings is enabled, self.lm_head is the exact same object as self.model.embed_tokens.

The current implementation:

  1. Unnecessarily assigns weight to PPMissingLayer placeholders on ranks where they are not used, which can prevent the large embed or head tensors from being garbage collected on those ranks.
  2. Performs redundant deletion and setting operations when weights are tied.

We can make this method fully robust and memory-efficient by checking if the modules are instances of PPMissingLayer and ensuring we don't perform redundant operations when self.lm_head is self.model.embed_tokens.

Suggested change
if hasattr(self.model.embed_tokens, "weight"):
del self.model.embed_tokens.weight
if hasattr(self.lm_head, "weight"):
del self.lm_head.weight
self.model.embed_tokens.weight = embed
self.lm_head.weight = head
if not isinstance(self.model.embed_tokens, PPMissingLayer):
if hasattr(self.model.embed_tokens, "weight"):
del self.model.embed_tokens.weight
self.model.embed_tokens.weight = embed
if not isinstance(self.lm_head, PPMissingLayer) and self.lm_head is not self.model.embed_tokens:
if hasattr(self.lm_head, "weight"):
del self.lm_head.weight
self.lm_head.weight = head

@fzyzcjy fzyzcjy merged commit 13ca55a into main May 31, 2026
86 of 99 checks passed
@fzyzcjy fzyzcjy deleted the tom/pr_chain/tom/kv_canary_revert_reversed/make-qwen3-s-set-embed-and-head-idempotent branch May 31, 2026 01:47
@fzyzcjy
Copy link
Copy Markdown
Collaborator Author

fzyzcjy commented May 31, 2026

Reviewed for correctness — looks good to me. ✅

The change guards the two del statements in set_embed_and_head with hasattr checks, which correctly achieves the stated idempotency:

  • hasattr(module, "weight") is the right check here — nn.Module.__getattr__ raises AttributeError for an unregistered parameter, so hasattr returns False once the weight has been deleted, and the del is safely skipped.
  • Calling the method twice now yields the same result as calling it once (the re-assignment + empty_cache/synchronize are harmless to repeat).
  • It also makes the PP non-last-rank case (where lm_head is a PPMissingLayer without a weight) more robust instead of raising.
  • The assignment path is unchanged, so there's no behavior regression.

Minor (non-blocking, out of scope): the same set_embed_and_head exists in ~40 other models (llama.py, deepseek_v2.py, gemma4_mtp.py, ...) all still using bare del — they likely share the same latent issue and could get the same guard later.

Note: both PR Test CI badges are currently red; worth confirming the failures are unrelated to this trivial change before merging.

JEF1056 pushed a commit to JEF1056/sglang-turboquant that referenced this pull request Jun 1, 2026
JEF1056 pushed a commit to JEF1056/sglang-turboquant that referenced this pull request Jun 1, 2026
xjpang pushed a commit to xjpang/sglang that referenced this pull request Jun 2, 2026
mqhc2020 pushed a commit to mqhc2020/sglang that referenced this pull request Jun 2, 2026
alphabetc1 pushed a commit to alphabetc1/sglang that referenced this pull request Jun 4, 2026
jeynmann pushed a commit to jeynmann/sglang that referenced this pull request Jun 4, 2026
edwingao28 pushed a commit to edwingao28/sglang that referenced this pull request Jun 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant