Merged
34 commits
bde4137
init new spec algo frozen_kv_mtp
kpham-sgl Apr 29, 2026
355d75e
refactoring
kpham-sgl Apr 29, 2026
df95d5a
test and config fix
kpham-sgl Apr 29, 2026
70af880
model code helper
kpham-sgl Apr 29, 2026
92e7371
buggy eager code
kpham-sgl Apr 29, 2026
662ab8b
untested topk + cudagraph
kpham-sgl Apr 29, 2026
2ff1185
embedding scale fix
kpham-sgl Apr 29, 2026
f728276
working-ish
kpham-sgl Apr 29, 2026
24ea9da
multi-batch support
kpham-sgl Apr 29, 2026
115e360
fixme
kpham-sgl Apr 29, 2026
fec1f41
refactor to server args
kpham-sgl Apr 29, 2026
cba0374
refactor model defn
kpham-sgl Apr 29, 2026
80f0fa8
rename for ckpt update and new wheels
kpham-sgl Apr 29, 2026
9e08f51
adapt to recent spec dec refactors
kpham-sgl Apr 29, 2026
290dda4
buggy topk impl
kpham-sgl Apr 29, 2026
cee5feb
working topk > 1 eager
kpham-sgl May 1, 2026
776618e
topk > 1 + cudagraph support complete
kpham-sgl May 1, 2026
e9a0178
nit
kpham-sgl May 1, 2026
cadab64
init refactor
kpham-sgl May 3, 2026
26b17d9
more refactor
kpham-sgl May 3, 2026
4fa0036
contiguous centroid indexing
pyc96 May 4, 2026
410e7f7
tests perf diff
pyc96 May 5, 2026
36566bb
remove the test
pyc96 May 5, 2026
19cd23d
default model weight is not in centroid order
pyc96 May 5, 2026
2dd8611
format
pyc96 May 5, 2026
138e6d0
upd
kpham-sgl May 5, 2026
9f64d51
address comments
kpham-sgl May 5, 2026
bb65e77
Gemma4 MTP doesn't need to reserve memory for KV (#24451)
pyc96 May 5, 2026
815305b
add E4B to CI test
kpham-sgl May 6, 2026
197d307
put nightly to later
kpham-sgl May 6, 2026
be7611f
Merge branch 'main' into gemma4-mtp-fin
kpham-sgl May 6, 2026
998dc70
remove CI test until transformer==5.8.0
kpham-sgl May 6, 2026
f374eca
Merge branch 'gemma4-mtp-fin' of github.com:sgl-project/sglang into g…
kpham-sgl May 6, 2026
bcf8d10
Merge branch 'main' into gemma4-mtp-fin
kpham-sgl May 6, 2026
3 changes: 3 additions & 0 deletions python/sglang/srt/models/gemma4_causal.py
@@ -878,6 +878,9 @@ def __init__(
     def get_input_embeddings(self) -> nn.Embedding:
         return self.model.embed_tokens
 
+    def get_embed_and_head(self) -> Tuple[torch.Tensor, torch.Tensor]:
+        return self.model.embed_tokens.weight, self.lm_head.weight
+
     def get_attention_sliding_window_size(self):
         return get_attention_sliding_window_size(self.config)
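
The new `get_embed_and_head` accessor returns the existing weight tensors rather than copies, which is what lets a draft (MTP) module reuse the target model's embedding and LM head. The following is a minimal sketch of that hand-off, not the code in this PR: `ToyTargetModel`, `ToyDraftModel`, and `set_embed_and_head` are hypothetical names, and nested Python lists stand in for `torch.Tensor` so the snippet runs without PyTorch.

```python
from typing import List, Optional, Tuple

# Plain-Python stand-in for a [vocab, hidden] torch.Tensor (assumption for this sketch).
Matrix = List[List[float]]


class ToyTargetModel:
    """Hypothetical target model exposing an accessor shaped like the one in the diff."""

    def __init__(self, embed: Matrix, head: Matrix) -> None:
        self.embed_weight = embed
        self.head_weight = head

    def get_embed_and_head(self) -> Tuple[Matrix, Matrix]:
        # Return references to the existing weights; nothing is copied.
        return self.embed_weight, self.head_weight


class ToyDraftModel:
    """Hypothetical draft module; set_embed_and_head is an assumed name."""

    def __init__(self) -> None:
        self.embed_weight: Optional[Matrix] = None
        self.head_weight: Optional[Matrix] = None

    def set_embed_and_head(self, embed: Matrix, head: Matrix) -> None:
        # Point the draft at the target's weights instead of allocating its own.
        self.embed_weight = embed
        self.head_weight = head


target = ToyTargetModel(
    embed=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    head=[[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]],
)
draft = ToyDraftModel()
draft.set_embed_and_head(*target.get_embed_and_head())

# Identity checks: the draft holds the same objects as the target.
shares_embed = draft.embed_weight is target.embed_weight
shares_head = draft.head_weight is target.head_weight
```

Because only references are passed, a draft module wired this way adds no extra memory for the vocabulary-sized matrices in this toy setup.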

5 changes: 5 additions & 0 deletions python/sglang/srt/models/gemma4_mm.py
@@ -256,6 +256,11 @@ def pad_input_ids(
     def get_input_embeddings(self) -> nn.Embedding:
         return self.language_model.get_input_embeddings()
 
+    def get_embed_and_head(self) -> Tuple[torch.Tensor, torch.Tensor]:
+        # Gemma 4 multimodal ties its LM head to the text embed_tokens
+        embed = self.language_model.embed_tokens.weight
+        return embed, embed
+
     def get_attention_sliding_window_size(self):
         return getattr(self.config.text_config, "sliding_window", -1) - 1
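
In the multimodal variant the accessor returns the same tensor twice, per the code comment that the LM head is tied to `embed_tokens`. A rough illustration of what tying means for a caller is sketched below: one matrix serves both the token lookup and the logits projection. `ToyTiedModel`, `embed_token`, and `project_to_logits` are hypothetical helpers, nested lists stand in for tensors, and the sketch ignores any embedding scaling or logit post-processing the real model may apply.

```python
from typing import List, Tuple

# Plain-Python stand-in for a [vocab, hidden] torch.Tensor (assumption for this sketch).
Matrix = List[List[float]]


class ToyTiedModel:
    """Hypothetical model whose LM head is tied to its input embedding."""

    def __init__(self, embed: Matrix) -> None:
        self.embed_weight = embed

    def get_embed_and_head(self) -> Tuple[Matrix, Matrix]:
        # Same shape as the accessor in the diff: one matrix, returned twice.
        embed = self.embed_weight
        return embed, embed


def embed_token(embed: Matrix, token_id: int) -> List[float]:
    """Embedding lookup: select the row for token_id."""
    return embed[token_id]


def project_to_logits(head: Matrix, hidden: List[float]) -> List[float]:
    """LM head projection: dot product of hidden with every vocabulary row."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in head]


model = ToyTiedModel([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
embed, head = model.get_embed_and_head()

hidden = embed_token(embed, 2)            # row 2 of the shared matrix
logits = project_to_logits(head, hidden)  # scores against the same matrix
```

Callers that unpack the tuple do not need to know whether the weights are tied; `embed is head` is simply true here and false for a model with a separate `lm_head`.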
