Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
140 commits
Select commit Hold shift + click to select a range
2e5b079
Gemma4 init.
pyc96 Mar 4, 2026
fe25241
format and cleanup
pyc96 Mar 4, 2026
cff018c
temp fix for kv sharing
pyc96 Mar 5, 2026
5939270
cleanup & tp
pyc96 Mar 6, 2026
dea02a2
Reasoning parser.
pyc96 Mar 7, 2026
418ba40
tool call parser
pyc96 Mar 8, 2026
3289b26
mm init
pyc96 Mar 9, 2026
67b7b29
config conversion global_head_dim <-> swa_head_dim
pyc96 Mar 9, 2026
9a87e88
more mm
pyc96 Mar 9, 2026
416eccb
re-add gemma4 rope. (was removed as part of rebase)
pyc96 Mar 10, 2026
1614769
lint on main
kpham-sgl Mar 18, 2026
2af1b41
gemma4 mm init and kvcache fix
kpham-sgl Mar 9, 2026
89bf65c
add vision tower support, pending some refactor
kpham-sgl Mar 9, 2026
c9959ab
don't partite the embedding projection because vision tower already d…
kpham-sgl Mar 10, 2026
18115f9
so many changes to make vision encoder work
kpham-sgl Mar 10, 2026
898b52d
clean up
kpham-sgl Mar 10, 2026
d6652f7
add more comments
kpham-sgl Mar 10, 2026
8b4c06f
init audio support
kpham-sgl Mar 10, 2026
f85490e
TP fix for audio encoder, change act_fn for vision_encoder, and updat…
kpham-sgl Mar 10, 2026
5b77767
audio, vision, and text all work correctly now
kpham-sgl Mar 11, 2026
785b99e
fix swa memory pool indices to retrieve
kpham-sgl Mar 11, 2026
27687c9
softmax_scale should not be kwargs
kpham-sgl Mar 11, 2026
278d9fc
fix misc bugs with SWA kv cache
kpham-sgl Mar 15, 2026
8b2dbe4
addressing comments
kpham-sgl Mar 17, 2026
95950dc
lint
kpham-sgl Mar 17, 2026
f6b9759
nit
kpham-sgl Mar 17, 2026
879aaef
Fix layer_scalar to apply unconditionally on all decoder layers
kpham-sgl Mar 17, 2026
e1f8f61
Clarify SWA attn_logits buffer condition in triton backend
kpham-sgl Mar 17, 2026
52d7fe8
Merge pull request #3 from pyc96/kp/gemma4-audio
JustinTong0323 Mar 18, 2026
702a55e
initial dense 31b support
kpham-sgl Mar 16, 2026
808860e
custom bidirectional mask for image tokens
kpham-sgl Mar 18, 2026
b8a8323
canonical warning for chunked prefill + bidirectional mask
kpham-sgl Mar 19, 2026
1d5117d
gemma4 moe
pyc96 Mar 17, 2026
3bc956e
bench_hf + fixes for moe
pyc96 Mar 17, 2026
0ee8f82
format
pyc96 Mar 17, 2026
897fc8b
address comments
pyc96 Mar 18, 2026
05057d3
clean up + add chat template for sgl front lang
pyc96 Mar 19, 2026
66520ef
fix: gemma4 layer_scalar, num_experts guard, and RMSNorm 2D reshape
JustinTong0323 Mar 19, 2026
977226f
lint
JustinTong0323 Mar 19, 2026
959c42b
nit: modify weight loader warning msg
kpham-sgl Mar 20, 2026
eebbbd1
Merge pull request #5 from pyc96/kp/gemma4-dense
kpham-sgl Mar 20, 2026
f16b722
Merge pull request #8 from pyc96/kp/gemma4-audio
kpham-sgl Mar 20, 2026
eb5e40f
Merge pull request #7 from pyc96/kp/gemma-moe
kpham-sgl Mar 20, 2026
5cb71e5
Merge pull request #9 from pyc96/kp/weight-loader-warning
pyc96 Mar 21, 2026
ee43c61
torch compile and tuning.
pyc96 Mar 22, 2026
b74a596
clean up grpc gen code
pyc96 Mar 23, 2026
5cf1b4a
Merge pull request #14 from pyc96/cleanup
pyc96 Mar 23, 2026
57fa43f
nit: modify weight loader warning msg
kpham-sgl Mar 20, 2026
f50a9fd
perf: accelerate Gemma4RMSNorm with sgl_kernel CUDA kernels
JustinTong0323 Mar 22, 2026
dd40b9d
perf: add fused triton kernel for gemma4 norm+residual+scalar
JustinTong0323 Mar 22, 2026
4bafc58
perf: fuse residual add and layer_scalar in Gemma4 decoder layer
JustinTong0323 Mar 22, 2026
31fc2e7
attempt to fix rms norm accuracy
pyc96 Mar 24, 2026
b616a9e
fix: Gemma4RMSNorm use rmsnorm(ones) for with_scale=False
JustinTong0323 Mar 24, 2026
b9d8667
perf: simplify MoE routing, fuse router mul
JustinTong0323 Mar 24, 2026
7c9a265
Merge pull request #10 from pyc96/gemma-torch-compile
JustinTong0323 Mar 24, 2026
a2ea29c
cleanup
pyc96 Mar 24, 2026
38332fb
lint
pyc96 Mar 24, 2026
b88b0fc
Merge pull request #15 from pyc96/lints
pyc96 Mar 24, 2026
ed17bf9
Merge branch 'main' into gemma4
JustinTong0323 Mar 24, 2026
e5ee7aa
cleanup: deduplicate Gemma4 hybrid layer config and tidy comments
JustinTong0323 Mar 24, 2026
b5048e7
remove comments
kpham-sgl Mar 24, 2026
1868639
Tuning fused moe for b200 TP2
pyc96 Mar 24, 2026
a5e08a3
Tuning moe on h100.
pyc96 Mar 24, 2026
2a01060
Merge pull request #18 from pyc96/tune
kpham-sgl Mar 24, 2026
88dd461
Merge pull request #17 from pyc96/tune-b200
kpham-sgl Mar 24, 2026
a43ef6a
misc image processor change
kpham-sgl Mar 26, 2026
bbac4bf
minor fix
kpham-sgl Mar 26, 2026
2432825
qkv weight name change for audio/vision tower
kpham-sgl Mar 26, 2026
456e98a
lint
kpham-sgl Mar 26, 2026
7ed4e44
nit
kpham-sgl Mar 26, 2026
33cae65
Merge pull request #20 from pyc96/kp/gemma4-update
kpham-sgl Mar 26, 2026
c646d08
new moe weight remapping
kpham-sgl Mar 26, 2026
db7c50d
nit
kpham-sgl Mar 26, 2026
a6a40d6
Merge pull request #21 from pyc96/kp/gemma4-moe-update
pyc96 Mar 26, 2026
0fd8fb2
fix
kpham-sgl Mar 27, 2026
9555512
better error msg
kpham-sgl Mar 27, 2026
80cffa1
lint
kpham-sgl Mar 27, 2026
771eab6
Merge pull request #22 from pyc96/kp/fix-for-new-transformer-wheel
JustinTong0323 Mar 27, 2026
363e7e7
fix: add post-pooling standardization to Gemma4 vision encoder
JustinTong0323 Mar 28, 2026
de4e50a
Merge pull request #23 from pyc96/xinyuan/fix-gemma4-vision-standardize
kpham-sgl Mar 28, 2026
d15c2fa
gemma 4 norm remove +1 shift
kpham-sgl Mar 30, 2026
a3d86be
lint
kpham-sgl Mar 30, 2026
aa6722d
init video processor
kpham-sgl Mar 30, 2026
ca68e57
Merge pull request #24 from pyc96/kp/gemma4-norm-update
kpham-sgl Mar 30, 2026
66c01ec
fix
kpham-sgl Mar 30, 2026
f2e15b0
more vision changes
kpham-sgl Mar 30, 2026
a7b20d7
misc bug fix for video pipeline
kpham-sgl Mar 30, 2026
cbe0a58
remove output_length
kpham-sgl Mar 30, 2026
f533a29
bidirectional attention only applies to image not video
kpham-sgl Mar 30, 2026
1bac1df
change ple pad id
kpham-sgl Mar 30, 2026
5928471
fix vision token id, add VDW to tensor step
kpham-sgl Mar 30, 2026
d1318ca
change how rope theta parameter is read
kpham-sgl Mar 31, 2026
b9e3aca
lint
kpham-sgl Mar 31, 2026
eaf827a
misc comment fix
kpham-sgl Mar 31, 2026
7bf599b
config.json is fixed
kpham-sgl Mar 31, 2026
96c54e1
Merge pull request #25 from pyc96/kp/gemma4-video-processor
pyc96 Mar 31, 2026
4e1385a
[ROCm] Fix Gemma4 MoE: AITER CK fallback + Gemma4RMSNorm forward_hip
andyluo7 Mar 31, 2026
8c69336
init config and param rename
kpham-sgl Mar 31, 2026
96b5c15
more config rename and per key dim scale fix
kpham-sgl Mar 31, 2026
248a945
update vlm test
kpham-sgl Mar 31, 2026
4f4a305
lint
kpham-sgl Mar 31, 2026
11ca0dc
nit constant
kpham-sgl Mar 31, 2026
e393057
[ROCm] Fix vision attention backend selection for Gemma4
andyluo7 Mar 31, 2026
38e8db6
Merge pull request #28 from pyc96/kp/gemma4-audio-new-config-fix
kpham-sgl Mar 31, 2026
89cf215
fix vlm accuracy test for vision
kpham-sgl Mar 31, 2026
7ab05a9
lint
kpham-sgl Mar 31, 2026
f1e1618
Merge pull request #29 from pyc96/kp/gemma4-fix-vlm-test-accuracy-images
kpham-sgl Mar 31, 2026
eaf75b2
Address review: narrow try/except to fused_moe() call only
andyluo7 Apr 1, 2026
37fe602
nit: use module-level logger instead of inline import in except block
JustinTong0323 Apr 1, 2026
2e1a49d
Merge pull request #27 from pyc96/rocm-gemma4-moe-fallback
JustinTong0323 Apr 1, 2026
263052e
Merge branch 'main' into gemma4
JustinTong0323 Apr 1, 2026
5fb0dc8
Merge branch 'gemma4' of github.com:pyc96/sglang-private into gemma4
JustinTong0323 Apr 1, 2026
5bff462
fix: Gemma4 fused kernel correctness, detector robustness, and dead c…
JustinTong0323 Apr 1, 2026
f0cc8b2
default Gemma4 attention backend to triton
JustinTong0323 Apr 1, 2026
cf1ee55
revert: fused RMSNorm kernel does not need +1 shift
JustinTong0323 Apr 1, 2026
776cd8f
MoE weight and config name change
kpham-sgl Apr 1, 2026
683827c
register gemma4 as hybrid SWA model
kpham-sgl Apr 1, 2026
aba771b
lint
kpham-sgl Apr 1, 2026
8b570e4
Merge pull request #32 from pyc96/kp/final-gemma4-update
kpham-sgl Apr 2, 2026
4e061dc
init remove swa kv pool hack
kpham-sgl Apr 2, 2026
1933840
Merge pull request #33 from pyc96/kp/swa-kv-pool-hack-remove
kpham-sgl Apr 2, 2026
2609b87
Merge branch 'main' into gemma4
JustinTong0323 Apr 2, 2026
595b768
Update python/sglang/srt/configs/model_config.py
JustinTong0323 Apr 2, 2026
2deb3de
Add 'ather' to codespell ignore list for chunked get_weather test str…
JustinTong0323 Apr 2, 2026
ad7d0d2
Update python/sglang/bench_one_batch.py
JustinTong0323 Apr 2, 2026
9b56b56
Address PR review comments: default layer_types, precompute causal ma…
JustinTong0323 Apr 2, 2026
862be17
Revert test/manual/test_vlm_accuracy.py to main (test model not ready)
JustinTong0323 Apr 2, 2026
d88f7e8
fix reasoning parser
adarshxs Apr 2, 2026
f09b2fa
add HIP back to optimized store_cache gate
adarshxs Apr 2, 2026
39b37ca
fix: use only enable_thinking for gemma4 reasoning parser to match ch…
JustinTong0323 Apr 2, 2026
4ca6816
Merge remote-tracking branch 'origin/main' into new-model-gg
kpham-sgl Apr 4, 2026
0df626a
adapt to MultimodalProcessorOutput
kpham-sgl Apr 4, 2026
d9d34c6
address comments
kpham-sgl Apr 4, 2026
71105ab
bring Gemma 4 function call parser test to unit/function_call/
kpham-sgl Apr 4, 2026
8fac51b
restore Qwen25 detector
kpham-sgl Apr 4, 2026
a6cef69
single line removal
kpham-sgl Apr 4, 2026
71b907f
hardening gemma 4 tool call and reasoning parser tests
kpham-sgl Apr 4, 2026
54ac982
nit
kpham-sgl Apr 4, 2026
0721bd6
Merge branch 'main' into new-model-gg
JustinTong0323 Apr 5, 2026
db10f3c
Merge branch 'main' into new-model-gg
JustinTong0323 Apr 6, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .codespellrc
Original file line number Diff line number Diff line change
@@ -1,3 +1,3 @@
[codespell]
ignore-words-list = ans, als, hel, boostrap, childs, te, vas, hsa, ment, cann, thi, makro, wil, rouge, PRIS
ignore-words-list = ans, als, hel, boostrap, childs, te, vas, hsa, ment, cann, thi, makro, wil, rouge, PRIS, ather
skip = *.json,*.jsonl,*.patch,*.txt
4 changes: 4 additions & 0 deletions benchmark/kernels/fused_moe_triton/common_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -134,6 +134,10 @@ def get_model_config(
topk = config.num_experts_per_tok
intermediate_size = config.moe_intermediate_size
hidden_size = getattr(config, "moe_latent_size", None) or hidden_size
elif architecture == "Gemma4ForConditionalGeneration":
E = config.num_experts // ep_size
topk = config.top_k_experts
intermediate_size = config.moe_intermediate_size
else:
# Default: Mixtral
E = config.num_local_experts // ep_size
Expand Down
151 changes: 151 additions & 0 deletions benchmark/mmlu/bench_hf.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,151 @@
"""
Usage:
python3 bench_hf.py --model-path meta-llama/Llama-2-7b-hf --data-dir data --ntrain 5
"""

import argparse
import json
import os
import time

import numpy as np
import pandas as pd
import torch
from tqdm import tqdm
from transformers import AutoModelForCausalLM, AutoTokenizer

choices = ["A", "B", "C", "D"]


def format_subject(subject):
l = subject.split("_")
s = ""
for entry in l:
s += " " + entry
return s


def format_example(df, idx, include_answer=True):
prompt = df.iloc[idx, 0]
k = df.shape[1] - 2
for j in range(k):
prompt += "\n{}. {}".format(choices[j], df.iloc[idx, j + 1])
prompt += "\nAnswer:"
if include_answer:
prompt += " {}\n\n".format(df.iloc[idx, k + 1])
return prompt


def gen_prompt(train_df, subject, k=-1):
prompt = "The following are multiple choice questions (with answers) about{}.\n\n".format(
format_subject(subject)
)
if k == -1:
k = train_df.shape[0]
for i in range(k):
prompt += format_example(train_df, i)
return prompt


@torch.no_grad()
def main(args):
print(f"Loading model: {args.model_path}")
tokenizer = AutoTokenizer.from_pretrained(args.model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
args.model_path,
torch_dtype=torch.bfloat16,
trust_remote_code=True,
device_map="auto",
).eval()

subjects = sorted(
[
f.split("_test.csv")[0]
for f in os.listdir(os.path.join(args.data_dir, "test"))
if "_test.csv" in f
]
)

all_cors = []
num_requests = 0
total_latency = 0

for subject in tqdm(subjects[: args.nsub]):
dev_df = pd.read_csv(
os.path.join(args.data_dir, "dev", subject + "_dev.csv"), header=None
)[: args.ntrain]
test_df = pd.read_csv(
os.path.join(args.data_dir, "test", subject + "_test.csv"), header=None
)

k = args.ntrain
few_shot_examples = gen_prompt(dev_df, subject, k)
while len(tokenizer.encode(few_shot_examples)) > 1536:
k -= 1
if k < 0:
break
few_shot_examples = gen_prompt(dev_df, subject, k)

preds = []
labels = []
tic = time.perf_counter()

for i in range(test_df.shape[0]):
prompt_end = format_example(test_df, i, include_answer=False)
prompt = few_shot_examples + prompt_end

input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
input_ids,
max_new_tokens=1,
do_sample=False,
pad_token_id=tokenizer.eos_token_id,
)

output_str = tokenizer.decode(
output_ids[0][input_ids.shape[-1] :], skip_special_tokens=True
)
preds.append(output_str.strip()[0] if len(output_str.strip()) > 0 else "")
labels.append(test_df.iloc[i, test_df.shape[1] - 1])

latency = time.perf_counter() - tic
total_latency += latency

cors = [pred == label for pred, label in zip(preds, labels)]
all_cors.append(cors)
num_requests += len(test_df)

print(
f"Subject: {subject}, Accuracy: {np.mean(cors):.3f}, Latency: {latency:.3f}s"
)

weighted_acc = np.mean(np.concatenate(all_cors))
print(f"Total Latency: {total_latency:.3f}s")
print(f"Average Accuracy: {weighted_acc:.3f}")

if args.output:
with open(args.output, "a") as fout:
value = {
"task": "mmlu",
"backend": "hf",
"model": args.model_path,
"latency": round(total_latency, 3),
"accuracy": round(weighted_acc, 3),
"num_requests": num_requests,
"other": {
"nsub": args.nsub,
"ntrain": args.ntrain,
},
}
fout.write(json.dumps(value) + "\n")


if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--model-path", type=str, required=True)
parser.add_argument("--ntrain", type=int, default=5)
parser.add_argument("--data-dir", type=str, default="data")
parser.add_argument("--nsub", type=int, default=60)
parser.add_argument("--output", type=str, help="Output file path")
args = parser.parse_args()
main(args)
25 changes: 17 additions & 8 deletions python/sglang/lang/chat_template.py
Original file line number Diff line number Diff line change
Expand Up @@ -404,6 +404,19 @@ def get_chat_template_by_model_path(model_path):
)
)

register_chat_template(
ChatTemplate(
name="gemma-4-it",
default_system_prompt=None,
role_prefix_and_suffix={
"system": ("", ""),
"user": ("<|turn>user\n", "<turn|>\n"),
"assistant": ("<|turn>assistant\n", "<turn|>\n"),
},
style=ChatTemplateStyle.PLAIN,
)
)

register_chat_template(
ChatTemplate(
name="dbrx-instruct",
Expand Down Expand Up @@ -611,8 +624,10 @@ def match_chat_yi(model_path: str):


@register_chat_template_matching_function
def match_gemma_it(model_path: str):
if re.search(r"gemma.*it", model_path, re.IGNORECASE):
def match_gemma(model_path: str):
if re.search(r"gemma-4.*it", model_path, re.IGNORECASE):
return "gemma-4-it"
if re.search(r"(gemma.*it)|(gemma-3)", model_path, re.IGNORECASE):
return "gemma-it"


Expand All @@ -636,12 +651,6 @@ def match_granite_instruct(model_path: str):
return "granite-3-instruct"


@register_chat_template_matching_function
def match_gemma3_instruct(model_path: str):
if re.search(r"gemma-3", model_path, re.IGNORECASE):
return "gemma-it"


@register_chat_template_matching_function
def match_internvl_chat(model_path: str):
if re.search(r"internvl2_5", model_path, re.IGNORECASE):
Expand Down
20 changes: 18 additions & 2 deletions python/sglang/srt/configs/model_config.py
Original file line number Diff line number Diff line change
Expand Up @@ -376,6 +376,8 @@ def _derive_hybrid_model(self):
self.is_hybrid_swa_compress = self.hf_config.architectures[0] in [
"MiMoV2FlashForCausalLM",
"MiMoV2MTP",
"Gemma4ForCausalLM",
"Gemma4ForConditionalGeneration",
]

def _derive_context_length(self, context_length: int):
Expand Down Expand Up @@ -433,7 +435,7 @@ def _derive_model_shapes(self):
self.swa_v_head_dim = getattr(
self.hf_text_config,
"swa_v_head_dim",
self.v_head_dim,
self.swa_head_dim,
)
# FIXME: temporary special judge for MLA architecture
if (
Expand Down Expand Up @@ -1301,6 +1303,7 @@ def is_generation_model(model_architectures: List[str], is_embedding: bool = Fal
"Ernie4_5_VLMoeForConditionalGeneration",
"Gemma3ForConditionalGeneration",
"Gemma3nForConditionalGeneration",
"Gemma4ForConditionalGeneration",
"Glm4vForConditionalGeneration",
"Glm4vMoeForConditionalGeneration",
"GlmOcrForConditionalGeneration",
Expand Down Expand Up @@ -1447,6 +1450,8 @@ def is_hybrid_swa_model(model_architectures: List[str]):
"MiMoV2MTP",
"Step3p5ForCausalLM",
"Step3p5MTP",
"Gemma4ForCausalLM",
"Gemma4ForConditionalGeneration",
}
return any(arch in hybrid_swa_archs for arch in model_architectures)

Expand All @@ -1464,7 +1469,7 @@ def get_hybrid_layer_ids(
i for i in range(num_hidden_layers) if (i + 1) % 4 == 0
]
elif "GptOssForCausalLM" in model_architectures:
layer_types = getattr(hf_text_config, "layer_types", None)
layer_types = getattr(hf_text_config, "layer_types", [])
swa_attention_layer_ids = [
i for i, x in enumerate(layer_types) if x == "sliding_attention"
]
Expand Down Expand Up @@ -1497,6 +1502,17 @@ def get_hybrid_layer_ids(
elif "Step3p5MTP" in model_architectures:
swa_attention_layer_ids = [0]
full_attention_layer_ids = []
elif (
"Gemma4ForCausalLM" in model_architectures
or "Gemma4ForConditionalGeneration" in model_architectures
):
layer_types = getattr(hf_text_config, "layer_types", [])
swa_attention_layer_ids = [
i for i, x in enumerate(layer_types) if x == "sliding_attention"
]
full_attention_layer_ids = [
i for i, x in enumerate(layer_types) if x == "full_attention"
]
else:
swa_attention_layer_ids = None
full_attention_layer_ids = None
Expand Down
15 changes: 13 additions & 2 deletions python/sglang/srt/entrypoints/openai/serving_chat.py
Original file line number Diff line number Diff line change
Expand Up @@ -120,6 +120,11 @@ def __init__(
and hasattr(self.tokenizer_manager.model_config.hf_config, "model_type")
and self.tokenizer_manager.model_config.hf_config.model_type == "gpt_oss"
)
self.is_gemma4 = (
hasattr(self.tokenizer_manager.model_config, "hf_config")
and hasattr(self.tokenizer_manager.model_config.hf_config, "model_type")
and self.tokenizer_manager.model_config.hf_config.model_type == "gemma4"
)

self.use_dpsk_v32_encoding = self._use_dpsk_v32_encoding()

Expand Down Expand Up @@ -331,7 +336,7 @@ def _process_messages(
) -> MessageProcessingResult:
"""Process chat messages and apply chat template"""
# GptOss model needs to keep special tokens for harmony parsing
if self.is_gpt_oss:
if self.is_gpt_oss or self.is_gemma4:
request.skip_special_tokens = False

self._patch_mistral_skip_special_tokens(request)
Expand Down Expand Up @@ -1280,12 +1285,18 @@ def _get_reasoning_from_request(self, request: ChatCompletionRequest) -> bool:
"""
if not self.reasoning_parser:
return False
if self.reasoning_parser in ["deepseek-v3"]:

if self.reasoning_parser == "deepseek-v3":
# Models that require explicit enable thinking (thinking=True)
return (
request.chat_template_kwargs is not None
and request.chat_template_kwargs.get("thinking") is True
)
if self.reasoning_parser == "gemma4":
return (
request.chat_template_kwargs is not None
and request.chat_template_kwargs.get("enable_thinking") is True
)
if self.reasoning_parser in ["kimi_k2"]:
# Models that thinking by default, and can be disabled by setting thinking=False
return (
Expand Down
2 changes: 2 additions & 0 deletions python/sglang/srt/function_call/function_call_parser.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@
from sglang.srt.function_call.deepseekv3_detector import DeepSeekV3Detector
from sglang.srt.function_call.deepseekv31_detector import DeepSeekV31Detector
from sglang.srt.function_call.deepseekv32_detector import DeepSeekV32Detector
from sglang.srt.function_call.gemma4_detector import Gemma4Detector
from sglang.srt.function_call.gigachat3_detector import GigaChat3Detector
from sglang.srt.function_call.glm4_moe_detector import Glm4MoeDetector
from sglang.srt.function_call.glm47_moe_detector import Glm47MoeDetector
Expand Down Expand Up @@ -69,6 +70,7 @@ class FunctionCallParser:
"interns1": InternlmDetector,
"hermes": HermesDetector,
"gigachat3": GigaChat3Detector,
"gemma4": Gemma4Detector,
}

def __init__(self, tools: List[Tool], tool_call_parser: str):
Expand Down
Loading
Loading