fix load_weights for glm4v_moe with shared_experts fusion #14610

Merged
JustinTong0323 merged 1 commit into sgl-project:glm46v from zminglei:glm46v-fix
Dec 8, 2025
@zminglei (Collaborator) commented Dec 8, 2025

Motivation

Fix load_weights for glm4v_moe when shared_experts fusion is used.

Launch server:
python -m sglang.launch_server --model-path /shared/public/elr-models/zai-org/GLM-4.5V-FP8/ --tp-size 4
Before:
Accuracy is 0, and send_one produces garbage text output.
After:

python benchmark/gsm8k/bench_sglang.py --data-path /shared/public/data/gsm8k/test.jsonl
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:16<00:00, 12.42it/s]
Accuracy: 0.930
Invalid: 0.000
Latency: 16.176 s
Output throughput: 1446.905 token/s
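
The diff itself is not shown on this page, so the following is only a rough, hedged illustration of why shared-experts fusion affects weight loading in general. It assumes a checkpoint that stores the shared expert under names containing `mlp.shared_experts.` and a fused MoE layer that treats the shared expert as one extra slot after the routed experts. The helper `remap_shared_expert_name` and the value `n_routed_experts=128` are hypothetical and are not taken from the PR or from sglang's code.

```python
# Hypothetical helper for illustration only; not the code changed in this PR.
def remap_shared_expert_name(name: str, n_routed_experts: int) -> str:
    """Map a shared-expert weight name onto an extra fused expert slot.

    Assumption: checkpoint names look like
    "...mlp.shared_experts.<proj>.weight", and the fused layer expects the
    shared expert at index n_routed_experts (one past the routed experts).
    """
    return name.replace(
        "mlp.shared_experts.", f"mlp.experts.{n_routed_experts}."
    )


names = [
    "model.layers.3.mlp.shared_experts.gate_proj.weight",  # shared expert
    "model.layers.3.mlp.experts.0.gate_proj.weight",       # routed expert
]

# 128 is an illustrative expert count, not a value read from the model config.
remapped = [remap_shared_expert_name(n, n_routed_experts=128) for n in names]

for old, new in zip(names, remapped):
    print(f"{old} -> {new}")
```

If a loader skips or mis-applies a remapping of this kind, the fused expert slot can be left with uninitialized or mismatched weights, which would be consistent with the zero accuracy and garbage output reported under "Before".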

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

@JustinTong0323 (Collaborator) commented

Works for GLM-4.5V:

Accuracy: 0.960
Invalid: 0.000
Latency: 11.552 s
Output throughput: 1922.723 token/s
metrics={'accuracy': np.float64(0.96), 'invalid': np.float64(0.0), 'latency': 11.551843108143657, 'output_throughput': 1922.7234816184439}

@JustinTong0323 JustinTong0323 merged commit 0224b17 into sgl-project:glm46v Dec 8, 2025
1 check passed