[VLM] Support cos sin cache for Ernie4.5-VL by yuan-luo · Pull Request #19743 · sgl-project/sglang

yuan-luo · 2026-03-03T04:27:03Z

Motivation

This PR refactors the rotary positional embedding implementation to expose an explicit cos/sin cache interface and reuse it in the 2D vision RoPE code path. Instead of recomputing frequencies and calling cos() / sin() repeatedly, we precompute and cache the 1D cos/sin values once, then index into this cache for both text RoPE and the 2D grid RoPE used by the vision encoder.
The framework was already implemented in #15205; this PR extends the enhancement to more VLMs, starting with Ernie4.5-VL.
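A minimal pure-Python sketch of the idea, not the SGLang implementation (which operates on tensors): build the 1D cos/sin table once, index it by token position for text RoPE, and index it by (row, column) for the vision grid. The helper names (`build_cos_sin_cache`, `text_cos_sin`, `grid_cos_sin`, `apply_rotary`) are hypothetical. The base of 10000, the rotate-half (NeoX-style) channel pairing, and the Qwen2-VL-style layout in which each patch concatenates its height and width frequencies are all assumptions for illustration, not confirmed details of Ernie4.5-VL.

```python
import math


def build_cos_sin_cache(max_position, rotary_dim, base=10000.0):
    """Precompute 1D RoPE cos/sin tables once: one row per position."""
    half = rotary_dim // 2
    # One frequency per rotated channel pair: base^(-2i / rotary_dim).
    inv_freq = [base ** (-2.0 * i / rotary_dim) for i in range(half)]
    cos_cache, sin_cache = [], []
    for pos in range(max_position):
        angles = [pos * f for f in inv_freq]
        cos_cache.append([math.cos(a) for a in angles])
        sin_cache.append([math.sin(a) for a in angles])
    return cos_cache, sin_cache


def text_cos_sin(cos_cache, sin_cache, positions):
    """Text RoPE: each token simply indexes the cached row for its position."""
    return [cos_cache[p] for p in positions], [sin_cache[p] for p in positions]


def grid_cos_sin(cos_cache, sin_cache, grid_h, grid_w):
    """2D vision RoPE (assumed layout): patch (h, w) concatenates the cached
    row for its height index with the cached row for its width index, so the
    image grid needs no extra cos()/sin() calls."""
    cos_out, sin_out = [], []
    for h in range(grid_h):
        for w in range(grid_w):
            cos_out.append(cos_cache[h] + cos_cache[w])
            sin_out.append(sin_cache[h] + sin_cache[w])
    return cos_out, sin_out


def apply_rotary(x, cos, sin):
    """Rotate-half RoPE: channel i is paired with channel i + len(x) // 2."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    out1 = [a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)]
    out2 = [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)]
    return out1 + out2


# Usage: build the table once, then reuse it for text and vision tokens.
cos_cache, sin_cache = build_cos_sin_cache(max_position=64, rotary_dim=8)
cos_t, sin_t = text_cos_sin(cos_cache, sin_cache, [0, 5, 17])
cos_v, sin_v = grid_cos_sin(cos_cache, sin_cache, grid_h=4, grid_w=6)
q_text = apply_rotary([1.0] * 8, cos_t[1], sin_t[1])  # token at position 5
q_patch = apply_rotary([1.0] * 16, cos_v[15], sin_v[15])  # patch (h=2, w=3)
print(len(cos_v), len(cos_v[0]))  # 24 8
```

The design point is that the only trigonometric work happens in `build_cos_sin_cache`; both the text path and the 2D grid path reduce to indexing (and, for the grid, concatenating) rows that already exist.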

TTFT improves by 3%.

Modifications

Accuracy Tests

lmms_eval shows no accuracy drop on mmmu_val:

Main:

root@c7e9bb6a6789:/sgl-workspace/sglang# python3 -m lmms_eval --model openai_compatible   --model_args model_version=baidu/ERNIE-4.5-VL-28B-A3B  --tasks mmmu_val   --batch_size 16

| Tasks  |Version|Filter|n-shot| Metric |   |Value |   |Stderr|Stderr_CLT|Stderr_Clustered|
|--------|------:|------|-----:|--------|---|-----:|---|------|----------|----------------|
|mmmu_val|      0|none  |     0|mmmu_acc|↑  |0.2644|±  |N/A   |N/A       |N/A             |

PR:

➜  bench_script python3 -m lmms_eval --model openai_compatible   --model_args model_version=baidu/ERNIE-4.5-VL-28B-A3B   --tasks mmmu_val   --batch_size 16
2026-03-03 04:14:29 | INFO     | __main__:cli_evaluate:476 - Verbosity set to INFO
2026-03-03 04:14:32 | INFO     | __main__:cli_evaluate_single:565 - Evaluation tracker args: {}
2026-03-03 04:14:32 | INFO     | __main__:cli_evaluate_single:649 - Selected Tasks: ['mmmu_val']
2026-03-03 04:14:32 | INFO     | lmms_eval.evaluator:simple_evaluate:170 - Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234
2026-03-03 04:14:33 | INFO     | lmms_eval.evaluator:evaluate:515 - Running on rank 0 (local rank 0)
2026-03-03 04:14:33 | INFO     | lmms_eval.api.task:build_all_requests:428 - Building contexts for mmmu_val on rank 0...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 900/900 [00:00<00:00, 13835.13it/s]
2026-03-03 04:14:33 | INFO     | lmms_eval.evaluator:evaluate:609 - Running generate_until requests
Model Responding: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍| 897/900 [07:59<00:01,  1.89it/s]2026-03-03 04:22:33 | INFO     | lmms_eval.models.model_utils.gen_metrics:log_metrics:136 - Metric summary - Total elapsed time: 14580.099s, Total gen tokens: 112510, Avg speed: 7.7 tokens/s
Model Responding: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 900/900 [07:59<00:00,  1.88it/s]
Postprocessing: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 900/900 [00:00<00:00, 8850.28it/s]
{'Overall-Art and Design': {'num': 120, 'acc': 0.33333}, 'Art': {'num': 30, 'acc': 0.23333}, 'Art_Theory': {'num': 30, 'acc': 0.23333}, 'Design': {'num': 30, 'acc': 0.5}, 'Music': {'num': 30, 'acc': 0.36667}, 'Overall-Business': {'num': 150, 'acc': 0.24}, 'Accounting': {'num': 30, 'acc': 0.23333}, 'Economics': {'num': 30, 'acc': 0.3}, 'Finance': {'num': 30, 'acc': 0.13333}, 'Manage': {'num': 30, 'acc': 0.2}, 'Marketing': {'num': 30, 'acc': 0.33333}, 'Overall-Science': {'num': 150, 'acc': 0.20667}, 'Biology': {'num': 30, 'acc': 0.2}, 'Chemistry': {'num': 30, 'acc': 0.1}, 'Geography': {'num': 30, 'acc': 0.3}, 'Math': {'num': 30, 'acc': 0.2}, 'Physics': {'num': 30, 'acc': 0.23333}, 'Overall-Health and Medicine': {'num': 150, 'acc': 0.27333}, 'Basic_Medical_Science': {'num': 30, 'acc': 0.26667}, 'Clinical_Medicine': {'num': 30, 'acc': 0.23333}, 'Diagnostics_and_Laboratory_Medicine': {'num': 30, 'acc': 0.3}, 'Pharmacy': {'num': 30, 'acc': 0.3}, 'Public_Health': {'num': 30, 'acc': 0.26667}, 'Overall-Humanities and Social Science': {'num': 120, 'acc': 0.26667}, 'History': {'num': 30, 'acc': 0.4}, 'Literature': {'num': 30, 'acc': 0.36667}, 'Sociology': {'num': 30, 'acc': 0.13333}, 'Psychology': {'num': 30, 'acc': 0.16667}, 'Overall-Tech and Engineering': {'num': 210, 'acc': 0.28571}, 'Agriculture': {'num': 30, 'acc': 0.26667}, 'Architecture_and_Engineering': {'num': 30, 'acc': 0.46667}, 'Computer_Science': {'num': 30, 'acc': 0.23333}, 'Electronics': {'num': 30, 'acc': 0.03333}, 'Energy_and_Power': {'num': 30, 'acc': 0.36667}, 'Materials': {'num': 30, 'acc': 0.23333}, 'Mechanical_Engineering': {'num': 30, 'acc': 0.4}, 'Overall': {'num': 900, 'acc': 0.26667}}
fatal: not a git repository (or any of the parent directories): .git
2026-03-03 04:22:33 | INFO     | lmms_eval.loggers.evaluation_tracker:save_results_aggregated:238 - Output path not provided, skipping saving results aggregated
openai_compatible (model_version=baidu/ERNIE-4.5-VL-28B-A3B), gen_kwargs: (), limit: None, offset: 0, num_fewshot: None, batch_size: 16
| Tasks  |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|--------|------:|------|-----:|--------|---|-----:|---|------|
|mmmu_val|      0|none  |     0|mmmu_acc|↑  |0.2667|±  |N/A   |
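To put "no accuracy drop" in concrete terms, the rounded scores can be converted back to question counts. This sketch assumes both runs scored all 900 mmmu_val questions (the PR log shows 900/900; the count for the Main run is an assumption) and recovers counts by rounding. It also re-derives the PR's overall score as the sample-weighted mean of the six top-level disciplines printed in the log.

```python
N_QUESTIONS = 900  # mmmu_val size shown in the PR run's log

# Reported overall mmmu_acc values (rounded to 4 decimals by lmms_eval).
main_acc, pr_acc = 0.2644, 0.2667
main_correct = round(main_acc * N_QUESTIONS)
pr_correct = round(pr_acc * N_QUESTIONS)

# (num, acc) per top-level discipline, copied from the PR run's log.
disciplines = {
    "Art and Design": (120, 0.33333),
    "Business": (150, 0.24),
    "Science": (150, 0.20667),
    "Health and Medicine": (150, 0.27333),
    "Humanities and Social Science": (120, 0.26667),
    "Tech and Engineering": (210, 0.28571),
}
total = sum(n for n, _ in disciplines.values())
correct = sum(round(n * acc) for n, acc in disciplines.values())

print(main_correct, pr_correct)  # 238 240
print(correct, total, round(correct / total, 5))  # 240 900 0.26667
```

Under that assumption the PR answers two more questions correctly than Main, a gap far smaller than the binomial standard error of roughly 0.015 (about 13 questions) at this sample size.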

Benchmarking and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.

Review Process

Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
After green CI and required approvals, ask Merge Oncalls to merge.

gemini-code-assist · 2026-03-03T04:27:07Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

yuan-luo · 2026-03-03T04:29:17Z

/tag-and-rerun-ci

yuan-luo · 2026-03-03T11:57:34Z

/rerun-failed-ci

yuan-luo · 2026-03-03T15:21:27Z

/rerun-failed-ci

Support cos sin cache for Ernie4.5-VL

a361bb2

Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>

yuan-luo requested review from BBuf, JustinTong0323, mickqian and yhyang201 March 3, 2026 04:29

github-actions bot added the run-ci label Mar 3, 2026

BBuf approved these changes Mar 3, 2026


BBuf merged commit 82e7139 into sgl-project:main Mar 4, 2026
269 of 299 checks passed

yuan-luo deleted the support_cos_sin_cache branch March 4, 2026 03:10

Kangyan-Zhou pushed a commit to Kangyan-Zhou/sglang that referenced this pull request Mar 4, 2026

[VLM] Support cos sin cache for Ernie4.5-VL (sgl-project#19743)

db100fa

Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>

qeternity pushed a commit to qeternity/sglang that referenced this pull request Mar 6, 2026

[VLM] Support cos sin cache for Ernie4.5-VL (sgl-project#19743)

2506e82

Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>

magicYang1573 pushed a commit to magicYang1573/sglang that referenced this pull request Mar 9, 2026

[VLM] Support cos sin cache for Ernie4.5-VL (sgl-project#19743)

4322fdb

Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>

Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026

[VLM] Support cos sin cache for Ernie4.5-VL (sgl-project#19743)

436e3a7

Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>


BBuf merged 1 commit into sgl-project:main from antgroup:support_cos_sin_cache
