[Feature] Optimizations for JPEG input on NVIDIA GPU#19749

Merged
yhyang201 merged 2 commits into sgl-project:main from wili-65535:wili/jpeg-preprocess
Mar 29, 2026

Conversation

@wili-65535
Contributor

@wili-65535 wili-65535 commented Mar 3, 2026

Motivation

Modifications

  • Use torch.ops.image.decode_jpegs_cuda, converting CPU bytes directly to torch GPU tensors using the nvJPEG hardware decoder.
  • This eliminates intermediate data formats such as PIL Images and CPU tensors, and minimizes CPU-GPU data transfers.
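The routing this implies can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: the function names and fallback structure are assumptions, and the GPU branch uses torchvision's public `decode_jpeg`, which dispatches to the same nvJPEG-backed op (`torch.ops.image.decode_jpegs_cuda`) on CUDA devices.

```python
from io import BytesIO

def looks_like_jpeg(data: bytes) -> bool:
    # JPEG streams begin with the SOI marker FF D8.
    return data[:2] == b"\xff\xd8"

def load_image(data: bytes):
    """Hypothetical dispatch: decode JPEG bytes on the GPU when possible.

    The GPU branch mirrors the PR's approach (nvJPEG via torchvision's
    decode_jpeg); everything else falls back to PIL on the CPU.
    """
    if looks_like_jpeg(data):
        try:
            import torch
            from torchvision.io import decode_jpeg
            if torch.cuda.is_available():
                buf = torch.frombuffer(bytearray(data), dtype=torch.uint8)
                # Returns a uint8 CHW tensor already resident on the GPU,
                # skipping the PIL Image and CPU tensor intermediates.
                return decode_jpeg(buf, device="cuda")
        except ImportError:
            pass  # torch/torchvision unavailable: fall through to CPU path
    from PIL import Image
    return Image.open(BytesIO(data))
```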

Accuracy Tests

  • lm_eval shows no drop between the main branch and this PR; both get similar scores:

  • Command:

lm-eval run --model sglang --model_args pretrained=/workspace/Qwen3-VL-8B-Instruct,dtype=auto,tp_size=1 --tasks gpqa_diamond_zeroshot gpqa_extended_zeroshot gpqa_main_zeroshot gpqa_diamond_n_shot gpqa_extended_n_shot gpqa_main_n_shot
  • Before this optimization (main branch baseline)
| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---:|---|---:|---|---:|---|
| gpqa_diamond_n_shot | 2 | none | 0 | acc | 0.3687 | ± 0.0344 |
| | | none | 0 | acc_norm | 0.3687 | ± 0.0344 |
| gpqa_diamond_zeroshot | 1 | none | 0 | acc | 0.3788 | ± 0.0346 |
| | | none | 0 | acc_norm | 0.3788 | ± 0.0346 |
| gpqa_extended_n_shot | 2 | none | 0 | acc | 0.3791 | ± 0.0208 |
| | | none | 0 | acc_norm | 0.3791 | ± 0.0208 |
| gpqa_extended_zeroshot | 1 | none | 0 | acc | 0.3773 | ± 0.0208 |
| | | none | 0 | acc_norm | 0.3773 | ± 0.0208 |
| gpqa_main_n_shot | 2 | none | 0 | acc | 0.3862 | ± 0.0230 |
| | | none | 0 | acc_norm | 0.3862 | ± 0.0230 |
| gpqa_main_zeroshot | 1 | none | 0 | acc | 0.4085 | ± 0.0232 |
| | | none | 0 | acc_norm | 0.4085 | ± 0.0232 |
  • After this optimization (this PR)
| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---:|---|---:|---|---:|---|
| gpqa_diamond_n_shot | 2 | none | 0 | acc | 0.3687 | ± 0.0344 |
| | | none | 0 | acc_norm | 0.3687 | ± 0.0344 |
| gpqa_diamond_zeroshot | 1 | none | 0 | acc | 0.3788 | ± 0.0346 |
| | | none | 0 | acc_norm | 0.3788 | ± 0.0346 |
| gpqa_extended_n_shot | 2 | none | 0 | acc | 0.3790 | ± 0.0185 |
| | | none | 0 | acc_norm | 0.3790 | ± 0.0185 |
| gpqa_extended_zeroshot | 1 | none | 0 | acc | 0.3773 | ± 0.0208 |
| | | none | 0 | acc_norm | 0.3773 | ± 0.0208 |
| gpqa_main_n_shot | 2 | none | 0 | acc | 0.3862 | ± 0.0230 |
| | | none | 0 | acc_norm | 0.3862 | ± 0.0230 |
| gpqa_main_zeroshot | 1 | none | 0 | acc | 0.4085 | ± 0.0232 |
| | | none | 0 | acc_norm | 0.4085 | ± 0.0232 |
  • lmms_eval shows no drop between the main branch and this PR; both get similar scores:

  • Command:

python3 -m lmms_eval --model sglang --model_args model_path=/workspace/Qwen3-VL-8B-Instruct --tasks mmmu_val
  • Before this optimization (main branch baseline)
{'Overall-Art and Design': {'num': 120, 'acc': 0.68333}, 'Art': {'num': 30, 'acc': 0.66667}, 'Art_Theory': {'num': 30, 'acc': 0.9}, 'Design': {'num': 30, 'acc': 0.73333}, 'Music': {'num': 30, 'acc': 0.43333}, 'Overall-Business': {'num': 150, 'acc': 0.40667}, 'Accounting': {'num': 30, 'acc': 0.3}, 'Economics': {'num': 30, 'acc': 0.5}, 'Finance': {'num': 30, 'acc': 0.26667}, 'Manage': {'num': 30, 'acc': 0.5}, 'Marketing': {'num': 30, 'acc': 0.46667}, 'Overall-Science': {'num': 150, 'acc': 0.45333}, 'Biology': {'num': 30, 'acc': 0.53333}, 'Chemistry': {'num': 30, 'acc': 0.36667}, 'Geography': {'num': 30, 'acc': 0.53333}, 'Math': {'num': 30, 'acc': 0.3}, 'Physics': {'num': 30, 'acc': 0.53333}, 'Overall-Health and Medicine': {'num': 150, 'acc': 0.53333}, 'Basic_Medical_Science': {'num': 30, 'acc': 0.66667}, 'Clinical_Medicine': {'num': 30, 'acc': 0.66667}, 'Diagnostics_and_Laboratory_Medicine': {'num': 30, 'acc': 0.36667}, 'Pharmacy': {'num': 30, 'acc': 0.5}, 'Public_Health': {'num': 30, 'acc': 0.46667}, 'Overall-Humanities and Social Science': {'num': 120, 'acc': 0.675}, 'History': {'num': 30, 'acc': 0.6}, 'Literature': {'num': 30, 'acc': 0.83333}, 'Sociology': {'num': 30, 'acc': 0.63333}, 'Psychology': {'num': 30, 'acc': 0.63333}, 'Overall-Tech and Engineering': {'num': 210, 'acc': 0.38095}, 'Agriculture': {'num': 30, 'acc': 0.53333}, 'Architecture_and_Engineering': {'num': 30, 'acc': 0.3}, 'Computer_Science': {'num': 30, 'acc': 0.5}, 'Electronics': {'num': 30, 'acc': 0.3}, 'Energy_and_Power': {'num': 30, 'acc': 0.3}, 'Materials': {'num': 30, 'acc': 0.4}, 'Mechanical_Engineering': {'num': 30, 'acc': 0.33333}, 'Overall': {'num': 900, 'acc': 0.50222}}
2026-03-16T05:00:15.655750+0000 | save_results_aggregated | INFO - Output path not provided, skipping saving results aggregated
sglang (model_path=/workspace/Qwen3-VL-8B-Instruct), gen_kwargs: (), limit: None, offset: 0, num_fewshot: None, batch_size: 1

LMMs-Eval: Probing Intelligence in the Real World
> The unified evaluation toolkit for frontier models.

branch: main
commit: v0.6-72-g88b23e2b

| Tasks  |Filter|n-shot| Metric |   |Value |   |Stderr|
|--------|------|-----:|--------|---|-----:|---|------|
|mmmu_val|none  |     0|mmmu_acc|↑  |0.5022|±  |N/A   |
  • After this optimization (this PR)
{'Overall-Art and Design': {'num': 120, 'acc': 0.68333}, 'Art': {'num': 30, 'acc': 0.66667}, 'Art_Theory': {'num': 30, 'acc': 0.9}, 'Design': {'num': 30, 'acc': 0.73333}, 'Music': {'num': 30, 'acc': 0.43333}, 'Overall-Business': {'num': 150, 'acc': 0.40667}, 'Accounting': {'num': 30, 'acc': 0.3}, 'Economics': {'num': 30, 'acc': 0.5}, 'Finance': {'num': 30, 'acc': 0.26667}, 'Manage': {'num': 30, 'acc': 0.5}, 'Marketing': {'num': 30, 'acc': 0.46667}, 'Overall-Science': {'num': 150, 'acc': 0.45333}, 'Biology': {'num': 30, 'acc': 0.53333}, 'Chemistry': {'num': 30, 'acc': 0.36667}, 'Geography': {'num': 30, 'acc': 0.53333}, 'Math': {'num': 30, 'acc': 0.3}, 'Physics': {'num': 30, 'acc': 0.53333}, 'Overall-Health and Medicine': {'num': 150, 'acc': 0.53333}, 'Basic_Medical_Science': {'num': 30, 'acc': 0.66667}, 'Clinical_Medicine': {'num': 30, 'acc': 0.66667}, 'Diagnostics_and_Laboratory_Medicine': {'num': 30, 'acc': 0.36667}, 'Pharmacy': {'num': 30, 'acc': 0.5}, 'Public_Health': {'num': 30, 'acc': 0.46667}, 'Overall-Humanities and Social Science': {'num': 120, 'acc': 0.675}, 'History': {'num': 30, 'acc': 0.6}, 'Literature': {'num': 30, 'acc': 0.83333}, 'Sociology': {'num': 30, 'acc': 0.63333}, 'Psychology': {'num': 30, 'acc': 0.63333}, 'Overall-Tech and Engineering': {'num': 210, 'acc': 0.38095}, 'Agriculture': {'num': 30, 'acc': 0.53333}, 'Architecture_and_Engineering': {'num': 30, 'acc': 0.3}, 'Computer_Science': {'num': 30, 'acc': 0.5}, 'Electronics': {'num': 30, 'acc': 0.3}, 'Energy_and_Power': {'num': 30, 'acc': 0.3}, 'Materials': {'num': 30, 'acc': 0.4}, 'Mechanical_Engineering': {'num': 30, 'acc': 0.33333}, 'Overall': {'num': 900, 'acc': 0.50222}}
2026-03-16T05:11:01.956747+0000 | save_results_aggregated | INFO - Output path not provided, skipping saving results aggregated
sglang (model_path=/workspace/Qwen3-VL-8B-Instruct), gen_kwargs: (), limit: None, offset: 0, num_fewshot: None, batch_size: 1

LMMs-Eval: Probing Intelligence in the Real World
> The unified evaluation toolkit for frontier models.

branch: main
commit: v0.6-72-g88b23e2b

| Tasks  |Filter|n-shot| Metric |   |Value |   |Stderr|
|--------|------|-----:|--------|---|-----:|---|------|
|mmmu_val|none  |     0|mmmu_acc|↑  |0.5022|±  |N/A   |

Benchmarking and Profiling

  • Part of the performance data is in the original issue; here we provide more detailed data.
  • We use an H100 GPU to run the Qwen3-VL-8B model (the specific model doesn't actually matter), send requests each containing one JPEG image at various resolutions, and focus on log lines like:
[2026-03-03 06:53:03] [QwenVLProcessor Perf] rid='44a05fff418f4b1cb448b345fa8ac336', load_time: 13.69 ms, preprocess_time: 0.00 ms, process_time: 307.90 ms, get_rope_index_time: 3.58 ms, total_time: 325.17 ms
  • The results before / after this PR are shown below. On average, a 1.5x speedup is achieved, and the larger the image, the greater the gain. In some extreme scenarios (like the one shown in the original issue), up to 3.8x acceleration can be achieved.
| size | before load_time/ms | before process_time/ms | before total_time/ms | after load_time/ms | after process_time/ms | after total_time/ms | Speed Up |
|---|---:|---:|---:|---:|---:|---:|---:|
| 32x32 | 0.58 | 1.18 | 2.16 | 0.62 | 1.18 | 2.15 | 1.00 |
| 64x64 | 0.34 | 1.55 | 2.16 | 0.57 | 1.86 | 2.91 | 0.74 |
| 96x96 | 0.52 | 2.32 | 3.12 | 0.57 | 1.25 | 2.09 | 1.49 |
| 128x128 | 0.37 | 2.40 | 3.05 | 0.56 | 1.25 | 2.08 | 1.47 |
| 160x160 | 0.52 | 1.97 | 2.76 | 0.60 | 1.24 | 2.12 | 1.30 |
| 192x192 | 0.35 | 1.66 | 2.27 | 0.63 | 1.29 | 2.20 | 1.03 |
| 224x224 | 0.37 | 1.94 | 2.57 | 0.69 | 1.28 | 2.26 | 1.14 |
| 256x256 | 0.48 | 2.02 | 2.77 | 0.68 | 1.24 | 2.19 | 1.26 |
| 288x288 | 0.43 | 1.96 | 2.64 | 0.74 | 1.12 | 2.14 | 1.23 |
| 320x320 | 0.45 | 2.22 | 2.93 | 0.77 | 1.19 | 2.26 | 1.30 |
| 352x352 | 0.57 | 2.32 | 3.15 | 0.82 | 1.25 | 2.34 | 1.35 |
| 384x384 | 0.52 | 2.65 | 3.43 | 0.86 | 1.33 | 2.47 | 1.39 |
| 416x416 | 0.54 | 2.84 | 3.63 | 0.88 | 1.43 | 2.58 | 1.41 |
| 448x448 | 0.56 | 3.09 | 3.91 | 0.97 | 1.57 | 2.84 | 1.38 |
| 480x480 | 0.66 | 3.60 | 4.51 | 1.11 | 1.60 | 2.99 | 1.51 |
| 512x512 | 0.63 | 3.87 | 4.77 | 1.10 | 1.90 | 3.29 | 1.45 |
| 544x544 | 0.68 | 4.52 | 5.48 | 1.10 | 1.83 | 3.20 | 1.71 |
| 576x576 | 0.68 | 5.05 | 6.00 | 1.30 | 2.29 | 3.90 | 1.54 |
| 608x608 | 0.73 | 5.16 | 6.16 | 1.25 | 2.18 | 3.73 | 1.65 |
| 640x640 | 0.76 | 5.34 | 6.42 | 1.70 | 2.68 | 4.67 | 1.37 |
| 672x672 | 0.79 | 5.92 | 6.98 | 1.56 | 2.27 | 4.10 | 1.70 |
| 704x704 | 0.83 | 6.72 | 7.84 | 1.83 | 2.49 | 4.60 | 1.70 |
| 736x736 | 0.82 | 7.00 | 8.09 | 1.67 | 2.79 | 4.81 | 1.68 |
| 768x768 | 0.99 | 7.27 | 8.54 | 2.05 | 2.94 | 5.30 | 1.61 |
| 800x800 | 0.98 | 8.22 | 9.49 | 2.16 | 3.01 | 5.46 | 1.74 |
| 832x832 | 1.13 | 8.31 | 9.72 | 2.13 | 3.43 | 5.88 | 1.65 |
| 864x864 | 1.09 | 9.58 | 10.95 | 2.57 | 3.25 | 6.11 | 1.79 |
| 896x896 | 1.24 | 9.83 | 11.35 | 2.30 | 3.62 | 6.22 | 1.82 |
| 928x928 | 1.21 | 10.95 | 12.44 | 2.46 | 3.86 | 6.63 | 1.88 |
| 960x960 | 1.23 | 10.88 | 12.39 | 2.54 | 4.52 | 7.36 | 1.68 |
| 992x992 | 1.45 | 12.61 | 14.34 | 2.81 | 4.09 | 7.36 | 1.95 |
| 1024x1024 | 1.41 | 12.89 | 14.60 | 2.90 | 4.04 | 7.24 | 2.02 |
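As a small illustration, the `[QwenVLProcessor Perf]` log lines above can be parsed to reproduce the Speed Up column. The regex and helper names here are assumptions based only on the log format shown in this PR, not part of the actual change:

```python
import re

# Matches the timing fields in a "[QwenVLProcessor Perf]" log line.
# \b keeps "process_time" from matching inside "preprocess_time".
PERF_RE = re.compile(
    r"load_time: (?P<load>[\d.]+) ms.*?"
    r"\bprocess_time: (?P<process>[\d.]+) ms.*?"
    r"total_time: (?P<total>[\d.]+) ms"
)

def parse_perf_line(line: str) -> dict:
    """Extract load/process/total times (ms) from a perf log line."""
    m = PERF_RE.search(line)
    if m is None:
        raise ValueError("not a QwenVLProcessor Perf line")
    return {k: float(v) for k, v in m.groupdict().items()}

def speedup(before_total_ms: float, after_total_ms: float) -> float:
    # The Speed Up column is before/after on total_time.
    return before_total_ms / after_total_ms
```

For example, `round(speedup(14.60, 7.24), 2)` reproduces the 2.02 entry of the 1024x1024 row.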
  • A simple E2E test with Qwen3-VL-8B, using two 2048x2048 pictures.
============================================== Before this PR ============================================
Request 1 with picture 1:
[2026-03-20 07:12:56] [QwenVLProcessor Perf] rid='a622c8dc55994f01b6f6596cbc45bf6d', load_time: 11.30 ms, preprocess_time: 0.00 ms, process_time: 285.03 ms, get_rope_index_time: 3.70 ms, total_time: 300.03 ms
[2026-03-20 07:13:04] Prefill batch, #new-seq: 1, #new-token: 4128, #cached-token: 0, token usage: 0.09, #running-req: 0, #queue-req: 0, cuda graph: False, input throughput (token/s): 0.00

The image shows a computer mouse, photographed from a top-down angle.

Its main features are as follows:

- **Appearance**: The mouse has a streamlined overall shape with a matte black surface and a subtle grainy texture, giving it a modern, refined look.
- **Scroll wheel**: Located at the center of the mouse, it is a metal wheel with clear ring-shaped grooves that make it easy for a finger to grip and operate.
- **Structure**: The mouse has curved side skirts on both sides, an ergonomic design intended to provide a comfortable grip.
- **Background and lighting**: The background is pure black, highlighting the mouse's outline and details. Light falls from above, creating soft highlights on the surface that enhance the sense of depth and texture.
- **Base**: A grid-like anti-slip texture is visible on the bottom of the mouse, helping it sit stably on a desk.

A "Doubao AI generated" watermark in the bottom-right corner indicates the image may have been AI-generated.

Overall, this is a

Request 2 with picture 1 and 2:
[2026-03-20 07:13:13] [QwenVLProcessor Perf] rid='9360050de3284a6b8abf74974b09dd13', load_time: 8.10 ms, preprocess_time: 0.00 ms, process_time: 305.50 ms, get_rope_index_time: 0.65 ms, total_time: 314.25 ms
[2026-03-20 07:13:15] Prefill batch, #new-seq: 1, #new-token: 4128, #cached-token: 4096, token usage: 0.18, #running-req: 0, #queue-req: 0, cuda graph: False, input throughput (token/s): 356.51

Based on the two images you provided, their common points are mainly the following:

1.  **Same core subject**: Both images show the same model of computer mouse. Although the colors differ (one is dark gray/black, the other white), their outlines, ergonomic design, scroll wheel structure, and side-button layout are identical, so they can be judged to be different colorways of the same product.

2.  **Unified design style**: Both mice adopt a minimalist, modern design language, with smooth lines, sleek surfaces, and a rounded overall shape free of extraneous decoration, reflecting a simple, tech-oriented style.

3.  **Consistent functional layout**: Both mice clearly have a central scroll wheel with a square button below it (possibly forward/back or a function key) and side buttons for thumb operation. This layout is the signature design of this mouse model.

4.  **AI-generated

============================================== After this PR ============================================
Request 1 with picture 1:
[2026-03-20 07:24:36] [QwenVLProcessor Perf] rid='9c6013f20584466aad6716abae3f2d41', load_time: 8.87 ms, preprocess_time: 0.00 ms, process_time: 261.03 ms, get_rope_index_time: 0.89 ms, total_time: 270.79 ms
[2026-03-20 07:24:43] Prefill batch, #new-seq: 1, #new-token: 4128, #cached-token: 0, token usage: 0.09, #running-req: 0, #queue-req: 0, cuda graph: False, input throughput (token/s): 0.00

The image shows a computer mouse, photographed from a top-down angle.

Its main features are as follows:

- **Appearance**: The mouse has a streamlined overall shape with a matte black surface and a subtle grainy texture, giving it a modern, refined look.
- **Scroll wheel**: Located at the center of the mouse, it is a metal wheel with clear ring-shaped grooves that make it easy for a finger to grip and operate.
- **Structure**: The mouse has curved side skirts on both sides, an ergonomic design intended to provide a comfortable grip.
- **Background and lighting**: The background is pure black, highlighting the mouse's outline and details. Light falls from above, creating soft highlights on the surface that enhance the sense of depth and texture.
- **Base**: A grid-like anti-slip texture is visible on the bottom of the mouse, helping it sit stably on a desk.

A "Doubao AI generated" watermark in the bottom-right corner indicates the image may have been AI-generated.

Overall, this is a

Request 2 with picture 1 and 2:
[2026-03-20 07:24:52] [QwenVLProcessor Perf] rid='77ae5d49db6c4ce492ad8c7861e1462b', load_time: 6.68 ms, preprocess_time: 0.00 ms, process_time: 265.42 ms, get_rope_index_time: 0.67 ms, total_time: 272.76 ms
[2026-03-20 07:24:54] Prefill batch, #new-seq: 1, #new-token: 4128, #cached-token: 4096, token usage: 0.18, #running-req: 0, #queue-req: 0, cuda graph: False, input throughput (token/s): 357.54

Based on the two images you provided, their common points are mainly the following:

1.  **Same core subject**: Both images show the same model of computer mouse. Although the colors differ (one is dark gray/black, the other white), their outlines, ergonomic design, scroll wheel structure, and side-button layout are identical, so they can be judged to be different colorways of the same product.

2.  **Unified design style**: Both mice adopt a minimalist, modern design language, with smooth lines, sleek surfaces, and a rounded overall shape free of extraneous decoration, reflecting a simple, tech-oriented style.

3.  **Consistent functional layout**: Both mice clearly have a central scroll wheel with a square button below it (possibly forward/back or a function key) and side buttons for thumb operation. This layout is the signature design of this mouse model.

4.  **AI-generated

Request 3 with picture 1 again:
[2026-03-20 07:25:04] [QwenVLProcessor Perf] rid='19084840f467424893ce55672b9a006f', load_time: 2.85 ms, preprocess_time: 0.00 ms, process_time: 124.14 ms, get_rope_index_time: 0.43 ms, total_time: 127.42 ms
[2026-03-20 07:25:05] Prefill batch, #new-seq: 1, #new-token: 32, #cached-token: 4096, token usage: 0.09, #running-req: 0, #queue-req: 0, cuda graph: False, input throughput (token/s): 391.67

The image shows a computer mouse, photographed from a top-down angle.

Its main features are as follows:

- **Appearance**: The mouse has a streamlined overall shape with a matte black surface and a subtle grainy texture, giving it a modern, refined look.
- **Scroll wheel**: Located at the center of the mouse, it is a metal wheel with clear ring-shaped grooves that make it easy for a finger to grip and operate.
- **Structure**: The mouse has curved side skirts on both sides, an ergonomic design intended to provide a comfortable grip.
- **Background and lighting**: The background is pure black, highlighting the mouse's outline and details. Light falls from above, creating soft highlights on the surface that enhance the sense of depth and texture.
- **Base**: A grid-like anti-slip texture is visible on the bottom of the mouse, helping it sit stably on a desk.

A "Doubao AI generated" watermark in the bottom-right corner indicates the image may have been AI-generated.

Overall, this is a

Request 4 with picture 1 and 2 again:
[2026-03-20 07:25:14] [QwenVLProcessor Perf] rid='ce500eeccd4747da8fb3eeea191d76df', load_time: 4.26 ms, preprocess_time: 0.00 ms, process_time: 258.15 ms, get_rope_index_time: 0.67 ms, total_time: 263.08 ms
[2026-03-20 07:25:15] Prefill batch, #new-seq: 1, #new-token: 32, #cached-token: 8192, token usage: 0.18, #running-req: 0, #queue-req: 0, cuda graph: False, input throughput (token/s): 3.08

Based on the two images you provided, their common points are mainly the following:

1.  **Same core subject**: Both images show the same model of computer mouse. Although the colors differ (one is dark gray/black, the other white), their outlines, ergonomic design, scroll wheel structure, and side-button layout are identical, so they can be judged to be different colorways of the same product.

2.  **Unified design style**: Both mice adopt a minimalist, modern design language, with smooth lines, sleek surfaces, and a rounded overall shape free of extraneous decoration, reflecting a simple, tech-oriented style.

3.  **Consistent functional layout**: Both mice clearly have a central scroll wheel with a square button below it (possibly forward/back or a function key) and side buttons for thumb operation. This layout is the signature design of this mouse model.

4.  **AI-generated

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@yuan-luo
Collaborator

yuan-luo commented Mar 4, 2026

/tag-and-rerun-ci

@wili-65535
Contributor Author

Hi maintainers, could you help me understand the CI failures? I'd like to address them to move this PR forward.

@yhyang201
Collaborator

/rerun-failed-ci

@yhyang201
Collaborator

Hi maintainers, could you help me understand the CI failures? I'd like to address them to move this PR forward.

CI might be flaky; please rerun until all checks pass.

@yhyang201
Collaborator

/rerun-failed-ci

@wili-65535
Contributor Author

wili-65535 commented Mar 10, 2026

Hi @yhyang201 @yuan-luo I investigated the CI report and found some information.

registered/vlm/test_vision_openai_server_a.py

  • In our PR, when processing JPEG images on NVIDIA GPUs, function python/sglang/srt/utils/common.py::load_image() directly returns a torch GPU tensor instead of a PIL Image (here).
  • This works fine for most model workflows because the subsequent image processing is in transformers/src/transformers/image_processing_utils_fast.py::_process_image(), which accepts various input types including torch tensors, PIL Images, and numpy arrays (here).
  • However, MiniCPM models have their own image pre-processing script that only accepts PIL Image inputs and internally converts them to numpy arrays using the .numpy() method (see code).
  • This incompatibility causes the following error when running MiniCPM models on NVIDIA GPUs (raised here):
openai.InternalServerError: Error code: 500 - {'object': 'error', 'message': "Internal server error: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.", 'type': 'InternalServerError', 'param': None, 'code': 500}
  • In our testing, only MiniCPM-o-2_6 and MiniCPM-V-4 exhibit this issue. Should we implement any special handling in the unit tests for these specific models?

registered/lora/test_multi_lora_backend.py

Error information is as below, reproduced consistently.

KeyError: '/loky-7341-yz54xt5a'

xpu/test_intel_xpu_backend.py

Error information is as below, reproduced consistently.

AttributeError: module 'torch.xpu' has no attribute 'graph_pool_handle'

Some other tests:

Error information is as below, but I don't know what it means.

Error: Unhandled error: HttpError: <!DOCTYPE html>

@wili-65535 wili-65535 force-pushed the wili/jpeg-preprocess branch from 8ba3ec5 to d594058 Compare March 10, 2026 08:55
@wili-65535
Contributor Author

For the failing unit tests of MiniCPM-o-2_6 and MiniCPM-V-4, we have several solutions:

  1. Fix in MiniCPM's Huggingface code (these tests pass after fixing):
    • Change here from image = image.numpy() to image = image.cpu().numpy();
    • Change here from if isinstance(images, Image.Image): to if isinstance(images, (Image.Image, torch.Tensor)):
    • Change here from elif isinstance(images[0], Image.Image): to elif isinstance(images[0], (Image.Image, torch.Tensor)):
  2. Skip the tests on NVIDIA GPU?
  3. Add a switch to turn off the optimization in this PR when using those models?
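The one-line fix in option 1 can be generalized with a small duck-typed helper. This is only a sketch; the name `to_numpy_safely` is hypothetical and not what MiniCPM's processor actually uses:

```python
def to_numpy_safely(image):
    """Convert a PIL Image or torch tensor (CPU or CUDA) to a numpy array.

    Calling .numpy() directly on a CUDA tensor raises "can't convert cuda:0
    device type tensor to numpy"; moving the tensor to host memory first
    with .cpu() makes the same code path work on both devices.
    """
    if hasattr(image, "cpu"):      # torch.Tensor on any device
        image = image.cpu()
    if hasattr(image, "numpy"):    # torch.Tensor (now guaranteed on CPU)
        return image.numpy()
    import numpy as np             # PIL.Image.Image or similar
    return np.asarray(image)
```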

if discard_alpha_channel and img.mode != "RGB":
if (
discard_alpha_channel
and img.mode != "RGB"
Collaborator


This may also need a small adjustment.

Contributor Author


Fixed.

Collaborator

@yhyang201 yhyang201 Mar 12, 2026


It seems we should still check not isinstance(img, torch.Tensor) first?

Contributor Author

@wili-65535 wili-65535 Mar 12, 2026


Sorry, maybe I lost the commit... fixed now.
By the way, in encode_server.py we cannot easily figure out the model kind (unless we search the name in self.server_args).
So gpu_image_decode is disabled by default there.

@yhyang201
Collaborator

For the failing unit tests of MiniCPM-o-2_6 and MiniCPM-V-4, we have several solutions:

  1. Fix in MiniCPM's Huggingface code (these tests pass after fixing):

    • Change here from image = image.numpy() to image = image.cpu().numpy();
    • Change here from if isinstance(images, Image.Image): to if isinstance(images, (Image.Image, torch.Tensor)):
    • Change here from elif isinstance(images[0], Image.Image): to elif isinstance(images[0], (Image.Image, torch.Tensor)):
  2. Skip the tests on NVIDIA GPU?

  3. Add a switch to turn off the optimization in this PR when using those models?

You might consider option 3.

Some processors may only accept PIL images, so one possible approach is to add a switch to disable GPU image decoding for those models.

For example (just a quick idea, not very well thought through, just for reference):

# base_processor.py
class BaseMultimodalProcessor(ABC):
    gpu_image_decode = True  # Enable GPU decoding by default
    ...

    @staticmethod
    def _load_single_item(data, modality, ..., gpu_image_decode=True):
        if modality == Modality.IMAGE:
            img, _ = load_image(data, use_gpu=gpu_image_decode)
            ...

Then incompatible models could simply turn it off:

# minicpm.py
class MiniCPMMultimodalProcessor(BaseMultimodalProcessor):
    gpu_image_decode = False  # MiniCPM HF processor does not support tensor inputs

Just a quick thought for reference.

Also, llava.py appears to call load_image() as well, so it might be worth checking whether the same adjustment is needed there.

@wili-65535
Contributor Author

You might consider option 3.

Some processors may only accept PIL images, so one possible approach is to add a switch to disable GPU image decoding for those models.

...

Also, llava.py appears to call load_image() as well, so it might be worth checking whether the same adjustment is needed there.

Good idea, let's try this.
In addition, maybe AMD GPUs can benefit from this switch in the future, too (https://lmsys.org/blog/2026-02-11-Qwen-latency/#221-image-decoding-optimization-with-rocjpeg).

@wili-65535 wili-65535 force-pushed the wili/jpeg-preprocess branch 2 times, most recently from 24bbdf0 to 798f0f1 Compare March 11, 2026 01:12
@wili-65535
Contributor Author

@yhyang201 I managed to add the switch; could you help review it at your convenience?

@samuellees
Contributor

/rerun-failed-ci

@yhyang201
Collaborator

It seems like this change may affect InternVL2.5 and KimiVL.
In CI, KimiVL fails at w, h = image.size, which suggests the processor might be receiving a tensor/array-like object instead of a PIL image. InternVL2.5 tests also regress in the same run, possibly due to a similar image input type issue.

@wili-65535
Contributor Author

It seems like this change may affect InternVL2.5 and KimiVL. In CI, KimiVL fails at w, h = image.size, which suggests the processor might be receiving a tensor/array-like object instead of a PIL image. InternVL2.5 tests also regress in the same run, possibly due to a similar image input type issue.

The code for these two models is fixed.

@samuellees
Contributor

samuellees commented Mar 14, 2026

/rerun-failed-ci again

@wili-65535
Contributor Author

The results of the GPQA tests have been updated in the description. Could we move the PR forward?

@yhyang201
Collaborator

Let me see what exactly is wrong with CI.

@yhyang201
Collaborator

I’ll rebase and see if the CI passes.

max_dynamic_patch: Optional[int] = None


image_extension_names = (".png", ".jpg", ".jpeg", ".webp", ".gif")
Collaborator


we need a mm_utils.py in this folder after this PR
cc @yhyang201

Collaborator


got it

@mickqian
Collaborator

Great work. Do we have e2e comparison results, BTW?

@wili-65535
Contributor Author

wili-65535 commented Mar 20, 2026

Great work. Do we have e2e comparison results, BTW?

Thank you for your attention! @mickqian
We only ran the mmmu_val task in lmms_eval (shown in the description).
What other end-to-end tests do you suggest we add?

Furthermore, I added the result of a simple E2E test with Qwen3-VL-8B to the description.

@mickqian
Collaborator

@wili-65535 I'm thinking we might need performance statistics on e2e benchmarks for this PR; you could check bench_serving.py or the mmmu folder.

v0.2: fix CI error

v2.0: add gpu_image_decode

v2.1: fix in encode_server.py

v2.2: fix more models
@wili-65535 wili-65535 force-pushed the wili/jpeg-preprocess branch from 1867b27 to af32299 Compare March 25, 2026 03:23
@yhyang201
Collaborator

yhyang201 commented Mar 27, 2026

Used a tool to conduct a latency test on Qwen3-VL-8B-Instruct (tp=1) with a single request while progressively increasing the number of images.

Each request contains N images of the same resolution (with N increasing from 1 to 32), a text input length of 256 tokens, and an output length of 32 tokens. The timeout for each individual request is set to 300 seconds.

Tests are conducted independently at three resolutions: 720p, 1080p, and 1440×2560. The server is restarted whenever switching resolutions. This setup is used to observe how the response time of a single request changes as the number of images increases.

For full experimental details, please refer to:
https://github.com/yhyang201/sgl-bench/tree/main/records/20260327/20260327_042330_qwen3_vl_8b_max_image_count_probe_1-32
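The probe loop described above can be sketched as follows. This is a simplified illustration, not the linked benchmark script: `probe_image_counts` and the `send_request` callback are hypothetical names, and the real probe also regenerates the image payload and restarts the server between resolutions.

```python
import time

def probe_image_counts(send_request, max_images=32, timeout_s=300):
    """Probe single-request latency as the image count grows.

    send_request(n) is a caller-supplied stand-in that sends one request
    carrying n images of a fixed resolution and returns its TTFT; probing
    stops once a request exceeds the per-request timeout, mirroring the
    300 s limit used in the experiment above.
    """
    ttfts = {}
    for n in range(1, max_images + 1):
        start = time.monotonic()
        ttft = send_request(n)
        elapsed = time.monotonic() - start
        if elapsed > timeout_s:
            break  # this resolution hit the timeout; stop increasing n
        ttfts[n] = ttft
    return ttfts
```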

main:

============================================================
Probing: 720p (1280x720)
============================================================
  Warmup: sending 3 requests (1~3 images)...
    warmup [1 images] ok
    warmup [2 images] ok
    warmup [3 images] ok
  Warmup done.

  [1 images] Generating 1x 720p... (0.0s) Sending... OK (TTFT=152ms, e2e=0.3s)
  [2 images] Generating 2x 720p... (0.0s) Sending... OK (TTFT=288ms, e2e=0.5s)
  [3 images] Generating 3x 720p... (0.1s) Sending... OK (TTFT=423ms, e2e=0.6s)
  [4 images] Generating 4x 720p... (0.1s) Sending... OK (TTFT=647ms, e2e=0.8s)
  [5 images] Generating 5x 720p... (0.1s) Sending... OK (TTFT=710ms, e2e=0.9s)
  [6 images] Generating 6x 720p... (0.1s) Sending... OK (TTFT=842ms, e2e=1.0s)
  [7 images] Generating 7x 720p... (0.1s) Sending... OK (TTFT=1137ms, e2e=1.3s)
  [8 images] Generating 8x 720p... (0.1s) Sending... OK (TTFT=1104ms, e2e=1.3s)
  [9 images] Generating 9x 720p... (0.2s) Sending... OK (TTFT=1263ms, e2e=1.5s)
  [10 images] Generating 10x 720p... (0.2s) Sending... OK (TTFT=1509ms, e2e=1.7s)
  [11 images] Generating 11x 720p... (0.2s) Sending... OK (TTFT=1661ms, e2e=1.9s)
  [12 images] Generating 12x 720p... (0.2s) Sending... OK (TTFT=1797ms, e2e=2.0s)
  [13 images] Generating 13x 720p... (0.2s) Sending... OK (TTFT=2308ms, e2e=2.5s)
  [14 images] Generating 14x 720p... (0.3s) Sending... OK (TTFT=2147ms, e2e=2.4s)
  [15 images] Generating 15x 720p... (0.3s) Sending... OK (TTFT=2296ms, e2e=2.5s)
  [16 images] Generating 16x 720p... (0.3s) Sending... OK (TTFT=2443ms, e2e=2.7s)
  [17 images] Generating 17x 720p... (0.3s) Sending... OK (TTFT=2622ms, e2e=2.8s)
  [18 images] Generating 18x 720p... (0.3s) Sending... OK (TTFT=2796ms, e2e=3.0s)
  [19 images] Generating 19x 720p... (0.3s) Sending... OK (TTFT=3182ms, e2e=3.4s)
  [20 images] Generating 20x 720p... (0.4s) Sending... OK (TTFT=3411ms, e2e=3.6s)
  [21 images] Generating 21x 720p... (0.4s) Sending... OK (TTFT=3523ms, e2e=3.8s)


============================================================
Probing: 1080p (1920x1080)
============================================================
  Warmup: sending 3 requests (1~3 images)...
    warmup [1 images] ok
    warmup [2 images] ok
    warmup [3 images] ok
  Warmup done.

  [1 images] Generating 1x 1080p... (0.0s) Sending... OK (TTFT=334ms, e2e=0.5s)
  [2 images] Generating 2x 1080p... (0.1s) Sending... OK (TTFT=656ms, e2e=0.8s)
  [3 images] Generating 3x 1080p... (0.1s) Sending... OK (TTFT=984ms, e2e=1.2s)
  [4 images] Generating 4x 1080p... (0.2s) Sending... OK (TTFT=1328ms, e2e=1.5s)
  [5 images] Generating 5x 1080p... (0.2s) Sending... OK (TTFT=1861ms, e2e=2.1s)
  [6 images] Generating 6x 1080p... (0.3s) Sending... OK (TTFT=2578ms, e2e=2.8s)
  [7 images] Generating 7x 1080p... (0.3s) Sending... OK (TTFT=2635ms, e2e=2.8s)
  [8 images] Generating 8x 1080p... (0.3s) Sending... OK (TTFT=3033ms, e2e=3.2s)
  [9 images] Generating 9x 1080p... (0.4s) Sending... OK (TTFT=3778ms, e2e=4.0s)
  [10 images] Generating 10x 1080p... (0.5s) Sending... OK (TTFT=4195ms, e2e=4.4s)
  [11 images] Generating 11x 1080p... (0.5s) Sending... OK (TTFT=5252ms, e2e=5.5s)
  [12 images] Generating 12x 1080p... (0.6s) Sending... OK (TTFT=5090ms, e2e=5.3s)
  [13 images] Generating 13x 1080p... (0.6s) Sending... OK (TTFT=6059ms, e2e=6.3s)
  [14 images] Generating 14x 1080p... (0.6s) Sending... OK (TTFT=6529ms, e2e=6.8s)
  [15 images] Generating 15x 1080p... (0.7s) Sending... OK (TTFT=7034ms, e2e=7.3s)
  [16 images] Generating 16x 1080p... (0.7s) Sending... OK (TTFT=7649ms, e2e=7.9s)
  [17 images] Generating 17x 1080p... (0.7s) Sending... OK (TTFT=8805ms, e2e=9.1s)
  [18 images] Generating 18x 1080p... (0.8s) Sending... OK (TTFT=9370ms, e2e=9.7s)
  [19 images] Generating 19x 1080p... (0.8s) Sending... OK (TTFT=9868ms, e2e=10.2s)
  [20 images] Generating 20x 1080p... (0.9s) Sending... OK (TTFT=10508ms, e2e=10.8s)
  [21 images] Generating 21x 1080p... (1.0s) Sending... OK (TTFT=11798ms, e2e=12.1s)
  [22 images] Generating 22x 1080p... (1.0s) Sending... OK (TTFT=13687ms, e2e=14.0s)
  [23 images] Generating 23x 1080p... (1.1s) Sending... OK (TTFT=13017ms, e2e=13.3s)
  [24 images] Generating 24x 1080p... (1.1s) Sending... OK (TTFT=13764ms, e2e=14.1s)
  [25 images] Generating 25x 1080p... (1.2s) Sending... OK (TTFT=15322ms, e2e=15.7s)
  [26 images] Generating 26x 1080p... (1.2s) Sending... OK (TTFT=15923ms, e2e=16.3s)
  [27 images] Generating 27x 1080p... (1.3s) Sending... OK (TTFT=16645ms, e2e=17.0s)
  [28 images] Generating 28x 1080p... (1.3s) Sending... OK (TTFT=17308ms, e2e=17.7s)
  [29 images] Generating 29x 1080p... (1.3s) Sending... OK (TTFT=19003ms, e2e=19.4s)


============================================================
Probing: 1440x2560 (2560x1440)
============================================================
  Warmup: sending 3 requests (1~3 images)...
    warmup [1 images] ok
    warmup [2 images] ok
    warmup [3 images] ok
  Warmup done.

  [1 images] Generating 1x 1440x2560... (0.1s) Sending... OK (TTFT=608ms, e2e=0.8s)
  [2 images] Generating 2x 1440x2560... (0.2s) Sending... OK (TTFT=1206ms, e2e=1.4s)
  [3 images] Generating 3x 1440x2560... (0.2s) Sending... OK (TTFT=2125ms, e2e=2.3s)
  [4 images] Generating 4x 1440x2560... (0.3s) Sending... OK (TTFT=3148ms, e2e=3.4s)
  [5 images] Generating 5x 1440x2560... (0.4s) Sending... OK (TTFT=4076ms, e2e=4.3s)
  [6 images] Generating 6x 1440x2560... (0.5s) Sending... OK (TTFT=4921ms, e2e=5.2s)
  [7 images] Generating 7x 1440x2560... (0.6s) Sending... OK (TTFT=7037ms, e2e=7.3s)
  [8 images] Generating 8x 1440x2560... (0.7s) Sending... OK (TTFT=7432ms, e2e=7.7s)
  [9 images] Generating 9x 1440x2560... (0.7s) Sending... OK (TTFT=8358ms, e2e=8.6s)
  [10 images] Generating 10x 1440x2560... (0.8s) Sending... OK (TTFT=10356ms, e2e=10.7s)
  [11 images] Generating 11x 1440x2560... (0.9s) Sending... OK (TTFT=11502ms, e2e=11.8s)
  [12 images] Generating 12x 1440x2560... (1.0s) Sending... OK (TTFT=13678ms, e2e=14.0s)
  [13 images] Generating 13x 1440x2560... (1.1s) Sending... OK (TTFT=16127ms, e2e=16.4s)
  [14 images] Generating 14x 1440x2560... (1.2s) Sending... OK (TTFT=17453ms, e2e=17.8s)

@yhyang201
Collaborator

This PR:

============================================================
Probing: 720p (1280x720)
============================================================
  Warmup: sending 3 requests (1~3 images)...
    warmup [1 images] ok
    warmup [2 images] ok
    warmup [3 images] ok
  Warmup done.

  [1 images] Generating 1x 720p... (0.0s) Sending... OK (TTFT=145ms, e2e=0.3s)
  [2 images] Generating 2x 720p... (0.0s) Sending... OK (TTFT=274ms, e2e=0.4s)
  [3 images] Generating 3x 720p... (0.1s) Sending... OK (TTFT=394ms, e2e=0.6s)
  [4 images] Generating 4x 720p... (0.1s) Sending... OK (TTFT=620ms, e2e=0.8s)
  [5 images] Generating 5x 720p... (0.1s) Sending... OK (TTFT=662ms, e2e=0.8s)
  [6 images] Generating 6x 720p... (0.1s) Sending... OK (TTFT=786ms, e2e=1.0s)
  [7 images] Generating 7x 720p... (0.1s) Sending... OK (TTFT=1088ms, e2e=1.3s)
  [8 images] Generating 8x 720p... (0.1s) Sending... OK (TTFT=1044ms, e2e=1.2s)
  [9 images] Generating 9x 720p... (0.2s) Sending... OK (TTFT=1196ms, e2e=1.4s)
  [10 images] Generating 10x 720p... (0.2s) Sending... OK (TTFT=1444ms, e2e=1.6s)
  [11 images] Generating 11x 720p... (0.2s) Sending... OK (TTFT=1589ms, e2e=1.8s)
  [12 images] Generating 12x 720p... (0.2s) Sending... OK (TTFT=1717ms, e2e=1.9s)
  [13 images] Generating 13x 720p... (0.2s) Sending... OK (TTFT=2198ms, e2e=2.4s)
  [14 images] Generating 14x 720p... (0.3s) Sending... OK (TTFT=2017ms, e2e=2.2s)
  [15 images] Generating 15x 720p... (0.3s) Sending... OK (TTFT=2165ms, e2e=2.4s)
  [16 images] Generating 16x 720p... (0.3s) Sending... OK (TTFT=2327ms, e2e=2.5s)
  [17 images] Generating 17x 720p... (0.3s) Sending... OK (TTFT=2491ms, e2e=2.7s)
  [18 images] Generating 18x 720p... (0.3s) Sending... OK (TTFT=2619ms, e2e=2.8s)
  [19 images] Generating 19x 720p... (0.3s) Sending... OK (TTFT=3037ms, e2e=3.3s)
  [20 images] Generating 20x 720p... (0.4s) Sending... OK (TTFT=3217ms, e2e=3.4s)
  [21 images] Generating 21x 720p... (0.4s) Sending... OK (TTFT=3388ms, e2e=3.6s)
  [22 images] Generating 22x 720p... (0.4s) Sending... OK (TTFT=3558ms, e2e=3.8s)
  [23 images] Generating 23x 720p... (0.4s) Sending... OK (TTFT=3721ms, e2e=4.0s)
  [24 images] Generating 24x 720p... (0.4s) Sending... OK (TTFT=3888ms, e2e=4.1s)
  [25 images] Generating 25x 720p... (0.5s) Sending... OK (TTFT=4681ms, e2e=4.9s)
  [26 images] Generating 26x 720p... (0.5s) Sending... OK (TTFT=4338ms, e2e=4.6s)
  [27 images] Generating 27x 720p... (0.5s) Sending... OK (TTFT=4400ms, e2e=4.6s)
  [28 images] Generating 28x 720p... (0.5s) Sending... OK (TTFT=4945ms, e2e=5.2s)
  [29 images] Generating 29x 720p... (0.5s) Sending... OK (TTFT=5115ms, e2e=5.4s)
  [30 images] Generating 30x 720p... (0.6s) Sending... OK (TTFT=5310ms, e2e=5.6s)
  [31 images] Generating 31x 720p... (0.6s) Sending... OK (TTFT=5499ms, e2e=5.8s)
  [32 images] Generating 32x 720p... (0.6s) Sending... OK (TTFT=5696ms, e2e=6.0s)


============================================================
Probing: 1080p (1920x1080)
============================================================
  Warmup: sending 3 requests (1~3 images)...
    warmup [1 images] ok
    warmup [2 images] ok
    warmup [3 images] ok
  Warmup done.

  [1 images] Generating 1x 1080p... (0.0s) Sending... OK (TTFT=319ms, e2e=0.5s)
  [2 images] Generating 2x 1080p... (0.1s) Sending... OK (TTFT=620ms, e2e=0.8s)
  [3 images] Generating 3x 1080p... (0.1s) Sending... OK (TTFT=928ms, e2e=1.1s)
  [4 images] Generating 4x 1080p... (0.2s) Sending... OK (TTFT=1270ms, e2e=1.5s)
  [5 images] Generating 5x 1080p... (0.2s) Sending... OK (TTFT=1764ms, e2e=2.0s)
  [6 images] Generating 6x 1080p... (0.3s) Sending... OK (TTFT=2476ms, e2e=2.7s)
  [7 images] Generating 7x 1080p... (0.3s) Sending... OK (TTFT=2501ms, e2e=2.7s)
  [8 images] Generating 8x 1080p... (0.4s) Sending... OK (TTFT=2899ms, e2e=3.1s)
  [9 images] Generating 9x 1080p... (0.4s) Sending... OK (TTFT=3595ms, e2e=3.8s)
  [10 images] Generating 10x 1080p... (0.5s) Sending... OK (TTFT=4053ms, e2e=4.3s)
  [11 images] Generating 11x 1080p... (0.5s) Sending... OK (TTFT=5092ms, e2e=5.3s)
  [12 images] Generating 12x 1080p... (0.6s) Sending... OK (TTFT=4930ms, e2e=5.2s)
  [13 images] Generating 13x 1080p... (0.6s) Sending... OK (TTFT=5876ms, e2e=6.1s)
  [14 images] Generating 14x 1080p... (0.6s) Sending... OK (TTFT=6330ms, e2e=6.6s)
  [15 images] Generating 15x 1080p... (0.7s) Sending... OK (TTFT=6809ms, e2e=7.1s)
  [16 images] Generating 16x 1080p... (0.7s) Sending... OK (TTFT=7419ms, e2e=7.7s)
  [17 images] Generating 17x 1080p... (0.8s) Sending... OK (TTFT=8593ms, e2e=8.9s)
  [18 images] Generating 18x 1080p... (0.8s) Sending... OK (TTFT=9080ms, e2e=9.4s)
  [19 images] Generating 19x 1080p... (0.8s) Sending... OK (TTFT=9621ms, e2e=9.9s)
  [20 images] Generating 20x 1080p... (0.9s) Sending... OK (TTFT=10225ms, e2e=10.5s)
  [21 images] Generating 21x 1080p... (1.0s) Sending... OK (TTFT=11544ms, e2e=11.9s)
  [22 images] Generating 22x 1080p... (1.0s) Sending... OK (TTFT=13289ms, e2e=13.6s)
  [23 images] Generating 23x 1080p... (1.0s) Sending... OK (TTFT=12652ms, e2e=13.0s)
  [24 images] Generating 24x 1080p... (1.0s) Sending... OK (TTFT=13362ms, e2e=13.7s)
  [25 images] Generating 25x 1080p... (1.1s) Sending... OK (TTFT=14938ms, e2e=15.3s)
  [26 images] Generating 26x 1080p... (1.2s) Sending... OK (TTFT=15568ms, e2e=15.9s)
  [27 images] Generating 27x 1080p... (1.2s) Sending... OK (TTFT=16179ms, e2e=16.5s)
  [28 images] Generating 28x 1080p... (1.3s) Sending... OK (TTFT=16904ms, e2e=17.3s)
  [29 images] Generating 29x 1080p... (1.3s) Sending... OK (TTFT=18674ms, e2e=19.0s)
  [30 images] Generating 30x 1080p... (1.3s) Sending... OK (TTFT=19359ms, e2e=19.7s)


============================================================
Probing: 1440x2560 (2560x1440)
============================================================
  Warmup: sending 3 requests (1~3 images)...
    warmup [1 images] ok
    warmup [2 images] ok
    warmup [3 images] ok
  Warmup done.

  [1 images] Generating 1x 1440x2560... (0.1s) Sending... OK (TTFT=579ms, e2e=0.8s)
  [2 images] Generating 2x 1440x2560... (0.2s) Sending... OK (TTFT=1151ms, e2e=1.3s)
  [3 images] Generating 3x 1440x2560... (0.3s) Sending... OK (TTFT=2029ms, e2e=2.2s)
  [4 images] Generating 4x 1440x2560... (0.3s) Sending... OK (TTFT=3032ms, e2e=3.2s)
  [5 images] Generating 5x 1440x2560... (0.4s) Sending... OK (TTFT=3917ms, e2e=4.1s)
  [6 images] Generating 6x 1440x2560... (0.5s) Sending... OK (TTFT=4746ms, e2e=5.0s)
  [7 images] Generating 7x 1440x2560... (0.6s) Sending... OK (TTFT=6823ms, e2e=7.1s)
  [8 images] Generating 8x 1440x2560... (0.7s) Sending... OK (TTFT=7181ms, e2e=7.4s)
  [9 images] Generating 9x 1440x2560... (0.7s) Sending... OK (TTFT=8164ms, e2e=8.4s)
  [10 images] Generating 10x 1440x2560... (0.8s) Sending... OK (TTFT=10054ms, e2e=10.3s)
  [11 images] Generating 11x 1440x2560... (0.9s) Sending... OK (TTFT=11166ms, e2e=11.5s)
  [12 images] Generating 12x 1440x2560... (1.0s) Sending... OK (TTFT=13351ms, e2e=13.7s)
  [13 images] Generating 13x 1440x2560... (1.1s) Sending... OK (TTFT=15695ms, e2e=16.0s)
  [14 images] Generating 14x 1440x2560... (1.2s) Sending... OK (TTFT=17002ms, e2e=17.3s)

  Result: 1440x2560 max = 14 images

@yhyang201
Collaborator

This PR reduces TTFT by about 3–5% overall, with the most noticeable improvement (~5%) at 720p and smaller gains at higher resolutions.
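The ~3–5% figure can be reproduced directly from the probe logs above. A minimal sketch of a parser for the exact line format shown (the `baseline_log`/`pr_log` snippets below are just two lines copied from the logs in this thread for illustration):

```python
import re

# Matches probe lines like:
#   [5 images] Generating 5x 720p... (0.4s) Sending... OK (TTFT=3917ms, e2e=4.1s)
# capturing the image count and the TTFT in milliseconds.
LINE_RE = re.compile(r"\[(\d+) images\].*TTFT=(\d+)ms")

def parse_ttft(log: str) -> dict:
    """Map image count -> TTFT (ms) for every probe line in a log."""
    return {int(m.group(1)): int(m.group(2)) for m in LINE_RE.finditer(log)}

def mean_reduction(baseline: dict, optimized: dict) -> float:
    """Average relative TTFT reduction over image counts present in both logs."""
    common = baseline.keys() & optimized.keys()
    return sum((baseline[n] - optimized[n]) / baseline[n] for n in common) / len(common)

baseline_log = """
  [1 images] Generating 1x 1440x2560... (0.1s) Sending... OK (TTFT=608ms, e2e=0.8s)
  [2 images] Generating 2x 1440x2560... (0.2s) Sending... OK (TTFT=1206ms, e2e=1.4s)
"""
pr_log = """
  [1 images] Generating 1x 1440x2560... (0.1s) Sending... OK (TTFT=579ms, e2e=0.8s)
  [2 images] Generating 2x 1440x2560... (0.2s) Sending... OK (TTFT=1151ms, e2e=1.3s)
"""

reduction = mean_reduction(parse_ttft(baseline_log), parse_ttft(pr_log))
print(f"mean TTFT reduction: {reduction:.1%}")  # → mean TTFT reduction: 4.7%
```

Feeding the full baseline and PR logs through the same two functions yields the per-resolution averages quoted above.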

@yhyang201
Collaborator

All CI checks have passed — should we go ahead and merge?

@yhyang201 yhyang201 merged commit 5bb9ca0 into sgl-project:main Mar 29, 2026
578 of 664 checks passed
@wili-65535 wili-65535 deleted the wili/jpeg-preprocess branch March 30, 2026 02:21