[Intel GPU] Fix incorrect KV-cache page table for local attention when page_size > 1 by ckvermaAI · Pull Request #23757 · sgl-project/sglang

ckvermaAI · 2026-04-26T07:42:01Z

Motivation

Fixes incorrect KV-cache page table values passed to make_local_attention_virtual_batches in the XPU (Intel GPU) attention backend when page_size > 1 and local (chunked) attention is enabled. This bug caused incorrect/zeroed outputs from flash_attn_with_kvcache.

Root Cause

make_local_attention_virtual_batches expects a page-granularity block table where each column p stores the physical page index for logical page p. However, the raw req_to_token table is token-granularity (column i = KV slot for token i).

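When page_size > 1, the un-strided token-granularity table was passed directly. As a result, the page-level offsets block_starts = k_seqstarts_absolute // page_size were used to index token-level columns, so the lookup returned KV slot indices instead of physical page indices.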
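
To make the mismatch concrete, here is a minimal sketch in plain Python for a single request. It assumes KV slots are allocated in whole contiguous pages, so slot = physical_page * page_size + offset. The slot numbers and all names other than page_size are illustrative and are not taken from the backend.

```python
page_size = 4

# Hypothetical physical pages assigned to logical pages 0, 1, 2 of one request.
physical_pages = [5, 2, 9]

# Token-granularity row (like one row of req_to_token): column i holds the
# KV slot of token i, where slot = physical_page * page_size + offset.
token_row = [p * page_size + off for p in physical_pages for off in range(page_size)]

# Page-granularity row, as a block table is expected to look:
# column p holds the physical page of logical page p.
page_row = [token_row[i] // page_size for i in range(0, len(token_row), page_size)]

# A local-attention chunk starting at token 8 lives in logical page 8 // 4 = 2.
k_seqstart_absolute = 8
block_start = k_seqstart_absolute // page_size

wrong = token_row[block_start]  # KV slot of token 2, not a page index
right = page_row[block_start]   # physical page of logical page 2

print(token_row)     # [20, 21, 22, 23, 8, 9, 10, 11, 36, 37, 38, 39]
print(page_row)      # [5, 2, 9]
print(wrong, right)  # 22 9
```

Indexing the token-level row with a page-level offset yields 22, a KV slot belonging to the first page, whereas the intended entry is physical page 9.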

Modifications

When page_size > 1, the page table is first converted to page-granularity by:

- Striding: selecting every page_size-th column (torch.arange(0, ..., page_size))
- Dividing: integer-dividing values by page_size to convert KV slot indices to physical page indices

This mirrors the normalization already applied to metadata.page_table elsewhere in the backend.
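
The striding and dividing steps can be tried on a toy tensor. This is a sketch under assumptions rather than the backend code: token_table and its values are made up, and the slots within each page are assumed to be contiguous.

```python
import torch

page_size = 4

# Toy token-granularity table for two requests (rows), eight tokens each.
# Each value is a KV slot index; slots within a page are contiguous.
token_table = torch.tensor(
    [
        [20, 21, 22, 23, 8, 9, 10, 11],
        [0, 1, 2, 3, 12, 13, 14, 15],
    ]
)

# Striding: keep only the first column of every page.
strided_indices = torch.arange(
    0, token_table.shape[1], page_size, device=token_table.device
)

# Dividing: convert KV slot indices to physical page indices.
page_table = token_table[:, strided_indices] // page_size

print(page_table.tolist())  # [[5, 2], [0, 3]]
```

The result has one column per logical page (8 tokens / page_size 4 = 2 columns), and each entry is a physical page index.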

Changes
python/sglang/srt/layers/attention/xpu_backend.py: Add page table stride+divide normalization before passing to make_local_attention_virtual_batches when page_size > 1

Accuracy Tests

GSM8k benchmark on XPU with page_size > 1 and local attention enabled:

Metric	Before	After
Accuracy	0.005	0.815
Invalid	0.015	0.000

Speed Tests and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.

Review and Merge Process

Ping Merge Oncalls to start the process. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

CI States

Latest PR Test (Base): Run #26011291231
Latest PR Test (Extra): ⚠️ Not enabled — add run-ci-extra label to opt in.

gemini-code-assist

Code Review

This pull request introduces logic to normalize the page table within the _init_local_attn_metadata function of the XPU backend, ensuring it is at page-granularity when page_size is greater than one. Feedback suggests refactoring this logic into a shared helper method to eliminate duplication with similar code found in init_forward_metadata.

gemini-code-assist · 2026-04-26T07:46:33Z

+        if self.page_size > 1:
+            strided_indices = torch.arange(
+                0, page_table.shape[1], self.page_size, device=page_table.device
+            )
+            page_table = page_table[:, strided_indices] // self.page_size


The normalization logic added here (striding and floor-dividing the page table) is identical to the logic at lines 371-376 in init_forward_metadata. While necessary here because _init_local_attn_metadata is called before that main normalization block, it would be cleaner to encapsulate this logic into a helper method or move the main normalization earlier in init_forward_metadata to avoid duplication and ensure consistency across the backend.
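
One possible shape for the helper suggested above is sketched here. The name to_page_granularity is hypothetical and does not appear in the PR or, as far as this thread shows, in the backend. The example tensor is made up.

```python
import torch


def to_page_granularity(page_table: torch.Tensor, page_size: int) -> torch.Tensor:
    """Hypothetical helper: convert a token-granularity table to page granularity."""
    if page_size == 1:
        # Tokens and pages coincide, so the table is already page-granular.
        return page_table
    strided_indices = torch.arange(
        0, page_table.shape[1], page_size, device=page_table.device
    )
    # Indexing returns a new tensor, so the caller's raw table is left unchanged.
    return page_table[:, strided_indices] // page_size


# Both call sites would start from the same raw table and call the helper once.
raw = torch.tensor([[20, 21, 22, 23, 8, 9, 10, 11]])
local_attn_table = to_page_granularity(raw, page_size=4)
main_table = to_page_granularity(raw, page_size=4)

print(local_attn_table.tolist())  # [[5, 2]]
```

The conversion is not idempotent: applying it to an already normalized table would stride and divide a second time. Each call site therefore needs to pass the raw token-granularity table.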

mingfeima · 2026-05-18T03:03:52Z

@sunjiweiswift please help review this one!

gemini-code-assist · 2026-05-18T03:04:18Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

mingfeima · 2026-05-18T05:36:49Z

We should add unit tests (UT) that cover RadixAttention-level behavior, like these:

test/registered/attention/test_triton_attention_backend.py
test/registered/attention/test_torch_native_attention_backend.py
test/registered/cpu/test_intel_amx_attention_backend_a.py

@ckvermaAI could you please take this job?

ckvermaAI · 2026-05-18T05:49:19Z

@mingfeima Sure, let me add a similar unit test.

sunjiweiswift · 2026-05-19T02:21:55Z

Let me test this on more models.

mingfeima · 2026-05-21T05:14:11Z

> Let me test this on more models.

@sunjiweiswift any updates?

ckvermaAI and others added 2 commits April 26, 2026 10:01

Normalize page table values

6f2aa09

Merge branch 'main' into xpu_backend

d3fe4e4

gemini-code-assist Bot reviewed Apr 26, 2026

Merge branch 'main' into xpu_backend

75497dd

mingfeima added intel xpu intel gpu with device `torch.xpu` run-ci labels May 18, 2026

mingfeima marked this pull request as ready for review May 18, 2026 03:04

mingfeima requested review from Fridge003, HaiShaw, Qiaolin-Yu, hebiao064, ispobock and merrymercy as code owners May 18, 2026 03:04

Merge branch 'main' into xpu_backend

191fcd9

jmunetong mentioned this pull request May 20, 2026

[XPU] Enable Gemma 4 E2B / E4B / 31B/ 26B-A4B on Intel XPU #23280

Open

4 tasks

mingfeima approved these changes May 26, 2026

mingfeima enabled auto-merge (squash) May 26, 2026 03:01

mingfeima disabled auto-merge May 26, 2026 03:02

mingfeima merged commit 156d1af into sgl-project:main May 26, 2026
112 of 121 checks passed

Shunkangz pushed a commit to Shunkangz/sglang that referenced this pull request May 27, 2026

[Intel GPU] Fix incorrect KV-cache page table for local attention whe…

c1c739e

…n page_size > 1 (sgl-project#23757)

mingfeima merged 4 commits into sgl-project:main from ckvermaAI:xpu_backend
