Address compatibility issues arising from the removal of the XLA dependency #1423

Merged
kyuyeunk merged 14 commits into main from weiyu/test-vllm-change on Jan 9, 2026

Conversation

@weiyu0824
Collaborator

Description

This PR addresses compatibility issues arising from the removal of the XLA dependency in upstream vLLM (reference).

Changes:

  • Moved the PallasAttention static function from vLLM to tpu-inference.
  • Updated TpuPlatform in tpu_platform.py to use the local PallasAttentionBackend.
  • Added create_weights in compressed_tensors_w8a8_int8.py because vLLM's implementation would introduce another error.
  • Registered PallasAttention using a decorator (see line:32 and line:37 in tpu_inference/layers/vllm/attention.py), because the Pallas enum has already been removed upstream.
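For context, the last change replaces enum-based backend selection with decorator-based registration. A minimal sketch of that pattern is below; the registry, decorator, and class names here are illustrative assumptions, not the actual tpu-inference API:

```python
# Hypothetical sketch of decorator-based attention backend registration,
# analogous to registering PallasAttention now that the Pallas enum is gone
# upstream. All names below are illustrative, not the real tpu-inference ones.
_ATTENTION_BACKENDS: dict[str, type] = {}

def register_attention_backend(name: str):
    """Class decorator that records an attention backend class under `name`."""
    def wrap(cls: type) -> type:
        _ATTENTION_BACKENDS[name] = cls
        return cls  # the decorated class is returned unchanged
    return wrap

@register_attention_backend("PALLAS")
class PallasAttentionBackend:
    @staticmethod
    def get_name() -> str:
        return "PALLAS"

def get_attention_backend(name: str) -> type:
    """Look up a registered backend by name, e.g. at platform-selection time."""
    return _ATTENTION_BACKENDS[name]
```

With this pattern, the platform code can resolve the backend by name instead of switching on an enum value, so removing the enum upstream does not break the lookup.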

Tests

Checklist

Before submitting this PR, please make sure:

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have made or will make corresponding changes to any relevant documentation.

@weiyu0824 weiyu0824 changed the title Weiyu/test vllm change Addresses compatibility issues arising from the removal of the XLA dependency Jan 8, 2026
@weiyu0824 weiyu0824 changed the title Addresses compatibility issues arising from the removal of the XLA dependency Address compatibility issues arising from the removal of the XLA dependency Jan 8, 2026
@github-actions

github-actions bot commented Jan 8, 2026

Description

Start with a short description of what the PR does and how this is a change from
the past.

The rest of the description includes relevant details and context, examples:

  • why is this change being made,
  • the problem being solved and any relevant context,
  • why this is a good solution,
  • some information about the specific implementation,
  • shortcomings of the solution and possible future improvements.

If the change fixes a Github issue, please include a link, e.g.,:
FIXES: #123456

Tests

Please describe how you tested this change, and include any instructions and/or
commands to reproduce.

Checklist

Before submitting this PR, please make sure:

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have made or will make corresponding changes to any relevant documentation.

Collaborator

@QiliangCui QiliangCui left a comment


can you explain more about why

Add create_weights in compressed_tensors_w8a8_int8.py because vllm's implementation would introduce another error.

@weiyu0824 weiyu0824 requested a review from jrplatin as a code owner January 8, 2026 17:07
@weiyu0824 weiyu0824 force-pushed the weiyu/test-vllm-change branch from 54c01b2 to 38f8531 Compare January 9, 2026 02:14
@weiyu0824 weiyu0824 requested a review from jcyang43 as a code owner January 9, 2026 02:14
@weiyu0824 weiyu0824 force-pushed the weiyu/test-vllm-change branch from 5102888 to b3c04e9 Compare January 9, 2026 02:41
@weiyu0824 weiyu0824 added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 9, 2026
@kyuyeunk
Collaborator

kyuyeunk commented Jan 9, 2026

seems like ci is failing due to upstream change: vllm-project/vllm#30519

I think this should be a simple fix. @weiyu0824 can you add a fix commit to this branch as well?

@weiyu0824 weiyu0824 force-pushed the weiyu/test-vllm-change branch from b3c04e9 to bce6db3 Compare January 9, 2026 03:34
@weiyu0824 weiyu0824 force-pushed the weiyu/test-vllm-change branch from 8ff7761 to 5977a14 Compare January 9, 2026 06:32
Signed-off-by: Wei-Yu Lin <weiyulin@google.com>
@weiyu0824 weiyu0824 force-pushed the weiyu/test-vllm-change branch from 5977a14 to ec15832 Compare January 9, 2026 06:35
@kyuyeunk
Collaborator

kyuyeunk commented Jan 9, 2026

hmmm we have another error due to upstream change 1 hour ago: vllm-project/vllm#32003

yeah, feel free to ignore all my other comments and let's just try to get this pr merged asap.

Signed-off-by: Wei-Yu Lin <weiyulin@google.com>
@kyuyeunk kyuyeunk requested a review from Lumosis as a code owner January 9, 2026 08:52
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>
@kyuyeunk kyuyeunk force-pushed the weiyu/test-vllm-change branch from 4d84445 to e5deac5 Compare January 9, 2026 08:52
This reverts commit fece762.

Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>
@kyuyeunk kyuyeunk force-pushed the weiyu/test-vllm-change branch 4 times, most recently from 77c71d5 to 90480a1 Compare January 9, 2026 09:32
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>
@kyuyeunk kyuyeunk force-pushed the weiyu/test-vllm-change branch 2 times, most recently from 317a769 to 4719933 Compare January 9, 2026 10:41
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>
@kyuyeunk kyuyeunk force-pushed the weiyu/test-vllm-change branch from 4719933 to 9fbda58 Compare January 9, 2026 10:59
@kyuyeunk kyuyeunk merged commit 67b2082 into main Jan 9, 2026
39 checks passed
@kyuyeunk kyuyeunk deleted the weiyu/test-vllm-change branch January 30, 2026 09:37
