
[XPU] decrease IGC_ForceOCLSIMDWidth for speculative decoding triton-xpu kernel compilation#30538

Merged
jikunshang merged 3 commits into vllm-project:main from yma11:spec-decode-wa
Dec 23, 2025

Conversation

@yma11 (Contributor) commented Dec 12, 2025

Purpose

Decrease the Triton kernel compilation scratch space for speculative decoding. This is a workaround for the following error:

L0 build module failed. Log:
warning: [RetryManager] Start recompilation of the kernel
in kernel: 'sample_recovered_tokens_kernel'

error: total scratch space exceeds HW supported limit for kernel sample_recovered_tokens_kernel: 1164736 bytes (max permitted PTSS 262144 bytes)
error: backend compiler failed build.

Error during Intel loadBinary: ZE_RESULT_ERROR_MODULE_BUILD_FAILURE
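Until an image with this change is available, the same workaround can be applied manually before launching vLLM on an XPU device (a sketch; the value 16 matches the Dockerfile change in this PR):

```shell
# Force IGC to a narrower OpenCL SIMD width so Triton kernels compiled at
# runtime need less per-thread scratch space, staying under the PTSS limit.
export IGC_ForceOCLSIMDWidth=16
echo "$IGC_ForceOCLSIMDWidth"
```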

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting a before/after results comparison, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@yma11 yma11 requested a review from jikunshang as a code owner December 12, 2025 06:05
@yma11 yma11 marked this pull request as draft December 12, 2025 06:05
@mergify mergify bot added the ci/build label Dec 12, 2025
@gemini-code-assist (Contributor) left a comment

Code Review

This pull request introduces a workaround for a Triton kernel compilation error on XPU devices during speculative decoding by setting the IGC_ForceOCLSIMDWidth environment variable to 16 in the XPU Dockerfile. While this resolves the compilation failure, setting this globally may have unintended performance consequences. My review includes a suggestion for a more targeted approach to apply this workaround only when necessary.

@@ -76,6 +76,9 @@ RUN python3 -m pip install -e tests/vllm_test_utils
ENV NIXL_VERSION=0.7.0
RUN python3 /workspace/vllm/tools/install_nixl_from_source_ubuntu.py

# decrease triton kernel compilation scratch space for speculative decoding
ENV IGC_ForceOCLSIMDWidth=16
@gemini-code-assist (Contributor) commented — severity: high

Setting IGC_ForceOCLSIMDWidth as a global environment variable in the Docker image is a broad change that will affect all Triton kernels compiled at runtime, not just those for speculative decoding. This could lead to performance degradation for workloads that do not use speculative decoding, or for other kernels that could benefit from a wider SIMD width.

A more targeted approach would be to set this environment variable dynamically within the vLLM Python code, only when speculative decoding is enabled on an XPU device. A suitable location for this logic could be within vllm.platforms.xpu.XPUPlatform.check_and_update_config, checking if vllm_config.speculative_config is present.

This would scope the workaround to only when it's needed, avoiding potential performance impacts on other use cases. While I cannot suggest code for an un-modified file, please consider this alternative for a more robust solution.
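The scoped approach the reviewer describes could look roughly like the following. This is a hedged sketch, not vLLM's actual implementation: the standalone helper name is hypothetical, and only the locations it mentions (`vllm.platforms.xpu.XPUPlatform.check_and_update_config`, `vllm_config.speculative_config`) come from the comment above.

```python
import os

# Hypothetical helper mirroring the reviewer's suggestion: apply the IGC
# workaround only when speculative decoding is enabled on XPU. In vLLM this
# logic would live in XPUPlatform.check_and_update_config.
def apply_xpu_spec_decode_workaround(vllm_config) -> None:
    if getattr(vllm_config, "speculative_config", None) is not None:
        # A narrower SIMD width shrinks per-thread scratch space, keeping
        # sample_recovered_tokens_kernel under the 262144-byte PTSS limit.
        # setdefault lets a user-provided value take precedence.
        os.environ.setdefault("IGC_ForceOCLSIMDWidth", "16")
```

Using `os.environ.setdefault` (rather than an unconditional assignment) preserves any value the user has already exported, so the workaround never overrides an explicit tuning choice.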

Contributor comment: valid comment

@yma11 (Contributor, Author) replied: yes, updated.

mergify bot commented Dec 16, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @yma11.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Dec 16, 2025
@yma11 yma11 marked this pull request as ready for review December 20, 2025 10:35
@mergify mergify bot removed the needs-rebase label Dec 20, 2025
mergify bot commented Dec 22, 2025

Hi @yma11, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

…ding

Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Yan Ma <yan.ma@intel.com>
mergify bot commented Dec 23, 2025

Documentation preview: https://vllm--30538.org.readthedocs.build/en/30538/

@mergify mergify bot added the documentation Improvements or additions to documentation label Dec 23, 2025
Signed-off-by: Yan Ma <yan.ma@intel.com>
@jikunshang (Collaborator) left a comment

LGTM, thanks for fixing

@jikunshang jikunshang enabled auto-merge (squash) December 23, 2025 03:09
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 23, 2025
@jikunshang jikunshang merged commit f1c2c20 into vllm-project:main Dec 23, 2025
45 checks passed
Majid-Taheri pushed a commit to Majid-Taheri/vllm that referenced this pull request Dec 23, 2025
…xpu kernel compilation (vllm-project#30538)

Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Ubuntu <mjtaheri68@gmail.com>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
…xpu kernel compilation (vllm-project#30538)

Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>

Labels

  • ci/build
  • documentation — Improvements or additions to documentation
  • ready — ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants