[XPU] decrease IGC_ForceOCLSIMDWidth for speculative decoding triton-xpu kernel compilation #30538
Code Review
This pull request introduces a workaround for a Triton kernel compilation error on XPU devices during speculative decoding by setting the IGC_ForceOCLSIMDWidth environment variable to 16 in the XPU Dockerfile. While this resolves the compilation failure, setting this globally may have unintended performance consequences. My review includes a suggestion for a more targeted approach to apply this workaround only when necessary.
docker/Dockerfile.xpu
```diff
@@ -76,6 +76,9 @@ RUN python3 -m pip install -e tests/vllm_test_utils
 ENV NIXL_VERSION=0.7.0
 RUN python3 /workspace/vllm/tools/install_nixl_from_source_ubuntu.py

+# decrease triton kernel compilation scratch space for speculative decoding
+ENV IGC_ForceOCLSIMDWidth=16
```
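A back-of-the-envelope sketch of why a narrower SIMD width reduces compilation scratch space. It assumes, as is typical on Intel GPUs but not stated in this PR, that spill/private memory is allocated per hardware thread and that every SIMD lane needs its own copy, so the per-thread requirement scales with the SIMD width. The 512-byte per-lane spill figure and the `scratch_bytes_per_thread` helper are purely illustrative.

```python
def scratch_bytes_per_thread(per_lane_bytes: int, simd_width: int) -> int:
    """Scratch one hardware thread needs if each lane keeps its own spill slots."""
    if per_lane_bytes < 0 or simd_width <= 0:
        raise ValueError("per_lane_bytes must be >= 0 and simd_width > 0")
    return per_lane_bytes * simd_width


# Illustrative assumption: the kernel spills 512 bytes of private data per lane.
PER_LANE_SPILL = 512

for width in (32, 16):
    kib = scratch_bytes_per_thread(PER_LANE_SPILL, width) / 1024
    print(f"SIMD{width}: {kib:.0f} KiB of scratch per hardware thread")
```

Under these assumptions, forcing SIMD16 halves the per-thread scratch compared with SIMD32. In practice the effect can be larger, since at SIMD32 each variable also occupies twice as many registers, which tends to increase the amount that spills in the first place.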
Setting IGC_ForceOCLSIMDWidth as a global environment variable in the Docker image is a broad change that will affect all Triton kernels compiled at runtime, not just those for speculative decoding. This could lead to performance degradation for workloads that do not use speculative decoding, or for other kernels that could benefit from a wider SIMD width.
A more targeted approach would be to set this environment variable dynamically within the vLLM Python code, only when speculative decoding is enabled on an XPU device. A suitable location for this logic could be within vllm.platforms.xpu.XPUPlatform.check_and_update_config, checking if vllm_config.speculative_config is present.
This would scope the workaround to only when it's needed, avoiding potential performance impacts on other use cases. While I cannot suggest code for an unmodified file, please consider this alternative for a more robust solution.
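A minimal sketch of the reviewer's suggestion, assuming the variable is set from Python before the first Triton kernel is compiled in the process (the PR does not say exactly when IGC reads it). `SpeculativeConfig`, `VllmConfig`, and `apply_xpu_simd_workaround` are hypothetical stand-ins, not vLLM's real classes; in vLLM the logic would live in `vllm.platforms.xpu.XPUPlatform.check_and_update_config`, which already runs only on XPU, so no separate device check is modeled.

```python
import os
from dataclasses import dataclass
from typing import MutableMapping, Optional


# Hypothetical stand-ins for vLLM's config objects; only the field the
# workaround inspects is modeled.
@dataclass
class SpeculativeConfig:
    num_speculative_tokens: int


@dataclass
class VllmConfig:
    speculative_config: Optional[SpeculativeConfig] = None


IGC_SIMD_ENV = "IGC_ForceOCLSIMDWidth"
WORKAROUND_SIMD_WIDTH = "16"


def apply_xpu_simd_workaround(
    vllm_config: VllmConfig,
    env: MutableMapping[str, str] = os.environ,
) -> bool:
    """Force SIMD16 kernel compilation only when speculative decoding is enabled.

    Returns True if this call set the variable. A value that is already
    present (for example one exported by the user) is left untouched.
    """
    if vllm_config.speculative_config is None:
        return False  # no speculative decoding: keep the compiler's default width
    if IGC_SIMD_ENV in env:
        return False  # respect an explicit user override
    env[IGC_SIMD_ENV] = WORKAROUND_SIMD_WIDTH
    return True
```

Passing a plain `dict` as `env` keeps the function testable without touching `os.environ`. Leaving an existing value alone lets users opt back into a wider SIMD width if a future compiler release no longer needs the workaround.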
This pull request has merge conflicts that must be resolved before it can be merged.
53aecb8 to 8733724
8733724 to 853e839
Hi @yma11, the pre-commit checks have failed. Please run:
```shell
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```
Then, commit the changes and push to your branch.
…ding
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Yan Ma <yan.ma@intel.com>
affd04e to 1fd4cda
Documentation preview: https://vllm--30538.org.readthedocs.build/en/30538/
1fd4cda to 522a3f3
522a3f3 to ae350cd
jikunshang left a comment
LGTM, thanks for fixing
…xpu kernel compilation (vllm-project#30538) Signed-off-by: Yan Ma <yan.ma@intel.com>
Purpose
Decrease the Triton kernel compilation scratch space for speculative decoding, as a workaround for the error:
Test Plan
Test Result