
[CPU][Fix CI] Solidate torch version for sgl-kernel-cpu and fix device orientation error#17460

Merged
Fridge003 merged 11 commits into sgl-project:main from ZailiWang:fix-torch-ver
Jan 22, 2026


@ZailiWang
Contributor

Motivation

sgl-kernel/pyproject_cpu.toml requires a PyTorch version >=2.7.1. This open-ended bound triggers installation of the latest release, torch==2.10.0, while sglang is built with torch==2.9.0. The mismatch leads to undefined-symbol errors when the build output is loaded:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/.venv/lib/python3.12/site-packages/sgl_kernel/__init__.py", line 5, in <module>
    common_ops = _load_architecture_specific_ops()
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/sgl_kernel/load_utils.py", line 188, in _load_architecture_specific_ops
    raise ImportError(error_msg)
ImportError: 
[sgl_kernel] CRITICAL: Could not load any common_ops library!

Attempted locations:
1. Architecture-specific pattern: /opt/.venv/lib/python3.12/site-packages/sgl_kernel/sm100/common_ops.* - found files: []
2. Fallback pattern: /opt/.venv/lib/python3.12/site-packages/sgl_kernel/common_ops.* - found files: ['/opt/.venv/lib/python3.12/site-packages/sgl_kernel/common_ops.cpython-312-x86_64-linux-gnu.so']
3. Standard Python import: common_ops - failed

GPU Info:
- Compute capability: None
- Expected variant: CPU/No GPU detected (using precise math)

Please ensure sgl_kernel is properly installed with:
pip install --upgrade sgl_kernel

Error details from previous import attempts:
- ImportError: /opt/.venv/lib/python3.12/site-packages/sgl_kernel/common_ops.cpython-312-x86_64-linux-gnu.so: undefined symbol: _ZN3c1013MessageLogger6streamB5cxx11Ev
- ModuleNotFoundError: No module named 'common_ops'

Modifications

Pin the torch version in sgl-kernel/pyproject_cpu.toml to the same version as the sglang dependency.
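
To illustrate why the open-ended bound picked up torch==2.10.0 while an exact pin does not, here is a minimal sketch in plain Python. It is not the resolver pip actually uses: the helpers `parse` and `satisfies` are hypothetical, they handle only simple numeric versions with `>=` and `==`, and the pin is assumed to be written as `==2.9.0`.

```python
def parse(version: str) -> tuple:
    """Turn a plain numeric version like '2.10.0' into (2, 10, 0)."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, spec: str) -> bool:
    """Check a version against a single '>=X' or '==X' specifier (hypothetical helper)."""
    if spec.startswith(">="):
        return parse(version) >= parse(spec[2:])
    if spec.startswith("=="):
        return parse(version) == parse(spec[2:])
    raise ValueError(f"unsupported specifier: {spec}")


candidates = ["2.7.1", "2.9.0", "2.10.0"]

# An open-ended bound admits every candidate, so the newest one is chosen.
loose = [v for v in candidates if satisfies(v, ">=2.7.1")]
# An exact pin admits only the version sglang is built with.
pinned = [v for v in candidates if satisfies(v, "==2.9.0")]

print(max(loose, key=parse))  # 2.10.0
print(pinned)                 # ['2.9.0']
```

Note that the comparison is done on integer tuples, since a plain string comparison would rank "2.10.0" below "2.9.0".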

Accuracy Tests

N/A

Benchmarking and Profiling

N/A

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@github-actions github-actions bot added dependencies Pull requests that update a dependency file sgl-kernel labels Jan 21, 2026
@ZailiWang
Contributor Author

/tag-run-ci-label

@ZailiWang ZailiWang requested a review from ch-wan as a code owner January 21, 2026 06:22
@ZailiWang
Contributor Author

Also fixed another CI error introduced by #11657, in which a 'cuda' device was hard-coded.
The fix moves the device detection code block earlier and replaces 'cuda' with the detected device name.
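
The pattern described above can be sketched roughly as follows. This is not the code changed in this PR: `pick_device` and `describe_allocation` are hypothetical names, and the availability flags are passed in as plain booleans so the sketch runs without PyTorch installed.

```python
def pick_device(cuda_available: bool, xpu_available: bool) -> str:
    """Return the device name to use, falling back to 'cpu' (hypothetical helper)."""
    if cuda_available:
        return "cuda"
    if xpu_available:
        return "xpu"
    return "cpu"


def describe_allocation(device: str) -> str:
    """Stand-in for code that previously hard-coded device='cuda'."""
    return f"allocate buffers on {device}"


# Detect the device once, up front, then pass the result to later code
# instead of repeating a literal 'cuda'.
device = pick_device(cuda_available=False, xpu_available=False)
print(describe_allocation(device))  # allocate buffers on cpu
```

In real code the flags would presumably come from calls such as `torch.cuda.is_available()` and, on builds that provide it, `torch.xpu.is_available()`.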

@ZailiWang ZailiWang changed the title [CPU][Fix CI] Solidate torch version for sgl-kernel-cpu [CPU][Fix CI] Solidate torch version for sgl-kernel-cpu and fix device orientation error Jan 21, 2026
@ZailiWang
Contributor Author

Hi @ch-wan @ympcMark @ShangmingCai, could you help review the device detection fix? The current PR tests show that it fixes the CI failure on CPU/XPU, but please confirm that it is still suitable for the feature in #11657 on CUDA and other devices. Thanks

@ZailiWang
Contributor Author

Added some necessary changes to the .toml files for publishing CPU-specific wheels on PyPI.

  • The -cpu suffix is added to the output wheel package names for sglang and sgl-kernel.
  • scikit-build.wheel.packages is set explicitly because the wheel name and the package folder name differ.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jan 22, 2026
@ZailiWang
Contributor Author

Added torchao installation, as it is a mandatory requirement. Also added the torchaudio installation that originally came from #15859, to avoid a conflict.

@Fridge003 Fridge003 merged commit 672eb37 into sgl-project:main Jan 22, 2026
26 of 78 checks passed
@ZailiWang ZailiWang deleted the fix-torch-ver branch January 22, 2026 06:08
Fridge003 pushed a commit that referenced this pull request Jan 22, 2026
Johnsonms pushed a commit to Johnsonms/sglang that referenced this pull request Feb 14, 2026

Labels

  • dependencies: Pull requests that update a dependency file
  • documentation: Improvements or additions to documentation
  • high priority
  • intel
  • run-ci
  • sgl-kernel


5 participants