ci: run setup_ld_library_path before install_sglang_kernel by alisonshao · Pull Request #24141 · sgl-project/sglang

alisonshao · 2026-04-30T10:07:27Z

Summary

nvidia-cusparselt-cu13 was bumped on PyPI from 0.9.0 → 0.9.1 on 2026-04-29 18:36 UTC. On the affected runners, uv's upgrade install partially failed: it wrote the new 0.9.1 dist-info (registering the package as installed) but did not extract the bundled nvidia/cusparselt/lib/libcusparseLt.so.0. Because the dist-info looks intact, every subsequent uv pip install skips it as "already satisfied," so the broken state is sticky and import torch fails with libcusparseLt.so.0: cannot open shared object file.

Fix: at the end of install_sglang, if the wheel metadata is present but the .so is missing on disk, force-reinstall the wheel.
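The "metadata present, library missing" condition described above can be sketched as a small check. This is an illustration only, not the code from the PR: the function name `cusparselt_needs_heal` and its single site-packages argument are hypothetical, and the dist-info directory name assumes the usual underscore-normalized wheel naming.

```bash
# Hypothetical helper (not from the PR): succeeds when the wheel metadata
# exists but the shared library it should ship is absent from disk.
cusparselt_needs_heal() {
    local site_packages="$1"
    local so_path="${site_packages}/nvidia/cusparselt/lib/libcusparseLt.so.0"
    local dist_info

    # Assumption: metadata lives in <normalized_name>-<version>.dist-info.
    # If the glob matches nothing it stays literal and the -d test fails.
    for dist_info in "${site_packages}"/nvidia_cusparselt_cu13-*.dist-info; do
        if [ -d "${dist_info}" ] && [ ! -e "${so_path}" ]; then
            return 0
        fi
    done
    return 1
}

# Possible use at the end of an install step (the exact reinstall command
# used by the CI script is not shown in this PR page, so this is a guess):
#   if cusparselt_needs_heal "${SITE_PACKAGES}"; then
#       echo "WARNING: libcusparseLt.so.0 missing, reinstalling wheel"
#       python3 -m pip install --force-reinstall --no-deps nvidia-cusparselt-cu13
#   fi
```

Checking the file on disk rather than asking the installer sidesteps the sticky state: the installer trusts the dist-info, so only a filesystem check can see the mismatch.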

Failure example: https://github.com/sgl-project/sglang/actions/runs/25158915002/job/73748785757

Test plan

- Trigger PR Test on a 1-GPU runner (5090 or H100); verify that install dependency completes without a libcusparseLt.so.0 ImportError
- Confirm the WARNING fires on a runner where the file is missing, and is silent on a runner where it's present

install_sglang_kernel imports torch to detect cu version. Without LD_LIBRARY_PATH pointing at the pip-managed nvidia/*/lib paths, the import fails with libcusparseLt.so.0 not found on hosts where cusparselt is only available via the nvidia-cusparselt-cu13 wheel.
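As a rough illustration of what "pointing LD_LIBRARY_PATH at the pip-managed nvidia/*/lib paths" involves, the sketch below collects those directories from a site-packages tree. The function name `nvidia_lib_paths` is hypothetical and is not the body of setup_ld_library_path, which this page does not show.

```bash
# Hypothetical helper: prints a colon-separated list of every
# nvidia/<package>/lib directory found under the given site-packages.
nvidia_lib_paths() {
    local site_packages="$1"
    local dir joined=""

    for dir in "${site_packages}"/nvidia/*/lib; do
        # Skip the literal pattern when nothing matches.
        [ -d "${dir}" ] || continue
        joined="${joined:+${joined}:}${dir}"
    done
    printf '%s\n' "${joined}"
}

# Possible use (assumes python3 is the interpreter that owns the wheels):
#   sp="$(python3 -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])')"
#   libs="$(nvidia_lib_paths "${sp}")"
#   export LD_LIBRARY_PATH="${libs}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
```

A path list like this only helps when the shared objects actually exist in those directories; it cannot compensate for a wheel whose files were never extracted.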

gemini-code-assist

Code Review

This pull request moves the setup_ld_library_path call to an earlier stage in the CI installation script to resolve a torch import issue. However, feedback indicates that removing the call from its original position might lead to an incomplete LD_LIBRARY_PATH for dependencies installed later in the process, such as NVIDIA packages. It is recommended to keep both calls to ensure all paths are correctly captured and exported to the environment.

gemini-code-assist · 2026-04-30T10:18:49Z

    fix_nvidia_deps
    install_test_tools
    prepare_runner
-    setup_ld_library_path


Moving setup_ld_library_path to run before install_sglang_kernel correctly fixes the import torch issue. However, removing the call from this location may introduce a new problem.

Functions that run after the new call location, such as install_extra_deps and fix_nvidia_deps, install additional NVIDIA packages (nvidia-cuda-nvrtc, nvidia-cudnn-cu*, etc.). If setup_ld_library_path is not run again after these packages are installed, their library paths will be missing from LD_LIBRARY_PATH. This will result in an incomplete library path being exported to GITHUB_ENV, potentially causing failures in verify_imports or subsequent CI steps.

To ensure LD_LIBRARY_PATH is always complete, please keep this call to setup_ld_library_path. The function is safe to call multiple times; it will simply prepend the full, updated set of library paths.
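One way to make a repeated call harmless is to prepend only entries that are not already listed, then persist the result for later workflow steps through the GITHUB_ENV file. The sketch below is an assumption about how that could look; `prepend_unique` and `export_ld_library_path` are hypothetical names, and the real setup_ld_library_path may simply prepend without deduplicating.

```bash
# Hypothetical helper: prepends each colon-separated entry of $1 to the
# path list in $2 unless it is already present, and prints the result.
prepend_unique() {
    local new_entries="$1" current="$2"
    local front="" entry rest="${new_entries}"

    while [ -n "${rest}" ]; do
        entry="${rest%%:*}"              # take the first entry
        case "${rest}" in
            *:*) rest="${rest#*:}" ;;    # drop it from the remainder
            *)   rest="" ;;
        esac
        [ -n "${entry}" ] || continue    # ignore empty entries
        case ":${front}:${current}:" in
            *":${entry}:"*) ;;           # already listed, skip
            *) front="${front:+${front}:}${entry}" ;;
        esac
    done

    if [ -n "${front}" ] && [ -n "${current}" ]; then
        printf '%s\n' "${front}:${current}"
    else
        printf '%s\n' "${front}${current}"
    fi
}

# Exports the value for this shell and, when running under GitHub Actions,
# appends it to the GITHUB_ENV file so later steps inherit it.
export_ld_library_path() {
    local value="$1"
    export LD_LIBRARY_PATH="${value}"
    if [ -n "${GITHUB_ENV:-}" ]; then
        echo "LD_LIBRARY_PATH=${value}" >> "${GITHUB_ENV}"
    fi
}
```

With this shape, running the setup once before the kernel install and again after the later NVIDIA packages are installed picks up the new directories without duplicating the old ones.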

…arselt-cu13

Reordering setup_ld_library_path didn't fix the underlying issue: on the failing runners, nvidia-cusparselt-cu13 0.8.0 is registered in pip metadata but libcusparseLt.so.0 is physically missing from $site-packages/nvidia/cusparselt/lib/. No LD_LIBRARY_PATH adjustment finds a file that's not on disk.

Two changes:

1. Read torch's CUDA tag from the local-version label (e.g. 2.9.1+cu130 → cu130) via pip show, instead of `import torch`, which dlopens libcusparseLt and fails when the file is missing.
2. After install_sglang, if libcusparseLt.so.0 is missing but the wheel metadata claims it exists, force-reinstall nvidia-cusparselt-cu13. This restores the file before any later torch import.

Reverts the main() reorder from the first attempt; that wasn't the bug.
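The version-label approach described in this commit message (which a later commit in this PR dropped as unnecessary) can be sketched as plain string handling on the output of pip show. Both function names below are hypothetical; the only assumption about pip is that `pip show` prints a `Version:` line.

```bash
# Hypothetical helper: reads `pip show <package>` output on stdin and
# prints the value of its "Version:" line.
version_from_pip_show() {
    awk -F': ' '$1 == "Version" { print $2; exit }'
}

# Hypothetical helper: prints the local-version label (the text after '+')
# of a version string, or fails when there is no label.
cuda_tag_from_version() {
    local version="$1"
    case "${version}" in
        *+*) printf '%s\n' "${version#*+}" ;;
        *)   return 1 ;;
    esac
}

# Possible use, avoiding `import torch` entirely:
#   ver="$(python3 -m pip show torch | version_from_pip_show)"
#   tag="$(cuda_tag_from_version "${ver}")"   # e.g. 2.9.1+cu130 gives cu130
```

Because this reads only installer metadata, it never triggers the dynamic loader, so it works even while libcusparseLt.so.0 is missing.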


alisonshao added run-ci high priority labels Apr 30, 2026

gemini-code-assist Bot reviewed Apr 30, 2026


alisonshao added 2 commits April 30, 2026 03:20

ci: drop unnecessary version-detection change, keep only the cusparse…

a32ed6e

…lt heal

Kangyan-Zhou merged commit dc395bc into main Apr 30, 2026
100 of 101 checks passed

Kangyan-Zhou deleted the alison/fix-libcusparselt-import-ordering branch April 30, 2026 17:55

vguduruTT pushed a commit to vguduruTT/sglang that referenced this pull request May 2, 2026

ci: run setup_ld_library_path before install_sglang_kernel (sgl-proje…

22dc312

…ct#24141)

LucQueen pushed a commit to LucQueen/sglang that referenced this pull request May 12, 2026

ci: run setup_ld_library_path before install_sglang_kernel (sgl-proje…

c0491bb

…ct#24141)

Kangyan-Zhou merged 3 commits into main from alison/fix-libcusparselt-import-ordering
