[sgl-kernel] fix runtime error while preloading CUDA runtime #13089
Conversation
Summary of Changes

Hello @anvdn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a regression in the CUDA runtime preloading logic.
Code Review
This pull request effectively resolves a runtime error caused by a mandatory CUDA library preloading step. By relaxing this requirement to a non-blocking attempt and expanding the library search paths, the change significantly improves robustness, especially in containerized environments or systems without a standard CUDA toolkit installation. My review includes a critical fix for a potential UnboundLocalError in the new exception handling logic and a suggestion to reintroduce a recursive search as a fallback to ensure comprehensive library discovery without sacrificing the performance gains of the new approach.
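To make the flagged UnboundLocalError concrete, here is a hypothetical sketch (names like `preload` and `candidates` are illustrative, not the PR's actual code): if `candidate.resolve()` raises on the first loop iteration, `cuda_runtime_lib` is never bound, so an `except` handler that logs it would itself crash; logging `candidate` instead avoids this.

```python
import ctypes
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def preload(candidates: list[Path]) -> None:
    for candidate in candidates:
        try:
            # resolve() can raise in edge cases (e.g. a symlink loop); if it does,
            # cuda_runtime_lib is never assigned on this iteration.
            cuda_runtime_lib = candidate.resolve()
            ctypes.CDLL(str(cuda_runtime_lib), mode=ctypes.RTLD_GLOBAL)
            return
        except Exception as e:
            # Safe: 'candidate' is always bound here, unlike 'cuda_runtime_lib'.
            logger.debug(f"Failed to load {candidate}: {e}")
```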
Diff context:

```python
logger.debug(f"Failed to load {cuda_runtime_lib}: {e}")
continue

logger.debug("[sgl_kernel] Could not preload CUDA runtime library")
```
The previous implementation used rglob to perform a recursive search for the CUDA library within cuda_home. While the new list of explicit paths is a great improvement for performance and common cases, removing the recursive search entirely might be a regression for setups where the library is in a non-standard subdirectory of CUDA_HOME. To improve robustness, consider adding the rglob search back as a final fallback mechanism after the main loop.
| logger.debug("[sgl_kernel] Could not preload CUDA runtime library") | |
| # Fallback to a recursive search in cuda_home as a last resort | |
| if cuda_home.is_dir(): | |
| for candidate in cuda_home.rglob("libcudart.so.12"): | |
| if not candidate.is_file(): | |
| continue | |
| try: | |
| cuda_runtime_lib = candidate.resolve() | |
| ctypes.CDLL(str(cuda_runtime_lib), mode=ctypes.RTLD_GLOBAL) | |
| logger.debug(f"Preloaded CUDA runtime under {cuda_runtime_lib} (found via rglob)") | |
| return | |
| except Exception as e: | |
| logger.debug(f"Failed to load {candidate} (found via rglob): {e}") | |
| continue | |
| logger.debug("[sgl_kernel] Could not preload CUDA runtime library") |
| Path("/usr/lib/x86_64-linux-gnu"), | ||
| Path("/usr/lib/aarch64-linux-gnu"), | ||
| Path("/usr/lib64"), | ||
| Path("/usr/lib"), |
I think one promising way to detect the lib in the env would be to check whether nvidia-cuda-runtime-like packages are installed in the current env (such packages can be found on the PyTorch PyPI index, as well as NVIDIA's nvidia-cuda-runtime-cu11, nvidia-cuda-runtime-cu12, and nvidia-cuda-runtime).

```
$ pip list
...
nvidia-cuda-runtime-cu12 (12.0.107)
...
```

If so, its paths could also be candidate dirs:

```python
>>> import nvidia.cuda_runtime.lib
>>> nvidia.cuda_runtime.lib.__path__
['/home/ubuntu/dev/test-cudart/lib/python3.6/site-packages/nvidia/cuda_runtime/lib']
```
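A minimal sketch of this detection, assuming the `nvidia.cuda_runtime.lib` namespace package shown above (the exact package layout may differ across wheels):

```python
from pathlib import Path


def pip_cuda_runtime_dirs() -> list[Path]:
    """Return lib dirs from an installed nvidia-cuda-runtime-* wheel, if any."""
    try:
        import nvidia.cuda_runtime.lib  # shipped by e.g. nvidia-cuda-runtime-cu12
    except ImportError:
        return []
    # __path__ lists the package's directories; use them as candidate dirs.
    return [Path(p) for p in nvidia.cuda_runtime.lib.__path__]
```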
Agreed on that. However, given we currently have no reason to believe we need to keep this preloading logic, maybe it's not worth the extra complexity right now? In particular, I don't see any such logic in the vLLM repo. Wdyt?
Thanks a lot! Is this PR ready for review?

When you need a review, feel free to ping me anytime.

@FlamingoPg Yes, can you please take a look? Thank you in advance :)
Motivation

~1 month ago, I opened the following issue. Essentially, this older change introduced (I am assuming inadvertently) a regression by raising a runtime error if we can't preload the CUDA runtime from system dirs. Unfortunately, this binary isn't always present there, in particular if the CUDA toolkit isn't installed on the machine (e.g. we aren't using a heavy devel NVIDIA docker image). In such cases, we would error out at runtime even though this step isn't strictly required for inference.

Modifications

- Relax the mandatory preload into a non-blocking attempt, so a failed preload is logged rather than raised (see the sketch below).
- Also search the /usr/lib* paths, which are other common locations on Debian/Ubuntu (see fix: detect CUDA libraries in /usr/lib/ #11477).
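A minimal sketch of the resulting best-effort preload; the names `_CANDIDATE_DIRS` and `_preload_cuda_runtime` are illustrative, and the actual diff may derive additional paths (e.g. from CUDA_HOME):

```python
import ctypes
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

# Hypothetical candidate list; drawn from the /usr/lib* paths in the diff.
_CANDIDATE_DIRS = [
    Path("/usr/lib/x86_64-linux-gnu"),
    Path("/usr/lib/aarch64-linux-gnu"),
    Path("/usr/lib64"),
    Path("/usr/lib"),
]


def _preload_cuda_runtime() -> None:
    """Best-effort preload of libcudart.so.12: log failures, never raise."""
    for directory in _CANDIDATE_DIRS:
        candidate = directory / "libcudart.so.12"
        if not candidate.is_file():
            continue
        try:
            ctypes.CDLL(str(candidate), mode=ctypes.RTLD_GLOBAL)
            logger.debug(f"Preloaded CUDA runtime from {candidate}")
            return
        except OSError as e:
            logger.debug(f"Failed to load {candidate}: {e}")
    # Non-blocking: previously this path raised a runtime error.
    logger.debug("[sgl_kernel] Could not preload CUDA runtime library")
```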
Note it's possible we may drop the preloading logic altogether and offload that completely to torch at init (e.g. here). For context, the logic was initially added to prevent a missing symbol error on GH200/GB200 on older CUDA versions (see #5746). Unfortunately I am unable to verify if we can drop it given I do not have access to such instances.

Accuracy Tests
Benchmarking and Profiling
Checklist