Conversation

@anvdn anvdn commented Nov 11, 2025

Motivation

About a month ago, I opened the following issue. Essentially, this older change introduced (I assume inadvertently) a regression by raising a runtime error when the CUDA runtime cannot be preloaded from system directories. Unfortunately, that library isn't always present there, in particular when the CUDA toolkit isn't installed on the machine (e.g. when we aren't using a heavy devel NVIDIA Docker image). In such cases, we would error out at runtime with the following log, even though this step isn't strictly required for inference.

from sgl_kernel import gelu_and_mul, silu_and_mul
  File "/tmp/pex/venvs/1/dc55218965a020edeb15eb54f013bb950c7766d2/779eb2cc0ca9e2fdd204774cbc41848e4e7c5055/lib/python3.12/site-packages/sgl_kernel/__init__.py", line 217, in <module>
    raise RuntimeError("Could not find CUDA lib directory.")
RuntimeError: Could not find CUDA lib directory.

Modifications

  • Relax the logic to only attempt the preload, as it was implemented prior to the change referenced above (see the sketch at the end of this section)
  • Extend the logic to also look under /usr/lib*, which covers other common locations on Debian/Ubuntu (see fix: detect CUDA libraries in /usr/lib/ #11477)

Note that it may be possible to drop the preloading logic altogether and offload it entirely to torch at init (e.g. here). For context, the logic was initially added to prevent a missing-symbol error on GH200/GB200 with older CUDA versions (see #5746). Unfortunately, I am unable to verify whether we can drop it, since I do not have access to such instances.
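
For reference, a minimal sketch of the relaxed, best-effort preload (the function name and exact candidate list below are illustrative, pieced together from the diff excerpts in this PR, not the verbatim implementation):

import ctypes
import logging
import os
from pathlib import Path

logger = logging.getLogger(__name__)

def _preload_cuda_runtime() -> None:
    cuda_home = Path(os.environ.get("CUDA_HOME", "/usr/local/cuda"))
    candidate_dirs = [
        cuda_home / "lib64",
        cuda_home / "lib",
        Path("/usr/lib/x86_64-linux-gnu"),
        Path("/usr/lib/aarch64-linux-gnu"),
        Path("/usr/lib64"),
        Path("/usr/lib"),
    ]
    for lib_dir in candidate_dirs:
        cuda_runtime_lib = lib_dir / "libcudart.so.12"
        if not cuda_runtime_lib.is_file():
            continue
        try:
            ctypes.CDLL(str(cuda_runtime_lib), mode=ctypes.RTLD_GLOBAL)
            logger.debug(f"Preloaded CUDA runtime from {cuda_runtime_lib}")
            return
        except OSError as e:
            logger.debug(f"Failed to load {cuda_runtime_lib}: {e}")
            continue
    # Key change: log and move on instead of raising RuntimeError
    logger.debug("[sgl_kernel] Could not preload CUDA runtime library")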

Accuracy Tests

Benchmarking and Profiling

Checklist

@gemini-code-assist (Contributor)

Summary of Changes

Hello @anvdn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a regression in sgl-kernel that caused a runtime error when the CUDA runtime library (libcudart.so.12) could not be preloaded from typical CUDA installation directories. This issue primarily affected environments where the full CUDA toolkit was not installed. The changes modify the preloading mechanism to be more resilient by not failing if the library isn't found and by extending the search to common system library paths, ensuring the application can still run without error in such configurations.

Highlights

  • Error Handling: The logic for preloading the CUDA runtime library has been relaxed to prevent a RuntimeError when the library cannot be found in standard CUDA installation paths, particularly on systems without a full CUDA toolkit. Instead of failing, the system will now log a debug message.
  • Library Search Paths: Expanded the search locations for the libcudart.so.12 library to include common system directories like /usr/lib/x86_64-linux-gnu, /usr/lib/aarch64-linux-gnu, /usr/lib64, and /usr/lib, improving compatibility across various Linux distributions.

@gemini-code-assist bot left a comment


Code Review

This pull request effectively resolves a runtime error caused by a mandatory CUDA library preloading step. By relaxing this requirement to a non-blocking attempt and expanding the library search paths, the change significantly improves robustness, especially in containerized environments or systems without a standard CUDA toolkit installation. My review includes a critical fix for a potential UnboundLocalError in the new exception handling logic and a suggestion to reintroduce a recursive search as a fallback to ensure comprehensive library discovery without sacrificing the performance gains of the new approach.

logger.debug(f"Failed to load {cuda_runtime_lib}: {e}")
continue

logger.debug("[sgl_kernel] Could not preload CUDA runtime library")

Severity: medium

The previous implementation used rglob to perform a recursive search for the CUDA library within cuda_home. While the new list of explicit paths is a great improvement for performance and common cases, removing the recursive search entirely might be a regression for setups where the library is in a non-standard subdirectory of CUDA_HOME. To improve robustness, consider adding the rglob search back as a final fallback mechanism after the main loop.

Suggested change
-    logger.debug("[sgl_kernel] Could not preload CUDA runtime library")
+    # Fallback to a recursive search in cuda_home as a last resort
+    if cuda_home.is_dir():
+        for candidate in cuda_home.rglob("libcudart.so.12"):
+            if not candidate.is_file():
+                continue
+            try:
+                cuda_runtime_lib = candidate.resolve()
+                ctypes.CDLL(str(cuda_runtime_lib), mode=ctypes.RTLD_GLOBAL)
+                logger.debug(f"Preloaded CUDA runtime under {cuda_runtime_lib} (found via rglob)")
+                return
+            except Exception as e:
+                logger.debug(f"Failed to load {candidate} (found via rglob): {e}")
+                continue
+    logger.debug("[sgl_kernel] Could not preload CUDA runtime library")

Path("/usr/lib/x86_64-linux-gnu"),
Path("/usr/lib/aarch64-linux-gnu"),
Path("/usr/lib64"),
Path("/usr/lib"),

@Inokinoki Inokinoki commented Nov 12, 2025

I think one promising way to detect the lib in the env would be to check whether an nvidia-cuda-runtime-like package is installed in the current env (such packages can be found on the PyTorch pip index and PyPI, e.g. NVIDIA's nvidia-cuda-runtime-cu11, nvidia-cuda-runtime-cu12, and nvidia-cuda-runtime).

$ pip list
...
nvidia-cuda-runtime-cu12 (12.0.107)
...

If so, its paths could also be candidate dirs:

>>> import nvidia.cuda_runtime.lib
>>> nvidia.cuda_runtime.lib.__path__
['/home/ubuntu/dev/test-cudart/lib/python3.6/site-packages/nvidia/cuda_runtime/lib']
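
A minimal sketch of that idea, assuming the nvidia-cuda-runtime-cu12 package layout shown above (the helper name nvidia_runtime_lib_dirs is hypothetical):

import importlib
from pathlib import Path

def nvidia_runtime_lib_dirs() -> list[Path]:
    # A pip-installed nvidia-cuda-runtime package ships libcudart under
    # site-packages/nvidia/cuda_runtime/lib; surface it as a candidate dir.
    try:
        lib_pkg = importlib.import_module("nvidia.cuda_runtime.lib")
    except ImportError:
        return []
    return [Path(p) for p in lib_pkg.__path__]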

@anvdn (Contributor, Author)

Agreed on that. However, given we currently have no reason to believe we need to keep this preloading logic, maybe it's not worth the extra complexity right now? In particular, I don't see any such logic in the vLLM repo. Wdyt?

@FlamingoPg FlamingoPg self-assigned this Nov 13, 2025
@FlamingoPg (Collaborator)

Thanks a lot! Is this PR ready for review?

@FlamingoPg (Collaborator)

When you need a review, feel free to ping me anytime.

@anvdn (Contributor, Author) commented Nov 13, 2025

@FlamingoPg Yes, can you please take a look? Thank you in advance :)
