[sgl-kernel] fix runtime error while preloading CUDA runtime #13089
Conversation
Summary of Changes

Hello @anvdn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a regression in the CUDA runtime preloading logic.
Code Review
This pull request effectively resolves a runtime error caused by a mandatory CUDA library preloading step. By relaxing this requirement to a non-blocking attempt and expanding the library search paths, the change significantly improves robustness, especially in containerized environments or systems without a standard CUDA toolkit installation. My review includes a critical fix for a potential UnboundLocalError in the new exception handling logic and a suggestion to reintroduce a recursive search as a fallback to ensure comprehensive library discovery without sacrificing the performance gains of the new approach.
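To make the flagged UnboundLocalError concrete, here is a hypothetical sketch (names like `preload` and `candidates` are illustrative, not the PR's actual code): if `candidate.resolve()` raises on the first loop iteration, `cuda_runtime_lib` is never bound, so an `except` handler that logs it would itself crash; logging `candidate` instead avoids this.

```python
import ctypes
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def preload(candidates: list[Path]) -> None:
    for candidate in candidates:
        try:
            # resolve() can raise in edge cases (e.g. a symlink loop); if it does,
            # cuda_runtime_lib is never assigned on this iteration.
            cuda_runtime_lib = candidate.resolve()
            ctypes.CDLL(str(cuda_runtime_lib), mode=ctypes.RTLD_GLOBAL)
            return
        except Exception as e:
            # Safe: 'candidate' is always bound here, unlike 'cuda_runtime_lib'.
            logger.debug(f"Failed to load {candidate}: {e}")
```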
Diff context:

```python
logger.debug(f"Failed to load {cuda_runtime_lib}: {e}")
continue

logger.debug("[sgl_kernel] Could not preload CUDA runtime library")
```
The previous implementation used rglob to perform a recursive search for the CUDA library within cuda_home. While the new list of explicit paths is a great improvement for performance and common cases, removing the recursive search entirely might be a regression for setups where the library is in a non-standard subdirectory of CUDA_HOME. To improve robustness, consider adding the rglob search back as a final fallback mechanism after the main loop.
| logger.debug("[sgl_kernel] Could not preload CUDA runtime library") | |
| # Fallback to a recursive search in cuda_home as a last resort | |
| if cuda_home.is_dir(): | |
| for candidate in cuda_home.rglob("libcudart.so.12"): | |
| if not candidate.is_file(): | |
| continue | |
| try: | |
| cuda_runtime_lib = candidate.resolve() | |
| ctypes.CDLL(str(cuda_runtime_lib), mode=ctypes.RTLD_GLOBAL) | |
| logger.debug(f"Preloaded CUDA runtime under {cuda_runtime_lib} (found via rglob)") | |
| return | |
| except Exception as e: | |
| logger.debug(f"Failed to load {candidate} (found via rglob): {e}") | |
| continue | |
| logger.debug("[sgl_kernel] Could not preload CUDA runtime library") |
| Path("/usr/lib/x86_64-linux-gnu"), | ||
| Path("/usr/lib/aarch64-linux-gnu"), | ||
| Path("/usr/lib64"), | ||
| Path("/usr/lib"), |
I think one promising way to detect the lib in the env would be to check whether nvidia-cuda-runtime-like packages are installed in the current env (such packages can be found on the PyTorch PyPI index, as well as NVIDIA's nvidia-cuda-runtime-cu11, nvidia-cuda-runtime-cu12, and nvidia-cuda-runtime).

```
$ pip list
...
nvidia-cuda-runtime-cu12 (12.0.107)
...
```

If so, its paths could also be candidate dirs:

```python
>>> import nvidia.cuda_runtime.lib
>>> nvidia.cuda_runtime.lib.__path__
['/home/ubuntu/dev/test-cudart/lib/python3.6/site-packages/nvidia/cuda_runtime/lib']
```
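A minimal sketch of this detection, assuming the `nvidia.cuda_runtime.lib` namespace package shown above (the exact package layout may differ across wheels):

```python
from pathlib import Path


def pip_cuda_runtime_dirs() -> list[Path]:
    """Return lib dirs from an installed nvidia-cuda-runtime-* wheel, if any."""
    try:
        import nvidia.cuda_runtime.lib  # shipped by e.g. nvidia-cuda-runtime-cu12
    except ImportError:
        return []
    # __path__ lists the package's directories; use them as candidate dirs.
    return [Path(p) for p in nvidia.cuda_runtime.lib.__path__]
```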
Agreed on that. However, given we currently have no reason to believe we need to keep this preloading logic, maybe it's not worth the extra complexity right now? In particular, I don't see any such logic in the vLLM repo. Wdyt?
Thanks a lot! Is this PR ready for review?

When you need a review, feel free to ping me anytime.

@FlamingoPg Yes, can you please take a look? Thank you in advance :)
Motivation

~1 month ago, I opened the following issue. Essentially, this older change introduced (I am assuming inadvertently) a regression by raising a runtime error if we can't preload the CUDA runtime from system dirs. Unfortunately, this binary isn't always present there, in particular if the CUDA toolkit isn't installed on the machine (e.g. we aren't using a heavy devel NVIDIA docker image). In such cases, we would error out at runtime even though this step isn't strictly required for inference.

Modifications

- Relax the mandatory preload into a non-blocking attempt, so a failed preload is logged rather than raised (see the sketch below).
- Also search the /usr/lib* paths, which are other common locations on Debian/Ubuntu (see fix: detect CUDA libraries in /usr/lib/ #11477).
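A minimal sketch of the resulting best-effort preload; the names `_CANDIDATE_DIRS` and `_preload_cuda_runtime` are illustrative, and the actual diff may derive additional paths (e.g. from CUDA_HOME):

```python
import ctypes
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

# Hypothetical candidate list; drawn from the /usr/lib* paths in the diff.
_CANDIDATE_DIRS = [
    Path("/usr/lib/x86_64-linux-gnu"),
    Path("/usr/lib/aarch64-linux-gnu"),
    Path("/usr/lib64"),
    Path("/usr/lib"),
]


def _preload_cuda_runtime() -> None:
    """Best-effort preload of libcudart.so.12: log failures, never raise."""
    for directory in _CANDIDATE_DIRS:
        candidate = directory / "libcudart.so.12"
        if not candidate.is_file():
            continue
        try:
            ctypes.CDLL(str(candidate), mode=ctypes.RTLD_GLOBAL)
            logger.debug(f"Preloaded CUDA runtime from {candidate}")
            return
        except OSError as e:
            logger.debug(f"Failed to load {candidate}: {e}")
    # Non-blocking: previously this path raised a runtime error.
    logger.debug("[sgl_kernel] Could not preload CUDA runtime library")
```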
Note it's possible we may drop the preloading logic altogether and offload that completely to torch at init (e.g. here). For context, the logic was initially added to prevent a missing symbol error on GH200/GB200 on older CUDA versions (see #5746). Unfortunately I am unable to verify if we can drop it given I do not have access to such instances.

Accuracy Tests
Benchmarking and Profiling
Checklist