-
Notifications
You must be signed in to change notification settings - Fork 3.4k
fix: detect CUDA libraries in /usr/lib/ #11477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Summary of ChangesHello @hayko406, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This change addresses a critical initialization issue for sglang on specific Linux distributions where CUDA runtime libraries are installed in system-wide locations rather than within a typical CUDA Toolkit installation. By expanding the search scope for Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request extends the search for CUDA libraries to include common system paths, which is a good improvement for compatibility on systems like Debian/Ubuntu. The implementation is functionally correct, but I've suggested a refactoring of the loop structure to improve readability and maintainability.
| for base in candidates: | ||
| for path in base.rglob("libcudart.so.12"): | ||
| cuda_path = path.parent | ||
| break | ||
| else: | ||
| continue | ||
| break | ||
| else: | ||
| raise RuntimeError("Could not find CUDA lib directory.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nested loop structure with for-else-continue-break is functionally correct but can be difficult to read and understand at a glance. For improved readability and maintainability, consider refactoring this to a single loop that iterates through candidate paths and uses next() on the generator returned by rglob to find the first match. This simplifies the logic significantly.
| for base in candidates: | |
| for path in base.rglob("libcudart.so.12"): | |
| cuda_path = path.parent | |
| break | |
| else: | |
| continue | |
| break | |
| else: | |
| raise RuntimeError("Could not find CUDA lib directory.") | |
| for base in candidates: | |
| try: | |
| path = next(base.rglob("libcudart.so.12")) | |
| cuda_path = path.parent | |
| break | |
| except StopIteration: | |
| continue | |
| else: | |
| raise RuntimeError("Could not find CUDA lib directory.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
sgl-kernel/python/sgl_kernel/__init__.py (2)
212-227: Prefer direct path checks over recursive search for better performance and portability.The recursive
rglobsearch can be slow on large system directories. Additionally, the hardcoded/usr/lib/x86_64-linux-gnupath is specific to x86_64 Debian/Ubuntu systems and won't work on ARM systems which use/usr/lib/aarch64-linux-gnu.Consider using direct path existence checks instead:
- # Search for libcudart.so.12 in common system locations - candidates = [ - cuda_home, - Path("/usr/lib/x86_64-linux-gnu"), - Path("/usr/lib64"), - Path("/usr/lib") - ] - for base in candidates: - for path in base.rglob("libcudart.so.12"): - cuda_path = path.parent - break - else: - continue - break - else: - raise RuntimeError("Could not find CUDA lib directory.") + # Search for libcudart.so.12 in common system locations + candidates = [ + cuda_home / "lib" / "libcudart.so.12", + cuda_home / "lib64" / "libcudart.so.12", + Path("/usr/lib/x86_64-linux-gnu/libcudart.so.12"), + Path("/usr/lib/aarch64-linux-gnu/libcudart.so.12"), + Path("/usr/lib64/libcudart.so.12"), + Path("/usr/lib/libcudart.so.12"), + ] + for candidate in candidates: + if candidate.exists(): + cuda_path = candidate.parent + break + else: + raise RuntimeError( + f"Could not find libcudart.so.12 in any of: {[str(c) for c in candidates]}" + )This approach:
- Avoids slow recursive directory traversal
- Supports both x86_64 and ARM architectures
- Provides a more informative error message listing all checked paths
229-229: Misleading variable name.The variable
cuda_includesuggests an include directory but actually points to a library file (libcudart.so.12). Consider renaming tocuda_liborcuda_libraryfor clarity.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
sgl-kernel/python/sgl_kernel/__init__.py(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: lint
🔇 Additional comments (1)
sgl-kernel/python/sgl_kernel/__init__.py (1)
212-227: Verify the fix resolves the runtime error on target systems.Please confirm that this change successfully loads CUDA libraries on Debian/Ubuntu systems where
libcudart.so.12is installed in/usr/lib/x86_64-linux-gnu.You can verify this by:
- Testing on a Debian/Ubuntu system with CUDA installed via system packages
- Ensuring
CUDA_HOMEis not set in the environment- Running code that imports
sgl_kerneland confirming no RuntimeError is raised- Checking that
libcudart.so.12is successfully loaded via ctypes
|
Answering here as @hayko406 tagged me on the issue I opened. Thank you for looking into that! Would maybe just want to confirm it is ok to drop the requirement on finding / loading CUDA runtime: |
zhyncs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember introducing this function to fix the cu126 issue (missing symbol). I'm not sure if we still need it, may you help verify? Thanks. cc @FlamingoPg
It looks for #8813 sbsa-linux platform |
@FlamingoPg #8813 is a fix for gh200, I'm not sure if we need the entire function. If we delete it, what will happen? |
I think we can merge that right now, and simplify later |
|
@hayko406 Sorry for the late reply. Could you help fix the conflicts? |
Motivation
Fixes a runtime error when loading CUDA libraries on systems where libcudart.so.12 is installed in system library paths (e.g., /usr/lib/x86_64-linux-gnu), which is common on Debian/Ubuntu.
Without this change, sglang fails to locate CUDA and cannot initialize properly on such setups.
Modifications
Extended the CUDA library search logic to include common system library paths (/usr/lib/x86_64-linux-gnu, /usr/lib64, /usr/lib).
Accuracy Tests
Benchmarking and Profiling
Checklist
Summary by CodeRabbit