
@hayko406 hayko406 commented Oct 11, 2025

Motivation

Fixes a runtime error when loading CUDA libraries on systems where libcudart.so.12 is installed in system library paths (e.g., /usr/lib/x86_64-linux-gnu), which is common on Debian/Ubuntu.
Without this change, sglang fails to locate CUDA and cannot initialize properly on such setups.

Modifications

Extended the CUDA library search logic to include common system library paths (/usr/lib/x86_64-linux-gnu, /usr/lib64, /usr/lib).

Accuracy Tests

Benchmarking and Profiling

Checklist

Summary by CodeRabbit

  • Bug Fixes
    • Improved CUDA runtime detection by searching common system library paths, reducing startup failures on diverse Linux setups.
    • Clearer error messaging when the CUDA runtime cannot be located.

@gemini-code-assist
Contributor

Summary of Changes

Hello @hayko406, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This change addresses a critical initialization issue for sglang on specific Linux distributions where CUDA runtime libraries are installed in system-wide locations rather than within a typical CUDA Toolkit installation. By expanding the search scope for libcudart.so.12 to include these common system paths, the pull request ensures that sglang can correctly detect and utilize CUDA, preventing runtime errors and improving out-of-the-box compatibility for users on affected setups.

Highlights

  • CUDA Library Detection Fix: This pull request resolves a runtime error where sglang failed to locate CUDA libraries on systems like Debian/Ubuntu, particularly when libcudart.so.12 is installed in standard system library paths.
  • Expanded Search Paths: The CUDA library search logic has been extended to include common system library directories such as /usr/lib/x86_64-linux-gnu, /usr/lib64, and /usr/lib, ensuring broader compatibility.

@coderabbitai
Contributor

coderabbitai bot commented Oct 11, 2025

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Title Check: ✅ Passed. The title clearly and concisely summarizes the primary change by indicating it fixes library detection in the system path, directly reflecting the extended CUDA search behavior in the PR.
  • Description Check: ✅ Passed. The pull request description follows the repository’s template by including Motivation, Modifications, Accuracy Tests, Benchmarking and Profiling, and Checklist sections, with clear explanations of the problem and the code changes.
  • Docstring Coverage: ✅ Passed. No functions found in the changes. Docstring coverage check skipped.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request extends the search for CUDA libraries to include common system paths, which is a good improvement for compatibility on systems like Debian/Ubuntu. The implementation is functionally correct, but I've suggested a refactoring of the loop structure to improve readability and maintainability.

Comment on lines 219 to 227
for base in candidates:
    for path in base.rglob("libcudart.so.12"):
        cuda_path = path.parent
        break
    else:
        continue
    break
else:
    raise RuntimeError("Could not find CUDA lib directory.")
Contributor

medium

The nested loop structure with for-else-continue-break is functionally correct but can be difficult to read and understand at a glance. For improved readability and maintainability, consider refactoring this to a single loop that iterates through candidate paths and uses next() on the generator returned by rglob to find the first match. This simplifies the logic significantly.

Suggested change
-for base in candidates:
-    for path in base.rglob("libcudart.so.12"):
-        cuda_path = path.parent
-        break
-    else:
-        continue
-    break
-else:
-    raise RuntimeError("Could not find CUDA lib directory.")
+for base in candidates:
+    try:
+        path = next(base.rglob("libcudart.so.12"))
+        cuda_path = path.parent
+        break
+    except StopIteration:
+        continue
+else:
+    raise RuntimeError("Could not find CUDA lib directory.")
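Gemini's suggestion can be simplified one step further: next() accepts a default value, so the try/except StopIteration disappears as well. A minimal sketch, assuming the candidate directories are passed in as a list; find_lib_dir and resolve_cuda_lib_dir are hypothetical helpers, not functions that exist in sgl_kernel:

```python
from pathlib import Path
from typing import Iterable, Optional


def find_lib_dir(candidates: Iterable[Path], name: str = "libcudart.so.12") -> Optional[Path]:
    """Return the parent directory of the first `name` found under any candidate, or None."""
    for base in candidates:
        if not base.is_dir():
            # Skip candidates that do not exist on this machine (e.g. a missing CUDA home).
            continue
        # next() with a default returns None instead of raising StopIteration,
        # so neither try/except nor for/else is needed.
        match = next(base.rglob(name), None)
        if match is not None:
            return match.parent
    return None


def resolve_cuda_lib_dir(candidates: Iterable[Path]) -> Path:
    """Raise the same error message as the PR when nothing is found."""
    cuda_path = find_lib_dir(candidates)
    if cuda_path is None:
        raise RuntimeError("Could not find CUDA lib directory.")
    return cuda_path
```

The is_dir() guard is not present in either version above; it only makes the handling of a missing candidate explicit rather than relying on rglob yielding nothing.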

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
sgl-kernel/python/sgl_kernel/__init__.py (2)

212-227: Prefer direct path checks over recursive search for better performance and portability.

The recursive rglob search can be slow on large system directories such as /usr/lib. Additionally, the hardcoded /usr/lib/x86_64-linux-gnu path is specific to x86_64 Debian/Ubuntu systems; ARM64 systems use /usr/lib/aarch64-linux-gnu, where the library is only reached through the slow recursive search of /usr/lib.

Consider using direct path existence checks instead:

-        # Search for libcudart.so.12 in common system locations
-        candidates = [
-            cuda_home,
-            Path("/usr/lib/x86_64-linux-gnu"),
-            Path("/usr/lib64"),
-            Path("/usr/lib")
-        ]
-        for base in candidates:
-            for path in base.rglob("libcudart.so.12"):
-                cuda_path = path.parent
-                break
-            else:
-                continue
-            break
-        else:
-            raise RuntimeError("Could not find CUDA lib directory.")
+        # Search for libcudart.so.12 in common system locations
+        candidates = [
+            cuda_home / "lib" / "libcudart.so.12",
+            cuda_home / "lib64" / "libcudart.so.12",
+            Path("/usr/lib/x86_64-linux-gnu/libcudart.so.12"),
+            Path("/usr/lib/aarch64-linux-gnu/libcudart.so.12"),
+            Path("/usr/lib64/libcudart.so.12"),
+            Path("/usr/lib/libcudart.so.12"),
+        ]
+        for candidate in candidates:
+            if candidate.exists():
+                cuda_path = candidate.parent
+                break
+        else:
+            raise RuntimeError(
+                f"Could not find libcudart.so.12 in any of: {[str(c) for c in candidates]}"
+            )

This approach:

  • Avoids slow recursive directory traversal
  • Supports both x86_64 and ARM architectures
  • Provides a more informative error message listing all checked paths
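CodeRabbit's suggestion hardcodes both multiarch directories. A variant, sketched under the assumption that platform.machine() reports x86_64 or aarch64 on the relevant Linux machines, probes only the directory for the running architecture. The root and machine parameters exist only so the logic can be exercised against a temporary directory; MULTIARCH_TRIPLETS, candidate_paths and find_cuda_lib_dir are hypothetical names, not part of sgl_kernel:

```python
import platform
from pathlib import Path
from typing import List, Optional

# Assumption: Debian/Ubuntu multiarch directory names for the two architectures
# discussed in this thread; other machines fall back to /usr/lib64 and /usr/lib.
MULTIARCH_TRIPLETS = {
    "x86_64": "x86_64-linux-gnu",
    "aarch64": "aarch64-linux-gnu",
}


def candidate_paths(
    cuda_home: Path,
    root: Path = Path("/"),
    machine: Optional[str] = None,
    name: str = "libcudart.so.12",
) -> List[Path]:
    """Build the list of exact file paths to probe, most specific first."""
    machine = machine or platform.machine()
    paths = [cuda_home / "lib" / name, cuda_home / "lib64" / name]
    triplet = MULTIARCH_TRIPLETS.get(machine)
    if triplet is not None:
        paths.append(root / "usr" / "lib" / triplet / name)
    paths += [root / "usr" / "lib64" / name, root / "usr" / "lib" / name]
    return paths


def find_cuda_lib_dir(
    cuda_home: Path, root: Path = Path("/"), machine: Optional[str] = None
) -> Path:
    """Return the directory holding libcudart.so.12, checking only exact paths."""
    candidates = candidate_paths(cuda_home, root, machine)
    for candidate in candidates:
        if candidate.exists():
            return candidate.parent
    raise RuntimeError(
        f"Could not find libcudart.so.12 in any of: {[str(c) for c in candidates]}"
    )
```

Unlike the recursive version, this never walks a directory tree, so a library placed in an unexpected subdirectory is not found; the error message lists exactly which paths were tried.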

229-229: Misleading variable name.

The variable cuda_include suggests an include directory but actually points to a library file (libcudart.so.12). Consider renaming to cuda_lib or cuda_library for clarity.

📥 Commits

Reviewing files that changed from the base of the PR and between 0a30487 and b0130c1.

📒 Files selected for processing (1)
  • sgl-kernel/python/sgl_kernel/__init__.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: lint
🔇 Additional comments (1)
sgl-kernel/python/sgl_kernel/__init__.py (1)

212-227: Verify the fix resolves the runtime error on target systems.

Please confirm that this change successfully loads CUDA libraries on Debian/Ubuntu systems where libcudart.so.12 is installed in /usr/lib/x86_64-linux-gnu.

You can verify this by:

  1. Testing on a Debian/Ubuntu system with CUDA installed via system packages
  2. Ensuring CUDA_HOME is not set in the environment
  3. Running code that imports sgl_kernel and confirming no RuntimeError is raised
  4. Checking that libcudart.so.12 is successfully loaded via ctypes
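The four steps above can be bundled into a small diagnostic script. A rough sketch; can_load and main are hypothetical names. Note that can_load tests whether the dynamic loader resolves the soname through its normal search path, whereas sgl_kernel itself may load the library by full path:

```python
import ctypes
import ctypes.util
import os


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can open `soname` using its normal search path."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


def main() -> None:
    # Step 2: the check is only meaningful when CUDA_HOME is not set.
    if os.environ.get("CUDA_HOME"):
        print("warning: CUDA_HOME is set to", os.environ["CUDA_HOME"])
    # Step 3: importing sgl_kernel should not raise.
    try:
        import sgl_kernel  # noqa: F401
    except Exception as exc:  # report any failure, including the RuntimeError from the PR
        print(f"sgl_kernel import failed: {type(exc).__name__}: {exc}")
    else:
        print("sgl_kernel imported OK")
    # Step 4: can the loader find the CUDA runtime by soname?
    print("find_library('cudart'):", ctypes.util.find_library("cudart"))
    print("libcudart.so.12 loadable:", can_load("libcudart.so.12"))


if __name__ == "__main__":
    main()
```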

@anvdn
Contributor

anvdn commented Oct 12, 2025

Answering here as @hayko406 tagged me on the issue I opened. Thank you for looking into that!
I can confirm this solves the runtime issue I was facing when importing sgl_kernel (#11333) on my system, which has no CUDA toolkit installed (no /usr/local/cuda/, no nvcc, etc.). The new logic locates the NVIDIA driver library, and with your implementation that is sufficient to avoid the RuntimeError:

>>> cuda_path
PosixPath('/usr/lib/x86_64-linux-gnu')
>>> cuda_lib
PosixPath('/usr/lib/x86_64-linux-gnu/libcuda.so.570.172.08')

I'd just like to confirm that it is OK to drop the requirement to find and load the CUDA runtime (libcudart.so.12). In my case, I don't think that library is located anywhere in the system directories; it is probably bundled in the torch==2.8.0+cu128 wheel and loaded automatically when torch is imported. I never faced any issue running inference before this requirement was added in #8813.
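One way to check the hypothesis that libcudart.so.12 comes from the torch installation rather than the system directories is to look at which files are mapped into the process after importing torch. A Linux-only sketch; mapped_libs and main are hypothetical helpers, and whether importing torch maps libcudart at all depends on the torch build:

```python
import re
from pathlib import Path
from typing import Iterable, List


def mapped_libs(maps_lines: Iterable[str], pattern: str = r"libcudart\.so") -> List[str]:
    """Return unique mapped file paths matching `pattern`.

    Each line follows the Linux /proc/<pid>/maps layout:
    address perms offset dev inode [pathname]
    """
    regex = re.compile(pattern)
    found: List[str] = []
    for line in maps_lines:
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        path = fields[5].strip()
        if regex.search(path) and path not in found:
            found.append(path)
    return found


def main() -> None:
    try:
        import torch  # noqa: F401  (assumption: a CUDA build such as +cu128 is installed)
    except ImportError:
        print("torch is not installed")
    maps = Path("/proc/self/maps")
    if not maps.exists():
        print("/proc/self/maps is not available (not Linux?)")
        return
    with maps.open() as f:
        print(mapped_libs(f) or "libcudart is not mapped into this process")


if __name__ == "__main__":
    main()
```

If the printed path points inside the Python environment rather than a system directory, the runtime is coming from the torch side, which is the situation described above.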

@FlamingoPg FlamingoPg self-assigned this Oct 13, 2025
Member

@zhyncs zhyncs left a comment

I remember introducing this function to fix the cu126 issue (missing symbol). I'm not sure if we still need it; could you help verify? Thanks. cc @FlamingoPg

@FlamingoPg
Collaborator

It looks like this was added in #8813 for the sbsa-linux platform.

@zhyncs
Member

zhyncs commented Oct 13, 2025

@FlamingoPg #8813 is a fix for GH200; I'm not sure we need the entire function. What would happen if we deleted it?

@FlamingoPg
Collaborator

I think we can merge this now and simplify it later.

@FlamingoPg
Collaborator

@hayko406 Sorry for the late reply. Could you help fix the conflicts?
