[Bugfix][CI] Fix `ImportError: libcudart.so.12: cannot open shared object file: No such file or directory` by NickLucche · Pull Request #44192 · vllm-project/vllm

NickLucche · 2026-06-01T10:34:41Z

Fix https://buildkite.com/vllm/ci/builds/69183/canvas?jid=019e8236-5969-4911-8b8b-095d157a85cf&tab=output.

NIXL >= 1.1.0 installs both the cu12 and cu13 wheels.
nixl-cu13 1.2.0 ships a `nixl_ep_cpp.so` that is linked against `libcudart.so.13`; a valid concern about this was raised earlier in #39923.

~~We're now back to being forced again to install only a single nixl-cu* to avoid nixl-cu12 nixl_ep_cpp import from taking over and look for libcudart12.~~
I think the issue arises because we manually reinstall nixl before each test on CI.
I'm now trying to use a single Docker image built with INSTALL_KV_CONNECTORS on CI, so that it matches what we ship in release.

PS: this is a fix to unblock CI. I think we need a carefully thought-out, lasting solution that sorts out these import issues, especially now that nixl_ep is the default for EPLB. cc @alec-flowers

Signed-off-by: NickLucche <nlucches@redhat.com>
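To reproduce the symptom outside the test suite, one option is to check which CUDA runtime sonames the dynamic loader can resolve in a given image. The sketch below is only an illustration: `can_load` and `loadable_cudart_majors` are hypothetical helper names, not part of vLLM or NIXL, and a plain `dlopen` lookup may not follow exactly the same search path (e.g. RPATH) that `nixl_ep_cpp.so` uses.

```python
import ctypes


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can open `soname`, False otherwise."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Same failure class as "cannot open shared object file".
        return False
    return True


def loadable_cudart_majors(majors=("12", "13")):
    """List the CUDA runtime major versions whose libcudart can be opened."""
    return [m for m in majors if can_load(f"libcudart.so.{m}")]


if __name__ == "__main__":
    print("loadable libcudart majors:", loadable_cudart_majors())
```

If only `13` is reported in an image where the cu12 build of `nixl_ep_cpp` gets imported, the `libcudart.so.12` ImportError from the title would be the expected outcome.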

NickLucche · 2026-06-01T15:49:31Z

Tests are now failing with a multiprocessing-related issue:

FAILED v1/logits_processors/test_correctness.py::test_logitsprocs[logitsprocs_under_test0-50-cuda:0] - RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

Trying to swap the install order.
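For context, this error typically appears when the parent process has already initialized CUDA before a child is forked. One common mitigation is to defer any CUDA-touching call until first use instead of running it at import time. The following is a minimal sketch of that pattern; `get_device_count` and `INIT_CALLS` are hypothetical stand-ins, not the actual vLLM helpers, and the body deliberately does not call any GPU library.

```python
import functools

# Records every time the (stand-in) expensive initialization actually runs.
INIT_CALLS = []


@functools.cache
def get_device_count() -> int:
    """Hypothetical stand-in for a call that would initialize CUDA.

    Nothing runs at import time; the body executes once, on first call,
    and the cached result is reused afterwards.
    """
    INIT_CALLS.append("init")
    return 0  # placeholder; a real helper would query the GPU runtime here


if __name__ == "__main__":
    assert INIT_CALLS == []      # importing the module did not trigger init
    get_device_count()
    get_device_count()
    assert len(INIT_CALLS) == 1  # init ran exactly once
```

The alternative named in the error message itself is to create workers with the `spawn` start method, for example through `multiprocessing.get_context("spawn")`, so that children do not inherit an initialized CUDA state.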

alec-flowers · 2026-06-01T16:27:30Z

Yes, the longer-term fix is in NIXL packaging, so that we don't have to do this:
ai-dynamo/nixl#1646. cc @ovidiusm


njhill · 2026-06-01T22:07:37Z

Perhaps this would help with the latest issue: #44252.

However, it would probably be best to understand why the latest nixl causes CUDA to be initialized earlier than before.

Also noticed this in the CI logs, not sure if it's related or preexisting:

  --------------------------------------------------------------------------------
    CuPy may not function correctly because multiple CuPy packages are installed
    in your environment:
      cupy-cuda12x, cupy-cuda13x
    Follow these steps to resolve this issue:
      1. For all packages listed above, run the following command to remove all
         existing CuPy installations:
           $ pip uninstall <package_name>
        If you previously installed CuPy via conda, also run the following:
           $ conda uninstall cupy
      2. Install the appropriate CuPy package.
         Refer to the Installation Guide for detailed instructions.
           https://docs.cupy.dev/en/stable/install.html
  --------------------------------------------------------------------------------
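A quick way to spot this kind of situation (several CUDA variants of the same package installed side by side, as with `cupy-cuda12x`/`cupy-cuda13x` here or the `nixl-cu*` wheels) is to group installed distribution names by their base package. This is a sketch under an assumed naming convention (`<base>-cu<NN>` or `<base>-cuda<NN>x`); the function names are hypothetical and the regex would need adjusting for packages that name their variants differently.

```python
import re
from importlib import metadata

# Assumed naming scheme: "<base>-cu12", "<base>-cu13", "<base>-cuda12x", ...
_VARIANT = re.compile(r"^(?P<base>[a-z0-9_.-]+?)-cu(?:da)?(?P<ver>\d+)x?$")


def group_cuda_variants(dist_names):
    """Return {base: [variants]} for bases that have more than one CUDA variant."""
    groups = {}
    for name in dist_names:
        match = _VARIANT.match(name.lower())
        if match:
            groups.setdefault(match.group("base"), set()).add(name.lower())
    return {base: sorted(names) for base, names in groups.items() if len(names) > 1}


def installed_dist_names():
    """Names of all distributions visible to the current interpreter."""
    return [d.metadata["Name"] for d in metadata.distributions() if d.metadata["Name"]]


if __name__ == "__main__":
    for base, variants in group_cuda_variants(installed_dist_names()).items():
        print(f"multiple CUDA variants of {base}: {', '.join(variants)}")
```

Running it inside the CI image would list any package that has both a cu12 and a cu13 flavor installed, which can then be cross-checked against the CUDA version the image actually targets.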

njhill · 2026-06-01T22:12:06Z

Ah I see @alec-flowers just opened #44258 👍 ... still not obvious how this was exposed by these changes and wasn't an issue prior to them!

mergify · 2026-06-02T03:01:53Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @NickLucche.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

MatthewBonanni · 2026-06-02T14:45:55Z

Is this superseded by #44266 (merged)?

Harry-Chen · 2026-06-02T14:53:55Z

> Is this superseded by #44266 (merged)?

It looks like #44266 fixed the CI image for tests, but the published release images may still have wrong dependencies (not confirmed).

NickLucche · 2026-06-03T08:46:48Z

@Harry-Chen @MatthewBonanni #44266 patched it for the nixl tests for now.
In the long run, I think we want to go the way this PR proposes and minimize the differences between the test and release images.
The differences are currently large enough that adding connector deps causes all sorts of issues on the test image, which is not very reassuring.

I would keep this PR open as a reference for the direction we want to take, and close it once we manage to land that change.

init

c381854

Signed-off-by: NickLucche <nlucches@redhat.com>

NickLucche requested review from Harry-Chen and khluu as code owners June 1, 2026 10:34

NickLucche added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 1, 2026

mergify Bot added ci/build nvidia bug Something isn't working labels Jun 1, 2026

github-project-automation Bot added this to NVIDIA Jun 1, 2026

NickLucche added 5 commits June 1, 2026 14:15

do not reinstall nixl

87a9abe

Signed-off-by: NickLucche <nlucches@redhat.com>

install connectors for CI runs in the image

7f45755

Signed-off-by: NickLucche <nlucches@redhat.com>

wrong target image

6c1cd69

Signed-off-by: NickLucche <nlucches@redhat.com>

install connector in test image dockerfile

25ba47d

Signed-off-by: NickLucche <nlucches@redhat.com>

swap order

a08d1df

Signed-off-by: NickLucche <nlucches@redhat.com>

lazy cuda loading

e442c13

Signed-off-by: NickLucche <nlucches@redhat.com>

NickLucche force-pushed the fix-libcudart-nixlep branch from 664f60c to e442c13 Compare June 1, 2026 16:37

alec-flowers mentioned this pull request Jun 1, 2026

[Bugfix][CI] Avoid CUDA init during tests.utils import #44258

Closed

alec-flowers mentioned this pull request Jun 2, 2026

[Bugfix][CI] Normalize NIXL connector CUDA wheel installs #44266

Merged

mergify Bot added the needs-rebase label Jun 2, 2026

NickLucche wants to merge 7 commits into vllm-project:main from NickLucche:fix-libcudart-nixlep
