[Improvement] Persist CUDA compat libraries paths to prevent reset on apt-get#30784

Merged
vllm-bot merged 2 commits into vllm-project:main from emricksini-h:fix/persist-cuda-compat-config
Jan 13, 2026

Conversation

@emricksini-h
Contributor

@emricksini-h emricksini-h commented Dec 16, 2025

Currently, the Dockerfile registers CUDA compatibility libraries using a transient RUN ldconfig /path/to/compat command. This updates the cache but does not persist the configuration.

If a user extends this image or runs a debug container and executes apt-get install (which triggers a default ldconfig), the custom compatibility path is wiped from the cache. This causes the container to silently fall back to the host driver's native CUDA version (e.g., 12.4) instead of the container's optimized version (12.9), potentially degrading performance or raising version compatibility mismatch errors.

This PR makes the setup more robust by writing the path to /etc/ld.so.conf.d/00-cuda-compat.conf before running ldconfig. This ensures the compatibility layer persists regardless of future package installations or cache rebuilds.
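The change can be sketched roughly as follows (a minimal illustration, not the exact Dockerfile lines; `CONF_DIR` defaults to a temp directory here so the sketch runs unprivileged, whereas the real image writes to `/etc/ld.so.conf.d`):

```shell
# Persist the CUDA compat path so any future ldconfig run keeps it.
# In the image this file lives under /etc/ld.so.conf.d/; a temp dir is
# used here only so the sketch can run without root.
CUDA_VERSION="${CUDA_VERSION:-12.9.1}"
COMPAT_DIR="/usr/local/cuda-$(echo "$CUDA_VERSION" | cut -d. -f1,2)/compat"
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
echo "$COMPAT_DIR" > "$CONF_DIR/00-cuda-compat.conf"
# ldconfig   # in the real image: rebuild the cache from the conf files
cat "$CONF_DIR/00-cuda-compat.conf"
```

Because the path now lives in a conf file, a bare `ldconfig` triggered by `apt-get` rebuilds the cache from that file instead of silently dropping the compat entry.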

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly persists the CUDA compatibility library path by creating a configuration file in /etc/ld.so.conf.d/, which is a good improvement for the Docker image's robustness. However, the implementation introduces a critical command injection vulnerability by using an unquoted build argument within a command substitution. I have provided specific comments and suggestions to address this. This pattern of unquoted variables appears elsewhere in the Dockerfile, and I strongly recommend a full audit to fix all instances and secure the build process.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@emricksini-h emricksini-h force-pushed the fix/persist-cuda-compat-config branch from 47bf56c to 83b9527 Compare December 16, 2025 17:30
@mergify mergify bot added the multi-modality Related to multi-modality (#4194) label Dec 16, 2025
@emricksini-h emricksini-h force-pushed the fix/persist-cuda-compat-config branch from 83b9527 to 18ff788 Compare December 16, 2025 17:33
Collaborator

@wangshangsam wangshangsam left a comment


Looks reasonable, but could you include in the PR description a concrete example of where the existing code fails? Because

If a user extends this image or runs a debug container and executes apt-get install (which triggers a default ldconfig), the custom compatibility path is wiped from the cache.

This I can understand. But

This causes the container to silently fall back to the host driver's native CUDA version (e.g., 12.4) instead of the container's optimized version (12.9), potentially degrading performance or raising version compatibility mismatch errors.

This I don't quite understand. I thought that, the container's CUDA version is just the container's CUDA version, and the whole /usr/local/cuda-$(echo $CUDA_VERSION | cut -d. -f1,2)/compat/ thing is just to enable compatibility mode (i.e., you can run a later CUDA version on a older driver version)?

Member

@mgoin mgoin left a comment


I see, this makes sense to me as a fragile configuration right now. But I agree with @wangshangsam to clarify

@github-project-automation github-project-automation bot moved this to Ready in NVIDIA Jan 12, 2026
@emricksini-h
Contributor Author

Thanks @wangshangsam & @mgoin for the review!

To give a concrete example, I ran two versions of the vLLM Docker image on a cluster node with CUDA 12.4 (driver 550.163.01) installed. The first image is the base vllm-openai. The second uses the same base but executes a setup script via a CMD argument to install debug utilities with apt-get.

In the first docker image (default), I have:

<<K9s-Shell>> Pod: inference/test-6fb4b645d8-f4bg5 | Container: test
root@test-6fb4b645d8-f4bg5:/vllm-workspace# nvidia-smi

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.163.01             Driver Version: 550.163.01     CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|

In the second (debug), I have:

<<K9s-Shell>> Pod: inference/dev-66dc7d79c9-x5v6h | Container: dev
root@dev-66dc7d79c9-x5v6h:/vllm-workspace# nvidia-smi

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.163.01             Driver Version: 550.163.01     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|

Ultimately, there is only one driver version, but that driver may be compatible with different CUDA versions (including newer ones). Incompatibility issues can arise when code compiled with CUDA 12.9 (or newer) is executed in a Docker container that lacks the necessary compatibility layer, causing it to fall back to the node's version (12.4).

In my case, I encountered the following error when loading Qwen_Qwen3-VL-4B-Instruct-FP8 in the debug image:

(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843] EngineCore failed to start.
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843] Traceback (most recent call last):
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 834, in run_engine_core
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 610, in __init__
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]     super().__init__(
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 109, in __init__
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]     num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 235, in _initialize_kv_caches
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]     available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 126, in determine_available_memory
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]     return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 75, in collective_rpc
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]     result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/serial_utils.py", line 479, in run_method
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]     return func(*args, **kwargs)
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]     return func(*args, **kwargs)
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 324, in determine_available_memory
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]     self.model_runner.profile_run()
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4322, in profile_run
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]     dummy_encoder_outputs = self.model.embed_multimodal(
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl.py", line 1512, in embed_multimodal
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]     video_embeddings = self._process_video_input(multimodal_input)
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl.py", line 1413, in _process_video_input
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]     video_embeds = self.visual(pixel_values_videos, grid_thw=grid_thw)
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]     return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]     return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl.py", line 560, in forward
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]     hidden_states = blk(
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]                     ^^^^
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]     return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]     return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl.py", line 237, in forward
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]     x = x + self.attn(
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]             ^^^^^^^^^^
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]     return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]     return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen2_5_vl.py", line 398, in forward
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]     context_layer = vit_flash_attn_wrapper(
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]                     ^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/attention/ops/vit_attn_wrappers.py", line 82, in vit_flash_attn_wrapper
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]     return torch.ops.vllm.flash_attn_maxseqlen_wrapper(
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1255, in __call__
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]     return self._op(*args, **kwargs)
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/attention/ops/vit_attn_wrappers.py", line 36, in flash_attn_maxseqlen_wrapper
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]     output = flash_attn_varlen_func(
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]              ^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/flash_attn_interface.py", line 253, in flash_attn_varlen_func
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]     out, softmax_lse = torch.ops._vllm_fa2_C.varlen_fwd(
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1255, in __call__
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]     return self._op(*args, **kwargs)
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843] torch.AcceleratorError: CUDA error: the provided PTX was compiled with an unsupported toolchain.
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843] Search for `cudaErrorUnsupportedPtxVersion' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(EngineCore_DP0 pid=1159) ERROR 01-13 03:55:38 [core.py:843] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Running the following command resolved the issue:

ldconfig /usr/local/cuda-12.9/compat/

The fix in this PR prevents the issue from appearing by ensuring the compat layer is always enabled.

Signed-off-by: emricksini-h <emrick.birivoutin@hcompany.ai>
@emricksini-h emricksini-h force-pushed the fix/persist-cuda-compat-config branch from 18ff788 to e64ddb2 Compare January 13, 2026 12:08
@wangshangsam
Collaborator

Thanks @emricksini-h ! Now this makes sense.

@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 13, 2026
@mgoin mgoin enabled auto-merge (squash) January 13, 2026 20:22
@vllm-bot vllm-bot merged commit 2a60ac9 into vllm-project:main Jan 13, 2026
94 of 97 checks passed
@github-project-automation github-project-automation bot moved this from Ready to Done in NVIDIA Jan 13, 2026
@huydhn
Contributor

huydhn commented Jan 15, 2026

Unfortunately, I think this change doesn't work with newer drivers. The PyTorch x vLLM benchmark jobs use driver 580.105.08 to support the newer CUDA 13.0:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.105.08             Driver Version: 580.105.08     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA B200                    Off |   00000000:D1:00.0 Off |                    0 |
| N/A   32C    P0            141W /  750W |       0MiB / 183359MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

With this change, importing PyTorch fails right away:

python3 -c 'import torch; torch.cuda.is_available()'
/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:182: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:119.)
  return torch._C._cuda_getDeviceCount() > 0

Here is an example failure https://github.com/pytorch/pytorch-integration-testing/actions/runs/21017403967/job/60426060877#step:19:1452
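
This failure is consistent with the compat libraries being forward-only: forcing the image's 12.9 compat layer onto a driver that already supports CUDA 13.0 pairs the driver with older user-mode libraries. A conditional guard along these lines could avoid that (a hypothetical sketch, not code from this PR; the version strings are hardcoded stand-ins for values that would be parsed from `nvidia-smi` and from the image):

```shell
# Hypothetical guard: enable the compat layer only when the container's CUDA
# version exceeds what the driver natively supports (compat is forward-only).
driver_cuda="13.0"      # stand-in for a value parsed from nvidia-smi
container_cuda="12.9"   # stand-in for the image's CUDA version
newest="$(printf '%s\n%s\n' "$driver_cuda" "$container_cuda" | sort -V | tail -n1)"
if [ "$newest" = "$container_cuda" ] && [ "$driver_cuda" != "$container_cuda" ]; then
  echo "enable compat"   # container CUDA is newer than the driver
else
  echo "skip compat"     # driver already supports this CUDA (or is newer)
fi
```

With the values above this prints "skip compat", which is the behavior the benchmark jobs would need on a CUDA 13.0 driver.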

sammysun0711 pushed a commit to sammysun0711/vllm that referenced this pull request Jan 16, 2026
… `apt-get` (vllm-project#30784)

Signed-off-by: emricksini-h <emrick.birivoutin@hcompany.ai>
emricksini-h added a commit to emricksini-h/vllm that referenced this pull request Jan 16, 2026
…reset on `apt-get` (vllm-project#30784)"

This reverts commit 2a60ac9.

Signed-off-by: emricksini-h <emrick.birivoutin@hcompany.ai>
akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026
… `apt-get` (vllm-project#30784)

Signed-off-by: emricksini-h <emrick.birivoutin@hcompany.ai>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
… `apt-get` (vllm-project#30784)

Signed-off-by: emricksini-h <emrick.birivoutin@hcompany.ai>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
wangshangsam added a commit to CentML/vllm that referenced this pull request Jan 24, 2026
wangshangsam added a commit to CentML/vllm that referenced this pull request Jan 25, 2026
zhandaz pushed a commit to CentML/vllm that referenced this pull request Jan 25, 2026
wangshangsam added a commit to CentML/vllm that referenced this pull request Jan 25, 2026
* [Docker][Dev] Fix libnccl-dev version for the CUDA 13.0.1 devel image

[Docker][Dev] Fix libnccl-dev version conflict for the CUDA 13.0.1 devel image

Further update

* feat: Support FA4 for mm-encoder-attn-backend for qwen models

* feat: Kernel warmup for vit fa4

* fix: Fix some minor conflicts due to the introduction of flash_attn.cute

* Revert "[Docker][Dev] Fix libnccl-dev version for the CUDA 13.0.1 devel image"

This reverts commit ab76b28.

* chore: Update requirements and revert README.md

* chore: Install git for flash_attn cute installation

* lint: Fix linting

* Revert "[Improvement] Persist CUDA compat libraries paths to prevent reset on `apt-get` (vllm-project#30784)" (#31)

This reverts commit 2a60ac9.

---------

Co-authored-by: Shang Wang <shangw@nvidia.com>
zhandaz added a commit to CentML/vllm that referenced this pull request Feb 4, 2026
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
… `apt-get` (vllm-project#30784)

Signed-off-by: emricksini-h <emrick.birivoutin@hcompany.ai>

Labels

ci/build multi-modality Related to multi-modality (#4194) nvidia ready ONLY add when PR is ready to merge/full CI is needed

Projects

Status: Done

5 participants