[bugfix]fix file not found error in nightly of single-node #6976
wangxiyuan merged 3 commits into vllm-project:main from
Signed-off-by: MrZ20 <2609716663@qq.com>
Summary of Changes (Gemini Code Assist): This pull request resolves a runtime "file not found" error by adding /usr/local/lib to the LD_LIBRARY_PATH environment variable consistently across the Docker build configurations and in the end-to-end test setup. This makes the required shared libraries discoverable, particularly in the single-node nightly environment, and improves the stability and reliability of the builds.
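One detail of the shell idiom used in this PR, $LD_LIBRARY_PATH:/usr/local/lib: if the variable is empty when the line runs, the result starts with an empty entry, and the ld.so(8) man page states that a zero-length directory name in LD_LIBRARY_PATH means the current working directory. The sketch below shows a safer append under the assumption of a POSIX ":" separator; append_lib_dir is a hypothetical helper, not part of vllm-ascend.

```python
from typing import Optional


def append_lib_dir(current: Optional[str], lib_dir: str = "/usr/local/lib") -> str:
    """Append lib_dir to a colon-separated LD_LIBRARY_PATH value.

    Empty entries are dropped (ld.so treats them as the current directory) and
    lib_dir is not added twice.
    """
    entries = [p for p in (current or "").split(":") if p]
    if lib_dir not in entries:
        entries.append(lib_dir)
    return ":".join(entries)


# What the shell line produces when LD_LIBRARY_PATH is empty at the time it runs:
naive = "" + ":" + "/usr/local/lib"
print(repr(naive))               # ':/usr/local/lib'
print(repr(append_lib_dir("")))  # '/usr/local/lib'
print(append_lib_dir("/opt/x"))  # /opt/x:/usr/local/lib
```

Dropping empty entries also drops any intentional "current directory" entry, which is usually what you want in a container image.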
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request aims to fix a "file not found" error by adding /usr/local/lib to LD_LIBRARY_PATH. While the change in tests/e2e/conftest.py is appropriate for the test environment, the approach in the Dockerfiles can be improved for robustness. Modifying ~/.bashrc only affects interactive shells and is not a reliable way to set environment variables for all processes. I've suggested using ldconfig to configure the dynamic linker, which is the standard practice and ensures the library path is available system-wide. The provided repository style guide concerns pull request summaries and titles, and does not contain coding standards, so it was not referenced in the code review comments.
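The review accepts the change to tests/e2e/conftest.py, but its diff is not shown on this page, so the following is only a sketch of the general idea, assuming the e2e tests start vLLM in child processes. The dynamic loader reads LD_LIBRARY_PATH when a process starts, so updating the environment inside an already-running pytest process only helps processes launched afterwards. The names env_with_lib_dir and child_sees are hypothetical.

```python
import os
import subprocess
import sys
from typing import Dict

LIB_DIR = "/usr/local/lib"


def env_with_lib_dir(lib_dir: str = LIB_DIR) -> Dict[str, str]:
    """Return a copy of os.environ with lib_dir appended to LD_LIBRARY_PATH (no empty entries)."""
    env = dict(os.environ)
    entries = [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    if lib_dir not in entries:
        entries.append(lib_dir)
    env["LD_LIBRARY_PATH"] = ":".join(entries)
    return env


def child_sees(lib_dir: str = LIB_DIR) -> bool:
    """Spawn a fresh interpreter with the updated env and check what it inherited."""
    probe = "import os; print(os.environ.get('LD_LIBRARY_PATH', ''))"
    out = subprocess.run(
        [sys.executable, "-c", probe],
        env=env_with_lib_dir(lib_dir),
        capture_output=True,
        text=True,
        check=True,
    ).stdout.strip()
    return lib_dir in out.split(":")
```

In a conftest.py, an env like this would be passed to whatever launches the server subprocess; assigning to os.environ at import time has the same effect for processes started later.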
python3 -m pip cache purge
RUN echo "export LD_PRELOAD=/usr/lib/$(uname -m)-linux-gnu/libjemalloc.so.2:$LD_PRELOAD" >> ~/.bashrc
RUN echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib" >> ~/.bashrc
Modifying ~/.bashrc to set LD_LIBRARY_PATH is not a robust solution as it's only sourced for interactive shells. If the container's CMD is changed to run a non-interactive process, this environment variable will not be set, leading to runtime errors. A more reliable method is to configure the system's dynamic linker search paths.
RUN echo "/usr/local/lib" > /etc/ld.so.conf.d/mooncake.conf && ldconfig
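The point about ~/.bashrc can be checked directly. This is a minimal sketch, assuming bash is installed. It creates a throwaway HOME whose .bashrc exports a hypothetical DEMO_VAR, then runs bash once non-interactively (the same mode as /bin/sh -c, which executes Dockerfile RUN steps and shell-form CMD) and once with -i.

```python
import os
import subprocess
import tempfile


def var_seen_by_bash(interactive: bool) -> str:
    """Return what bash reports for DEMO_VAR when ~/.bashrc is the only place that sets it."""
    with tempfile.TemporaryDirectory() as home:
        # Throwaway HOME whose ~/.bashrc exports the (hypothetical) variable.
        with open(os.path.join(home, ".bashrc"), "w") as rc:
            rc.write("export DEMO_VAR=from_bashrc\n")
        # Minimal environment: no BASH_ENV or SSH_* variables that could change startup behaviour.
        env = {"HOME": home, "PATH": os.environ.get("PATH", "/usr/bin:/bin")}
        cmd = ["bash"] + (["-i"] if interactive else []) + ["-c", 'echo "${DEMO_VAR:-unset}"']
        result = subprocess.run(
            cmd,
            env=env,
            stdin=subprocess.DEVNULL,
            capture_output=True,     # an interactive bash without a terminal warns on stderr
            text=True,
            start_new_session=True,  # no controlling terminal, so bash -i cannot grab the tty
        )
        lines = result.stdout.strip().splitlines()
        return lines[-1] if lines else ""


print(var_seen_by_bash(interactive=False))  # non-interactive, like a RUN step or CMD
print(var_seen_by_bash(interactive=True))   # interactive, e.g. a shell opened with docker run -it
```

Only the interactive run picks up the variable. This is why the review prefers a mechanism that does not depend on shell startup files.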
python3 -m pip cache purge
RUN echo "export LD_PRELOAD=/usr/lib/$(uname -m)-linux-gnu/libjemalloc.so.2:$LD_PRELOAD" >> ~/.bashrc
RUN echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib" >> ~/.bashrc
Modifying ~/.bashrc to set LD_LIBRARY_PATH is not a robust solution as it's only sourced for interactive shells. If the container's CMD is changed to run a non-interactive process, this environment variable will not be set, leading to runtime errors. A more reliable method is to configure the system's dynamic linker search paths.
RUN echo "/usr/local/lib" > /etc/ld.so.conf.d/mooncake.conf && ldconfig
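A caveat on the ldconfig suggestion comes from the ldconfig(8) man page: ldconfig only looks at files named lib*.so* (or ld-*.so* for the loader itself). A file named ascend_transport.so, as in the nightly error, would therefore probably not enter the loader cache even with /usr/local/lib registered. Whether that matters depends on how the library is named and loaded, which this page does not show. The sketch below encodes that naming rule and parses output in the ldconfig -p format to check what is cached. The sample listing, including the libfoo.so entry, is made up for illustration; on a real image the text would come from running ldconfig -p.

```python
from typing import Dict, List


def ldconfig_would_cache(filename: str) -> bool:
    """Naming rule from ldconfig(8): only lib*.so* and ld-*.so* files are considered."""
    return filename.startswith(("lib", "ld-")) and ".so" in filename


def parse_ldconfig_cache(listing: str) -> Dict[str, List[str]]:
    """Map each soname in `ldconfig -p` output to the path(s) listed for it."""
    cache: Dict[str, List[str]] = {}
    for line in listing.splitlines():
        if "=>" not in line:
            continue  # header such as "N libs found in cache ..."
        left, path = line.split("=>", 1)
        name = left.strip().split(" ", 1)[0]  # drop the "(flags)" column
        cache.setdefault(name, []).append(path.strip())
    return cache


# Illustrative listing in the glibc `ldconfig -p` format (not captured from the CI image).
SAMPLE = (
    "2 libs found in cache `/etc/ld.so.cache'\n"
    "\tlibjemalloc.so.2 (libc6,AArch64) => /usr/lib/aarch64-linux-gnu/libjemalloc.so.2\n"
    "\tlibfoo.so (libc6,AArch64) => /usr/local/lib/libfoo.so\n"
)

cache = parse_ldconfig_cache(SAMPLE)
for name in ("libfoo.so", "ascend_transport.so"):
    print(name, "cached" if name in cache else "not cached",
          "| eligible for cache:", ldconfig_would_cache(name))
```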
python3 -m pip cache purge
RUN echo "export LD_PRELOAD=/usr/lib64/libjemalloc.so.2:$LD_PRELOAD" >> ~/.bashrc
RUN echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib" >> ~/.bashrc
Modifying ~/.bashrc to set LD_LIBRARY_PATH is not a robust solution as it's only sourced for interactive shells. If the container's CMD is changed to run a non-interactive process, this environment variable will not be set, leading to runtime errors. A more reliable method is to configure the system's dynamic linker search paths.
RUN echo "/usr/local/lib" > /etc/ld.so.conf.d/mooncake.conf && ldconfig
python3 -m pip cache purge
RUN echo "export LD_PRELOAD=/usr/lib/$(uname -m)-linux-gnu/libjemalloc.so.2:$LD_PRELOAD" >> ~/.bashrc
RUN echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib" >> ~/.bashrc
Modifying ~/.bashrc to set LD_LIBRARY_PATH is not a robust solution as it's only sourced for interactive shells. If the container's CMD is changed to run a non-interactive process, this environment variable will not be set, leading to runtime errors. A more reliable method is to configure the system's dynamic linker search paths.
RUN echo "/usr/local/lib" > /etc/ld.so.conf.d/mooncake.conf && ldconfig
python3 -m pip cache purge
RUN echo "export LD_PRELOAD=/usr/lib64/libjemalloc.so.2:$LD_PRELOAD" >> ~/.bashrc
RUN echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib" >> ~/.bashrc
Modifying ~/.bashrc to set LD_LIBRARY_PATH is not a robust solution as it's only sourced for interactive shells. If the container's CMD is changed to run a non-interactive process, this environment variable will not be set, leading to runtime errors. A more reliable method is to configure the system's dynamic linker search paths.
RUN echo "/usr/local/lib" > /etc/ld.so.conf.d/mooncake.conf && ldconfig
python3 -m pip cache purge
RUN echo "export LD_PRELOAD=/usr/lib64/libjemalloc.so.2:$LD_PRELOAD" >> ~/.bashrc
RUN echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib" >> ~/.bashrc
Modifying ~/.bashrc to set LD_LIBRARY_PATH is not a robust solution as it's only sourced for interactive shells. If the container's CMD is changed to run a non-interactive process, this environment variable will not be set, leading to runtime errors. A more reliable method is to configure the system's dynamic linker search paths.
RUN echo "/usr/local/lib" > /etc/ld.so.conf.d/mooncake.conf && ldconfig
…ect#6976)

### What this PR does / why we need it?

1. The **main image build** takes approximately **two hours**, so its last daily run needs to be moved forward to **21:00 (UTC+8)** to ensure that the nightly image build can use the latest main image.

```yaml
schedule:
  # UTC+8: 08:00, 12:00, 16:00, 22:00
  - cron: '0 0,4,8,14 * * *'
```

--->

```yaml
schedule:
  # UTC+8: 08:00, 12:00, 16:00, 21:00
  - cron: '0 0,4,8,13 * * *'
```

Link: https://github.com/vllm-project/vllm-ascend/actions/runs/22632712302/job/65641055135#step:8:26

2. The nightly test fails with the following error:

```text
ImportError: ascend_transport.so: cannot open shared object file: No such file or directory.
```

/usr/local/lib needs to be added to LD_LIBRARY_PATH:

```bash
echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib" >> ~/.bashrc
```

Link: https://github.com/vllm-project/vllm-ascend/actions/runs/22632712302/job/65641054911#step:7:529

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.16.0
- vLLM main: vllm-project/vllm@15d76f7

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
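The cron expressions in item 1 are evaluated in UTC, which is how GitHub Actions interprets schedule triggers, so 13:00 UTC is 21:00 in UTC+8. A small Python check of that arithmetic, combined with the PR's estimate of roughly two hours per build, follows. The helper names are hypothetical, and the parser only handles a fixed minute and a comma-separated hour field like the ones used here.

```python
from datetime import datetime, timedelta, timezone
from typing import List

UTC8 = timezone(timedelta(hours=8))


def cron_times_utc8(expr: str) -> List[str]:
    """Convert a cron expression with a fixed minute and comma-separated hours (UTC) to UTC+8 HH:MM."""
    minute, hours = expr.split()[:2]
    base = datetime(2024, 1, 1, tzinfo=timezone.utc)  # arbitrary reference day
    return [
        base.replace(hour=int(h), minute=int(minute)).astimezone(UTC8).strftime("%H:%M")
        for h in hours.split(",")
    ]


def estimated_finish(start_hhmm: str, build_hours: float = 2.0) -> str:
    """Add the approximate build duration to a HH:MM start time (wraps past midnight)."""
    start = datetime.strptime(start_hhmm, "%H:%M")
    return (start + timedelta(hours=build_hours)).strftime("%H:%M")


old = cron_times_utc8("0 0,4,8,14 * * *")
new = cron_times_utc8("0 0,4,8,13 * * *")
print(old)  # ['08:00', '12:00', '16:00', '22:00']
print(new)  # ['08:00', '12:00', '16:00', '21:00']
print(estimated_finish(old[-1]), estimated_finish(new[-1]))  # 00:00 23:00
```

With the new schedule, the last main-image build of the day is expected to finish around 23:00 UTC+8 instead of around midnight.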
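For the ImportError above, a quick diagnostic is to try loading the library by bare name with ctypes. This calls dlopen and roughly approximates how the loader resolves a missing dependency, although it ignores any RPATH/RUNPATH of the extension module that needs it. It is also useful to list which LD_LIBRARY_PATH directories actually contain the file. This is a diagnostic sketch; dirs_containing and can_dlopen are hypothetical helpers. Because the loader reads LD_LIBRARY_PATH at process start, run the check in a fresh process after changing the variable.

```python
import ctypes
import os
from typing import List, Optional


def dirs_containing(filename: str, ld_library_path: Optional[str] = None) -> List[str]:
    """Return the LD_LIBRARY_PATH directories that actually contain `filename`."""
    value = os.environ.get("LD_LIBRARY_PATH", "") if ld_library_path is None else ld_library_path
    return [d for d in value.split(":") if d and os.path.isfile(os.path.join(d, filename))]


def can_dlopen(filename: str) -> bool:
    """Try dlopen() on a bare file name, roughly as the loader resolves a missing dependency."""
    try:
        ctypes.CDLL(filename)
        return True
    except OSError as exc:  # message typically ends with "No such file or directory"
        print(f"dlopen({filename!r}) failed: {exc}")
        return False


if __name__ == "__main__":
    name = "ascend_transport.so"  # the library from the nightly error
    print("LD_LIBRARY_PATH hits:", dirs_containing(name))
    print("loadable:", can_dlopen(name))
```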