
@fabiendupont
Contributor

@fabiendupont fabiendupont commented May 13, 2025

The current Dockerfile assumes that many build artifacts are available from public repositories and downloads them from these repositories, making it more difficult for downstream distributions to perform hermetic builds of vLLM container images.

This pull request introduces changes that should be backward compatible, while improving the situation for hermetic builds. Below is the list of changes:

  • Build arguments for the base images. Currently, both the build and final images are based on NVIDIA CUDA devel images, but with different Ubuntu versions.
  • Support for a mirror of the Deadsnakes PPA. This also requires installing the GPG key manually, because key handling through add-apt-repository is deprecated.
  • Installation of pip from a pyz archive. This is an alternative to get-pip.py that allows using a private copy of the pip.pyz file.
  • Alternative Python indexes. The {PIP,UV}_INDEX_URL and {PIP,UV}_EXTRA_INDEX_URL variables are natively supported by pip and uv; the same naming convention is reused for the CUDA and CUDA nightly indexes.
  • Custom URL and endpoint for sccache. The URL allows installing sccache from an internal repository; the endpoint allows storing the cache in an S3-compatible backend other than AWS.
  • Custom FlashInfer installation. FlashInfer can be installed from a package name and version, a Git URL, or even a local .whl file.
  • pip/uv authentication with keyring. pip and uv support a {PIP,UV}_KEYRING_PROVIDER setting, which is set to disabled by default.
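The base-image, Deadsnakes-mirror and pip.pyz items can be sketched in a Dockerfile roughly as follows. This is a minimal sketch, not the PR's actual Dockerfile: the argument names `CUDA_VERSION`, `BUILD_BASE_IMAGE`, `FINAL_BASE_IMAGE`, `DEADSNAKES_MIRROR_URL`, `DEADSNAKES_GPGKEY_URL` and `PIP_PYZ_URL`, and all their defaults, are assumptions for illustration. Leaving the mirror arguments empty falls back to the public PPA.

```dockerfile
# Sketch only: argument names and defaults are hypothetical, not the PR's Dockerfile.
ARG CUDA_VERSION=12.4.1
# Base images are build arguments so a downstream build can point at an internal registry.
ARG BUILD_BASE_IMAGE=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu20.04
ARG FINAL_BASE_IMAGE=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu22.04

FROM ${BUILD_BASE_IMAGE} AS base
ENV DEBIAN_FRONTEND=noninteractive
ARG PYTHON_VERSION=3.12
# Empty by default: the public Deadsnakes PPA is used unless a mirror is given.
ARG DEADSNAKES_MIRROR_URL
# Assumed to point at an ASCII-armored public key (hence gpg --dearmor).
ARG DEADSNAKES_GPGKEY_URL
# pip zipapp location; override it with a private copy for hermetic builds.
ARG PIP_PYZ_URL=https://bootstrap.pypa.io/pip/pip.pyz

# With a mirror, install the signing key manually and reference it via signed-by;
# otherwise, use add-apt-repository with the public PPA.
RUN apt-get update -y \
    && apt-get install -y ca-certificates curl gnupg software-properties-common \
    && if [ -n "${DEADSNAKES_MIRROR_URL}" ]; then \
         mkdir -p -m 0755 /etc/apt/keyrings \
         && curl -fsSL "${DEADSNAKES_GPGKEY_URL}" | gpg --dearmor -o /etc/apt/keyrings/deadsnakes.gpg \
         && echo "deb [signed-by=/etc/apt/keyrings/deadsnakes.gpg] ${DEADSNAKES_MIRROR_URL} $(. /etc/os-release && echo "${VERSION_CODENAME}") main" \
              > /etc/apt/sources.list.d/deadsnakes.list; \
       else \
         add-apt-repository -y ppa:deadsnakes/ppa; \
       fi \
    && apt-get update -y \
    && apt-get install -y python${PYTHON_VERSION} python${PYTHON_VERSION}-venv

# Bootstrap pip from the zipapp instead of piping get-pip.py into Python.
RUN curl -fsSL -o /tmp/pip.pyz "${PIP_PYZ_URL}" \
    && python${PYTHON_VERSION} /tmp/pip.pyz install pip \
    && rm /tmp/pip.pyz

FROM ${FINAL_BASE_IMAGE} AS final
# The runtime stage would install the built artifacts here.
```

A hermetic build would then override the defaults, for example `docker build --build-arg DEADSNAKES_MIRROR_URL=https://mirror.example.internal/deadsnakes --build-arg DEADSNAKES_GPGKEY_URL=https://mirror.example.internal/deadsnakes.asc .`, where both hosts are placeholders.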

Co-authored-by: Elias Levy [email protected]

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the ci/build label May 13, 2025
@fabiendupont fabiendupont force-pushed the allow-hermetic-builds branch 2 times, most recently from b001fdd to 8767f63 Compare May 13, 2025 12:09
Member

@tlrmchlsmth tlrmchlsmth left a comment

Is there any way to unit test this?

Copy link
Member

@tlrmchlsmth tlrmchlsmth left a comment

LGTM overall and makes sense. I am a little concerned about fragility -- it isn't clear from looking at the Dockerfile why these variables are factored out into arguments, so I could easily see someone breaking the hermetic builds. I suggest adding some inline comments.

@khluu could you please take a look as well?

@fabiendupont fabiendupont force-pushed the allow-hermetic-builds branch from 8767f63 to 6fae6df Compare May 14, 2025 10:28
@fabiendupont
Contributor Author

@tlrmchlsmth, that's a fair point. I have added comments for the build arguments.

Regarding unit testing, I'm not sure how, unless we want to set up mirrors for the various artifacts. I have mostly focused on non-regression for the existing build system.

@fabiendupont fabiendupont force-pushed the allow-hermetic-builds branch from 6fae6df to 26ddb80 Compare May 14, 2025 11:58
@mergify mergify bot added the documentation Improvements or additions to documentation label May 14, 2025
@mergify

mergify bot commented May 16, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @fabiendupont.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label May 16, 2025
@fabiendupont fabiendupont force-pushed the allow-hermetic-builds branch from e29a974 to 6ef1977 Compare May 20, 2025 12:27
@mergify mergify bot removed the needs-rebase label May 20, 2025
@khluu khluu added the ready ONLY add when PR is ready to merge/full CI is needed label May 20, 2025
@DarkLight1337
Member

DarkLight1337 commented May 21, 2025

test_topk_topp_sampler.py and test_rejection_sampler.py are not failing on main, so I think the failures are related to this PR.

@mergify

mergify bot commented May 21, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @fabiendupont.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label May 21, 2025
@fabiendupont fabiendupont force-pushed the allow-hermetic-builds branch from c1b62cd to a5444ad Compare May 21, 2025 15:19
@mergify mergify bot removed the needs-rebase label May 21, 2025
@fabiendupont fabiendupont force-pushed the allow-hermetic-builds branch from 8e0ce96 to 68a4420 Compare May 21, 2025 16:09
@DarkLight1337
Member

You can merge from main after #18543 is merged to fix the CI

@fabiendupont fabiendupont force-pushed the allow-hermetic-builds branch from 68a4420 to 9110843 Compare May 22, 2025 15:20
@mergify

mergify bot commented May 23, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @fabiendupont.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label May 23, 2025
@fabiendupont fabiendupont force-pushed the allow-hermetic-builds branch from 364e751 to 78c3923 Compare June 3, 2025 07:47
@fabiendupont fabiendupont requested a review from hmellor as a code owner June 3, 2025 07:47
@mergify mergify bot removed the needs-rebase label Jun 3, 2025
@simon-mo
Collaborator

https://buildkite.com/vllm/ci/builds/22272/steps/table?sid=01978274-0ef8-4e19-81ea-2521eaaf40af

The CUDA 11.8 build failed, due either to the rebase or to the parametrization.

@fabiendupont fabiendupont force-pushed the allow-hermetic-builds branch from 4ab5f84 to 927c04d Compare June 23, 2025 15:42
@fabiendupont
Contributor Author

@simon-mo, I am not sure it's because of the parameterization. The CUDA 12.1 and 11.8 versions don't support the 10.0 capabilities, which were added in 12.4. So, maybe we should drop CUDA <= 12.4 on the main branch.

@fabiendupont fabiendupont force-pushed the allow-hermetic-builds branch from 927c04d to 5bbafb3 Compare June 24, 2025 17:08
Collaborator

Can we make the base image for vllm-openai just the CUDA runtime image, while keeping the devel image for test and CI?

Contributor Author

This PR is not meant to change the behaviour of the current build, only to allow using internal URLs for hermetic builds. I'm not saying it's a bad idea, just that it's out of scope here, so it could go in a follow-up PR.

@mergify

mergify bot commented Jun 26, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @fabiendupont.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jun 26, 2025
fabiendupont and others added 12 commits June 26, 2025 03:19
Signed-off-by: Fabien Dupont <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
The default values for the build args should be specified only
once to reduce risk of desynchronization. This commit also moves
the comments to the top of the Dockerfile to avoid text
duplication.

Co-authored-by: Elias Levy <[email protected]>
Signed-off-by: Fabien Dupont <[email protected]>
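Writing each default only once relies on Docker's ARG scoping: an ARG declared before the first FROM is global, and a stage sees its value only after re-declaring the name without a default. A minimal sketch, where the `ubuntu:22.04` image and the `3.12` default are chosen purely for illustration:

```dockerfile
# Global scope: the only place where the default value is written.
ARG PYTHON_VERSION=3.12

FROM ubuntu:22.04 AS build
# Re-declare without a value to inherit the global default (or a --build-arg override).
ARG PYTHON_VERSION
RUN echo "build stage targets Python ${PYTHON_VERSION}"

FROM ubuntu:22.04 AS final
# Every stage must re-declare the ARG; a second default here could drift from the first.
ARG PYTHON_VERSION
RUN echo "final stage targets Python ${PYTHON_VERSION}"
```

Running `docker build --build-arg PYTHON_VERSION=3.11 .` then changes the value in both stages at once.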
UV is using its own environment variables to set the package
indexes. This commit adds the UV_INDEX_URL, UV_EXTRA_INDEX_URL
and UV_KEYRING_PROVIDER. The defaults are copied from the PIP
build args for convenience.

Co-authored-by: Elias Levy <[email protected]>
Signed-off-by: Fabien Dupont <[email protected]>
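A sketch of the uv defaults described in this commit. Only the variable names come from the PR; the `python:3.12-slim` base image and the `https://pypi.org/simple` default are assumptions for illustration. Build arguments are exported as environment variables to RUN instructions, so pip and uv pick them up without extra command-line flags.

```dockerfile
# Hypothetical excerpt: only the variable names come from the PR description.
ARG PIP_INDEX_URL=https://pypi.org/simple
ARG PIP_EXTRA_INDEX_URL
ARG PIP_KEYRING_PROVIDER=disabled
# uv reads its own variables; default them to the pip values for convenience.
ARG UV_INDEX_URL=${PIP_INDEX_URL}
ARG UV_EXTRA_INDEX_URL=${PIP_EXTRA_INDEX_URL}
ARG UV_KEYRING_PROVIDER=${PIP_KEYRING_PROVIDER}

FROM python:3.12-slim
# Re-declare in the stage so the values are visible to RUN instructions.
ARG PIP_INDEX_URL PIP_EXTRA_INDEX_URL PIP_KEYRING_PROVIDER
ARG UV_INDEX_URL UV_EXTRA_INDEX_URL UV_KEYRING_PROVIDER
# pip reads PIP_* here; later uv commands would read UV_* the same way.
RUN pip install --no-cache-dir uv && uv --version
```

With this layout, `docker build --build-arg PIP_INDEX_URL=https://pypi.mirror.example/simple .` (a placeholder host) redirects both pip and uv, unless UV_INDEX_URL is overridden separately.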
Signed-off-by: Harry Mellor <[email protected]>
- Remove extra whitespace after '\' at the end of lines. This shouldn't be
  an issue with the current command, but could make troubleshooting more
  difficult.

- Use PYTHON_VERSION variable when copying to Python paths.

Signed-off-by: Fabien Dupont <[email protected]>
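The PYTHON_VERSION change can be illustrated with a small two-stage build. This sketch assumes `python:${PYTHON_VERSION}-slim` images, whose interpreter lives under `/usr/local/lib/python<version>`, and uses `six` only as a stand-in package; the point is that the destination path is derived from the build argument instead of being hard-coded.

```dockerfile
ARG PYTHON_VERSION=3.12

FROM python:${PYTHON_VERSION}-slim AS build
# Stand-in package, installed into a staging directory.
RUN pip install --no-cache-dir --target /opt/pkgs six

FROM python:${PYTHON_VERSION}-slim AS final
ARG PYTHON_VERSION
# Destination follows the build argument rather than a literal such as python3.12.
COPY --from=build /opt/pkgs /usr/local/lib/python${PYTHON_VERSION}/site-packages/
RUN python -c "import six"
```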
@fabiendupont fabiendupont force-pushed the allow-hermetic-builds branch from 5bbafb3 to d53f83b Compare June 26, 2025 07:24
@mergify mergify bot removed the needs-rebase label Jun 26, 2025
@WoosukKwon WoosukKwon merged commit 3c545c0 into vllm-project:main Jun 27, 2025
95 of 99 checks passed
@schulluk

schulluk commented Jul 3, 2025

This merge request is a great move in the right direction.
Thanks for proposing this change, @fabiendupont!

There are additional repository definitions directly in the requirements files, such as --extra-index-url in ./requirements/cpu-build.txt, and CMake configurations in ./cmake/external_projects/ that clone external dependencies from GitHub. We can address these in a follow-up proposal 👍

Labels

ci/build documentation Improvements or additions to documentation ready ONLY add when PR is ready to merge/full CI is needed
