[Doc] Guide for Incremental Compilation Workflow #19109

mgoin merged 5 commits into vllm-project:main from
Conversation
Signed-off-by: mgoin <mgoin64@gmail.com>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Hello @mgoin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello! Gemini here, providing a summary of this pull request to help everyone quickly understand the changes and context.
This PR introduces a new documentation guide detailing an incremental compilation workflow for vLLM's C++/CUDA kernels. The goal is to significantly speed up the development loop for engineers working on these performance-critical components by leveraging CMake for faster, incremental builds compared to the full pip install -e . process.
Highlights
- **New Documentation Guide:** A comprehensive guide (`docs/contributing/incremental_build.md`) has been added, explaining how to set up and use a CMake-based workflow for incremental compilation of vLLM's C++/CUDA kernels. This includes prerequisites, configuring `CMakeUserPresets.json`, and the build/install process.
- **Improved Developer Workflow:** The guide provides a method for faster iteration when developing kernels, which is a common pain point with traditional full rebuilds. It highlights the use of tools like `ccache` and proper CMake configuration.
- **Documentation Cross-Linking:** Links to the new incremental build guide have been added to the main contributing README and the GPU installation guide to ensure developers are aware of this optimized workflow.
Changelog
- `docs/contributing/README.md`
    - Added a link to the new Incremental Compilation Workflow guide in the 'Building from Source' section (line 32).
    - Added a recommendation to use the Incremental Compilation Workflow when adding or changing kernels (line 191).
- `docs/contributing/incremental_build.md`
    - Added a new documentation file detailing the setup and usage of a CMake-based incremental compilation workflow for C++/CUDA kernels.
    - Includes sections on prerequisites, `CMakeUserPresets.json` configuration (with an example), building/installing with CMake, verifying the build, and tips for efficiency.
- `docs/getting_started/installation/gpu/cuda.inc.md`
    - Added a note in the installation guide recommending the Incremental Compilation Workflow for faster kernel development after the initial editable install (lines 156-157).
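For readers following along without opening the guide, the `CMakeUserPresets.json` it describes looks roughly like this (a schematic sketch; the paths are placeholders and the merged guide's own example is authoritative):

```json
{
  "version": 6,
  "configurePresets": [
    {
      "name": "release",
      "generator": "Ninja",
      "binaryDir": "${sourceDir}/cmake-build-release",
      "cacheVariables": {
        "CMAKE_BUILD_TYPE": "Release",
        "CMAKE_CUDA_COMPILER": "/usr/local/cuda/bin/nvcc",
        "VLLM_PYTHON_EXECUTABLE": "/path/to/vllm-venv/bin/python"
      }
    }
  ]
}
```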
Code Review
This pull request introduces a very helpful guide for setting up an incremental compilation workflow using CMake, which will undoubtedly speed up development for those working on C++/CUDA kernels. The documentation is well-structured and clear. I have a couple of suggestions to make the setup process even smoother for users, particularly around locating necessary paths.
Summary of Findings
- **Guidance for `CMAKE_CUDA_COMPILER` path:** The documentation should provide a hint on how users can find the correct path to their `nvcc` binary, as this is a critical setting for the CMake configuration.
- **Guidance for `VLLM_PYTHON_EXECUTABLE` path:** The documentation should offer a tip on how users can determine the path to the Python executable within their active vLLM development virtual environment, another crucial setting.
Merge Readiness
The pull request is in great shape and provides valuable documentation. Addressing the suggestions for helping users find critical paths will make the guide even more robust and user-friendly. Once these medium-severity suggestions are considered, I believe this PR will be ready for merging. As an AI, I am not authorized to approve pull requests; please ensure further review and approval from the maintainers.
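Both findings come down to locating two paths. Something like the following, run from inside the activated vLLM virtual environment, prints values you can paste into `CMakeUserPresets.json` (a sketch; on many systems `nvcc` also lives under `/usr/local/cuda/bin` even when it is not on `PATH`):

```python
import shutil
import sys

# Value to use for CMAKE_CUDA_COMPILER (prints None if nvcc is not on PATH)
print("CMAKE_CUDA_COMPILER:", shutil.which("nvcc"))

# Value to use for VLLM_PYTHON_EXECUTABLE: the interpreter of the active venv
print("VLLM_PYTHON_EXECUTABLE:", sys.executable)
```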
ProExpertProg
left a comment
Thanks for putting this together!!
> 2. **CUDA Toolkit:** Verify that the NVIDIA CUDA Toolkit is correctly installed and `nvcc` is accessible in your `PATH`. CMake relies on `nvcc` to compile CUDA code. If you encounter issues, refer to the [official CUDA Toolkit installation guides](https://developer.nvidia.com/cuda-toolkit-archive) and vLLM's main [GPU installation documentation](../getting_started/installation/gpu/cuda.inc.md#troubleshooting) for troubleshooting. The `CMAKE_CUDA_COMPILER` variable in your `CMakeUserPresets.json` should also point to your `nvcc` binary.
>
> 3. **Build Tools:** Ensure the dependencies for building are installed and available, like `cmake` and `ninja`. These are installable through `requirements/build.txt`, or can be installed by your package manager.
Maybe include rocm-build here or somehow mention non-cuda platforms?
I have essentially no experience with rocm-build, would you want to add it here or in a followup?
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
> 3. **Build Tools:** Ensure the dependencies for building are installed and available, like `cmake` and `ninja`. These are installable through `requirements/build.txt`, or can be installed by your package manager.
>
> ```bash
> uv pip install -r requirements/build.txt --torch-backend=auto
> ```
Maybe not directly related, but I'm curious: when do we recommend `uv pip`, and when do we use `pip` directly in the docs?
Yeah, I think it's not homogeneous yet, but we're trying to replace every instance with uv. Some accelerators (tpu/gaudi/aws) are still using pip, though. On the TPU side, for instance, this is due to how the wheels are published, which makes uv fail because of differences in package resolution.
Signed-off-by: mgoin <mgoin64@gmail.com>
ProExpertProg
left a comment
Looks good overall! Might be worth pulling in a few utilities from setup.py
Signed-off-by: mgoin <mgoin64@gmail.com>
ProExpertProg
left a comment
Looks good, thanks for doing this!
> This command compiles the code and installs the resulting binaries into your vLLM source directory, making them available to your editable Python installation.
>
> ```console
> cmake --build --preset release --target install
> ```
does this build from scratch? Do you know how long it takes nowadays?
It will do the incremental build, but the first one is from scratch, yes. The time depends on the platform and the number of cores, but I think it's a matter of minutes.
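For context, the command under discussion fits into a short loop like this (a sketch; it assumes you have defined a `release` preset in your `CMakeUserPresets.json` and run from the vLLM checkout root):

```console
# One-time configure step for the "release" preset
cmake --preset release

# After editing a kernel: rebuild only what changed, then install into the source tree
cmake --build --preset release --target install
```

Only the first run of the second command compiles everything; subsequent runs recompile just the changed translation units.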
Purpose
I have not seen a proper guide for how to efficiently use incremental compilation, and I recently found this CMake-based setup from @ProExpertProg that I've been using for the past two weeks to great success. This PR tries to summarize the workflow in the contributing docs.
It also includes a new script in the `tools` directory that will generate a `CMakeUserPresets.json` for you.
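The generator script itself is not shown in this thread. As a rough illustration of the idea (hypothetical code, not vLLM's actual `tools` script), such a generator only needs to discover the two machine-specific paths and write the JSON:

```python
import json
import shutil
import sys
from pathlib import Path


def generate_presets(path: Path = Path("CMakeUserPresets.json")) -> dict:
    """Write a minimal CMakeUserPresets.json (hypothetical sketch)."""
    presets = {
        "version": 6,
        "configurePresets": [
            {
                "name": "release",
                "generator": "Ninja",
                "binaryDir": "${sourceDir}/cmake-build-release",
                "cacheVariables": {
                    "CMAKE_BUILD_TYPE": "Release",
                    # nvcc discovered from PATH, falling back to the conventional CUDA location
                    "CMAKE_CUDA_COMPILER": shutil.which("nvcc") or "/usr/local/cuda/bin/nvcc",
                    # interpreter of the currently active virtual environment
                    "VLLM_PYTHON_EXECUTABLE": sys.executable,
                },
            }
        ],
    }
    path.write_text(json.dumps(presets, indent=2))
    return presets
```

Running `generate_presets()` from the repo root would then let `cmake --preset release` pick the file up automatically.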