[Doc] Guide for Incremental Compilation Workflow #19109

Merged
mgoin merged 5 commits into vllm-project:main from neuralmagic:cmake-incremental-build
Jun 25, 2025
Conversation

@mgoin
Member

@mgoin mgoin commented Jun 3, 2025

Purpose

I have not seen a proper guide for how to use incremental compilation efficiently. I recently found this CMake-based setup from @ProExpertProg and have been using it for the past two weeks to great success. This PR summarizes the workflow in the contributing docs.

It also includes a new script in the tools directory that will generate a `CMakeUserPresets.json` for you.

```console
> python tools/generate_cmake_presets.py
Attempting to detect your system configuration...
Using NVCC path: /usr/local/cuda-12.8/bin/nvcc
Enter the path to your Python executable for vLLM development (typically from your virtual environment, e.g., /home/user/venvs/vllm/bin/python).
Press Enter to use the current detected Python: '/home/mgoin/venvs/vllm/bin/python':
Using Python executable: /home/mgoin/venvs/vllm/bin/python
Detected 128 CPU cores. Setting NVCC_THREADS=4 and CMake jobs=32.
VLLM project root detected as: /home/mgoin/code/vllm
Successfully generated '/home/mgoin/code/vllm/CMakeUserPresets.json'

To use this preset:
1. Ensure you are in the vLLM root directory: cd /home/mgoin/code/vllm
2. Configure CMake: cmake --preset release
3. Build and install: cmake --build --preset release --target install
```
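For reference, the generated `CMakeUserPresets.json` has roughly the following shape. This is a sketch assembled from the console output above, not the script's verbatim output; the `cmake-build-release` directory name and the exact set of cache variables are assumptions.

```json
{
  "version": 6,
  "configurePresets": [
    {
      "name": "release",
      "generator": "Ninja",
      "binaryDir": "${sourceDir}/cmake-build-release",
      "cacheVariables": {
        "CMAKE_BUILD_TYPE": "Release",
        "CMAKE_CUDA_COMPILER": "/usr/local/cuda-12.8/bin/nvcc",
        "VLLM_PYTHON_EXECUTABLE": "/home/mgoin/venvs/vllm/bin/python",
        "CMAKE_INSTALL_PREFIX": "${sourceDir}",
        "NVCC_THREADS": "4"
      }
    }
  ],
  "buildPresets": [
    {
      "name": "release",
      "configurePreset": "release",
      "jobs": 32
    }
  ]
}
```

With a file like this in the repo root, `cmake --preset release` and `cmake --build --preset release --target install` pick up the paths and parallelism settings without any extra flags.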

Signed-off-by: mgoin <mgoin64@gmail.com>
@mgoin mgoin requested a review from hmellor as a code owner June 3, 2025 22:06
@github-actions

github-actions bot commented Jun 3, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Hello @mgoin, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello! Gemini here, providing a summary of this pull request to help everyone quickly understand the changes and context.

This PR introduces a new documentation guide detailing an incremental compilation workflow for vLLM's C++/CUDA kernels. The goal is to significantly speed up the development loop for engineers working on these performance-critical components by leveraging CMake for faster, incremental builds compared to the full `pip install -e .` process.

Highlights

  • New Documentation Guide: A comprehensive guide (docs/contributing/incremental_build.md) has been added, explaining how to set up and use a CMake-based workflow for incremental compilation of vLLM's C++/CUDA kernels. This includes prerequisites, configuring CMakeUserPresets.json, and the build/install process.
  • Improved Developer Workflow: The guide provides a method for faster iteration when developing kernels, which is a common pain point with traditional full rebuilds. It highlights the use of tools like ccache and proper CMake configuration.
  • Documentation Cross-Linking: Links to the new incremental build guide have been added to the main contributing README and the GPU installation guide to ensure developers are aware of this optimized workflow.

Changelog

  • docs/contributing/README.md
    • Added a link to the new Incremental Compilation Workflow guide in the 'Building from Source' section (line 32).
    • Added a recommendation to use the Incremental Compilation Workflow when adding or changing kernels (line 191).
  • docs/contributing/incremental_build.md
    • Added a new documentation file detailing the setup and usage of a CMake-based incremental compilation workflow for C++/CUDA kernels.
    • Includes sections on prerequisites, CMakeUserPresets.json configuration (with an example), building/installing with CMake, verifying the build, and tips for efficiency.
  • docs/getting_started/installation/gpu/cuda.inc.md
    • Added a note in the installation guide recommending the Incremental Compilation Workflow for faster kernel development after the initial editable install (lines 156-157).
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


Code compiles slow,
Kernels take time to grow,
CMake builds are fast.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@mergify mergify bot added the documentation Improvements or additions to documentation label Jun 3, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a very helpful guide for setting up an incremental compilation workflow using CMake, which will undoubtedly speed up development for those working on C++/CUDA kernels. The documentation is well-structured and clear. I have a couple of suggestions to make the setup process even smoother for users, particularly around locating necessary paths.

Summary of Findings

  • Guidance for CMAKE_CUDA_COMPILER Path: The documentation should provide a hint on how users can find the correct path to their nvcc binary, as this is a critical setting for the CMake configuration.
  • Guidance for VLLM_PYTHON_EXECUTABLE Path: The documentation should offer a tip on how users can determine the path to the Python executable within their active vLLM development virtual environment, another crucial setting.

Merge Readiness

The pull request is in great shape and provides valuable documentation. Addressing the suggestions for helping users find critical paths will make the guide even more robust and user-friendly. Once these medium-severity suggestions are considered, I believe this PR will be ready for merging. As an AI, I am not authorized to approve pull requests; please ensure further review and approval from the maintainers.

Collaborator

@ProExpertProg ProExpertProg left a comment

Thanks for putting this together!!


2. **CUDA Toolkit:** Verify that the NVIDIA CUDA Toolkit is correctly installed and `nvcc` is accessible in your `PATH`. CMake relies on `nvcc` to compile CUDA code. If you encounter issues, refer to the [official CUDA Toolkit installation guides](https://developer.nvidia.com/cuda-toolkit-archive) and vLLM's main [GPU installation documentation](../getting_started/installation/gpu/cuda.inc.md#troubleshooting) for troubleshooting. The `CMAKE_CUDA_COMPILER` variable in your `CMakeUserPresets.json` should also point to your `nvcc` binary.

3. **Build Tools:** Ensure the dependencies for building are installed and available, like `cmake` and `ninja`. These are installable through the `requirements/build.txt`, or can be installed by your package manager.
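As a quick sanity check for these prerequisites, something like the following can confirm each tool is visible on `PATH` before configuring CMake. This is a sketch, not part of the guide: the tool names come from the prerequisites above, and `ccache` is optional.

```shell
# Report where each build prerequisite resolves on PATH (or that it is missing).
for tool in nvcc cmake ninja ccache; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: $(command -v "$tool")"
  else
    echo "$tool: NOT FOUND"
  fi
done
```

The path printed for `nvcc` is also a reasonable value for `CMAKE_CUDA_COMPILER` in `CMakeUserPresets.json`.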
Collaborator

Maybe include rocm-build here or somehow mention non-cuda platforms?

Member Author

I have essentially no experience with rocm-build; would you want to add it here or in a follow-up?

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
3. **Build Tools:** Ensure the dependencies for building are installed and available, like `cmake` and `ninja`. These are installable through the `requirements/build.txt`, or can be installed by your package manager.

```bash
uv pip install -r requirements/build.txt --torch-backend=auto
```
Collaborator

Maybe not directly related, but I'm curious: when do we recommend uv pip, and when do we use pip directly in the docs?

Collaborator

Yeah, I think it's not homogeneous yet, but we're trying to replace every instance with uv. Some accelerators (TPU/Gaudi/AWS) are still using pip, though. The TPU side, for instance, is due to how the wheels are published, which makes uv fail because of differences in package resolution.


Signed-off-by: mgoin <mgoin64@gmail.com>
Collaborator

@ProExpertProg ProExpertProg left a comment

Looks good overall! Might be worth pulling in a few utilities from setup.py

Collaborator

@ProExpertProg ProExpertProg left a comment

Looks good, thanks for doing this!

@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 10, 2025
@mgoin mgoin merged commit bf51815 into vllm-project:main Jun 25, 2025
53 checks passed
This command compiles the code and installs the resulting binaries into your vLLM source directory, making them available to your editable Python installation.

```console
cmake --build --preset release --target install
```
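A quick way to confirm the install step produced binaries that your editable install can see is to check that the extension module resolves. This is a sketch: `vllm._C` is the compiled-kernel module name used in the vLLM tree, and the check degrades gracefully if vLLM is not importable in the current environment.

```shell
# Probe the package and its compiled extension without importing heavy deps.
python - <<'EOF'
import importlib.util

for name in ("vllm", "vllm._C"):
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        spec = None
    # spec.origin for vllm._C should point at the freshly installed .so
    print(name, "->", spec.origin if spec else "not importable")
EOF
```

If `vllm._C` resolves to a `.so` inside your source tree with a fresh timestamp, the build and install picked up your changes.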
Contributor

Does this build from scratch? Do you know how long it takes nowadays?

Collaborator

It will do an incremental build, but the first one is from scratch, yes. The time depends on the platform and number of cores, but I think it's minutes.
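ccache (mentioned in the guide's tips) is what keeps even those from-scratch compiles short on repeat runs. A quick way to see whether it is engaged, guarded so it is a no-op where ccache is absent; the 50G size is just an example value, not a vLLM recommendation:

```shell
# Inspect ccache effectiveness and (optionally) grow the cache so large
# CUDA builds don't evict each other. Skips cleanly if ccache is missing.
if command -v ccache >/dev/null 2>&1; then
  ccache --show-stats      # hit/miss counters from recent builds
  ccache --max-size 50G    # example cache size; tune to your disk
else
  echo "ccache not installed"
fi
```

A high hit rate in `--show-stats` after the second configure/build cycle is a good sign the incremental workflow is set up correctly.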
