[Doc] Guide for Incremental Compilation Workflow #19109

Merged
mgoin merged 5 commits into vllm-project:main from neuralmagic:cmake-incremental-build
Jun 25, 2025
Conversation

@mgoin
Member

@mgoin mgoin commented Jun 3, 2025

Purpose

I have not seen a proper guide for how to use incremental compilation efficiently. I recently found this CMake-based setup from @ProExpertProg and have been using it for the past two weeks to great success. This PR summarizes the workflow in the contributing docs.

It also includes a new script in the tools directory that will generate a `CMakeUserPresets.json` for you.

```console
> python tools/generate_cmake_presets.py
Attempting to detect your system configuration...
Using NVCC path: /usr/local/cuda-12.8/bin/nvcc
Enter the path to your Python executable for vLLM development (typically from your virtual environment, e.g., /home/user/venvs/vllm/bin/python).
Press Enter to use the current detected Python: '/home/mgoin/venvs/vllm/bin/python':
Using Python executable: /home/mgoin/venvs/vllm/bin/python
Detected 128 CPU cores. Setting NVCC_THREADS=4 and CMake jobs=32.
VLLM project root detected as: /home/mgoin/code/vllm
Successfully generated '/home/mgoin/code/vllm/CMakeUserPresets.json'

To use this preset:
1. Ensure you are in the vLLM root directory: cd /home/mgoin/code/vllm
2. Configure CMake: cmake --preset release
3. Build and install: cmake --build --preset release --target install
```
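For reference, the generated `CMakeUserPresets.json` has roughly the following shape. This is a sketch assembled from the console output above, not the script's verbatim output; the `cmake-build-release` directory name and the exact set of cache variables are assumptions.

```json
{
  "version": 6,
  "configurePresets": [
    {
      "name": "release",
      "generator": "Ninja",
      "binaryDir": "${sourceDir}/cmake-build-release",
      "cacheVariables": {
        "CMAKE_BUILD_TYPE": "Release",
        "CMAKE_CUDA_COMPILER": "/usr/local/cuda-12.8/bin/nvcc",
        "VLLM_PYTHON_EXECUTABLE": "/home/mgoin/venvs/vllm/bin/python",
        "CMAKE_INSTALL_PREFIX": "${sourceDir}",
        "NVCC_THREADS": "4"
      }
    }
  ],
  "buildPresets": [
    {
      "name": "release",
      "configurePreset": "release",
      "jobs": 32
    }
  ]
}
```

With a file like this in the repo root, `cmake --preset release` and `cmake --build --preset release --target install` pick up the paths and parallelism settings without any extra flags.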

Signed-off-by: mgoin <mgoin64@gmail.com>
@mgoin mgoin requested a review from hmellor as a code owner June 3, 2025 22:06
@github-actions

github-actions bot commented Jun 3, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Hello @mgoin, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello! Gemini here, providing a summary of this pull request to help everyone quickly understand the changes and context.

This PR introduces a new documentation guide detailing an incremental compilation workflow for vLLM's C++/CUDA kernels. The goal is to significantly speed up the development loop for engineers working on these performance-critical components by leveraging CMake for faster, incremental builds compared to the full `pip install -e .` process.

Highlights

  • New Documentation Guide: A comprehensive guide (docs/contributing/incremental_build.md) has been added, explaining how to set up and use a CMake-based workflow for incremental compilation of vLLM's C++/CUDA kernels. This includes prerequisites, configuring CMakeUserPresets.json, and the build/install process.
  • Improved Developer Workflow: The guide provides a method for faster iteration when developing kernels, which is a common pain point with traditional full rebuilds. It highlights the use of tools like ccache and proper CMake configuration.
  • Documentation Cross-Linking: Links to the new incremental build guide have been added to the main contributing README and the GPU installation guide to ensure developers are aware of this optimized workflow.

Changelog

  • docs/contributing/README.md
    • Added a link to the new Incremental Compilation Workflow guide in the 'Building from Source' section (line 32).
    • Added a recommendation to use the Incremental Compilation Workflow when adding or changing kernels (line 191).
  • docs/contributing/incremental_build.md
    • Added a new documentation file detailing the setup and usage of a CMake-based incremental compilation workflow for C++/CUDA kernels.
    • Includes sections on prerequisites, CMakeUserPresets.json configuration (with an example), building/installing with CMake, verifying the build, and tips for efficiency.
  • docs/getting_started/installation/gpu/cuda.inc.md
    • Added a note in the installation guide recommending the Incremental Compilation Workflow for faster kernel development after the initial editable install (lines 156-157).
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


Code compiles slow,
Kernels take time to grow,
CMake builds are fast.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@mergify mergify bot added the documentation Improvements or additions to documentation label Jun 3, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a very helpful guide for setting up an incremental compilation workflow using CMake, which will undoubtedly speed up development for those working on C++/CUDA kernels. The documentation is well-structured and clear. I have a couple of suggestions to make the setup process even smoother for users, particularly around locating necessary paths.

Summary of Findings

  • Guidance for CMAKE_CUDA_COMPILER Path: The documentation should provide a hint on how users can find the correct path to their nvcc binary, as this is a critical setting for the CMake configuration.
  • Guidance for VLLM_PYTHON_EXECUTABLE Path: The documentation should offer a tip on how users can determine the path to the Python executable within their active vLLM development virtual environment, another crucial setting.

Merge Readiness

The pull request is in great shape and provides valuable documentation. Addressing the suggestions for helping users find critical paths will make the guide even more robust and user-friendly. Once these medium-severity suggestions are considered, I believe this PR will be ready for merging. As an AI, I am not authorized to approve pull requests; please ensure further review and approval from the maintainers.

Collaborator

@ProExpertProg ProExpertProg left a comment

Thanks for putting this together!!


2. **CUDA Toolkit:** Verify that the NVIDIA CUDA Toolkit is correctly installed and `nvcc` is accessible in your `PATH`. CMake relies on `nvcc` to compile CUDA code. If you encounter issues, refer to the [official CUDA Toolkit installation guides](https://developer.nvidia.com/cuda-toolkit-archive) and vLLM's main [GPU installation documentation](../getting_started/installation/gpu/cuda.inc.md#troubleshooting) for troubleshooting. The `CMAKE_CUDA_COMPILER` variable in your `CMakeUserPresets.json` should also point to your `nvcc` binary.

3. **Build Tools:** Ensure the dependencies for building are installed and available, like `cmake` and `ninja`. These are installable through the `requirements/build.txt`, or can be installed by your package manager.
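As a quick sanity check for these prerequisites, something like the following can confirm each tool is visible on `PATH` before configuring CMake. This is a sketch, not part of the guide: the tool names come from the prerequisites above, and `ccache` is optional.

```shell
# Report where each build prerequisite resolves on PATH (or that it is missing).
for tool in nvcc cmake ninja ccache; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: $(command -v "$tool")"
  else
    echo "$tool: NOT FOUND"
  fi
done
```

The path printed for `nvcc` is also a reasonable value for `CMAKE_CUDA_COMPILER` in `CMakeUserPresets.json`.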
Collaborator

Maybe include rocm-build here or somehow mention non-cuda platforms?

Member Author

I have essentially no experience with rocm-build; would you want to add it here or in a follow-up?

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
3. **Build Tools:** Ensure the dependencies for building are installed and available, like `cmake` and `ninja`. These are installable through the `requirements/build.txt`, or can be installed by your package manager.

```bash
uv pip install -r requirements/build.txt --torch-backend=auto
```
Collaborator

Maybe not directly related, but I'm curious: when do we recommend uv pip, and when do we use pip directly in the docs?

Collaborator

Yeah, I think it's not homogeneous yet, but we're trying to replace every instance with uv. Some accelerators (TPU/Gaudi/AWS) are still using pip, though. The TPU side, for instance, is due to how the wheels are published, which makes uv fail because of differences in package resolution.


Signed-off-by: mgoin <mgoin64@gmail.com>
Collaborator

@ProExpertProg ProExpertProg left a comment

Looks good overall! Might be worth pulling in a few utilities from setup.py

Collaborator

@ProExpertProg ProExpertProg left a comment

Looks good, thanks for doing this!

@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 10, 2025
@mgoin mgoin merged commit bf51815 into vllm-project:main Jun 25, 2025
53 checks passed
This command compiles the code and installs the resulting binaries into your vLLM source directory, making them available to your editable Python installation.

```console
cmake --build --preset release --target install
```
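A quick way to confirm the install step produced binaries that your editable install can see is to check that the extension module resolves. This is a sketch: `vllm._C` is the compiled-kernel module name used in the vLLM tree, and the check degrades gracefully if vLLM is not importable in the current environment.

```shell
# Probe the package and its compiled extension without importing heavy deps.
python - <<'EOF'
import importlib.util

for name in ("vllm", "vllm._C"):
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        spec = None
    # spec.origin for vllm._C should point at the freshly installed .so
    print(name, "->", spec.origin if spec else "not importable")
EOF
```

If `vllm._C` resolves to a `.so` inside your source tree with a fresh timestamp, the build and install picked up your changes.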
Contributor

Does this build from scratch? Do you know how long it takes nowadays?

Collaborator

It will do an incremental build, but the first one is from scratch, yes. The time depends on the platform and number of cores, but I think it's minutes.
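ccache (mentioned in the guide's tips) is what keeps even those from-scratch compiles short on repeat runs. A quick way to see whether it is engaged, guarded so it is a no-op where ccache is absent; the 50G size is just an example value, not a vLLM recommendation:

```shell
# Inspect ccache effectiveness and (optionally) grow the cache so large
# CUDA builds don't evict each other. Skips cleanly if ccache is missing.
if command -v ccache >/dev/null 2>&1; then
  ccache --show-stats      # hit/miss counters from recent builds
  ccache --max-size 50G    # example cache size; tune to your disk
else
  echo "ccache not installed"
fi
```

A high hit rate in `--show-stats` after the second configure/build cycle is a good sign the incremental workflow is set up correctly.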
