[Build] Build bundled DeepGEMM _C per-Python so the wheel imports on every CPython #41516

Open
mgoin wants to merge 7 commits into vllm-project:main from mgoin:deepgemm-multi-python-build

@mgoin mgoin (Member) commented May 2, 2026

Summary

DeepGEMM's _C is a pybind11 module and emits Python-version-specific symbol references; the previous wheel only shipped the build interpreter's .so and silently failed on every other Python (#41476, #41512).

This builds _C once per CPython listed in DEEPGEMM_PYTHON_INTERPRETERS (defaults to 3.10–3.14 in the Dockerfile) and bundles them side-by-side under vllm/third_party/deep_gemm/. cmake's foreach loop falls back to Python_EXECUTABLE when the env var is unset, so editable / single-Python source builds are unchanged. The compile is delegated to tools/build_deepgemm_C.py, which always runs from the build interpreter using its torch — target Pythons just need a working interpreter, no torch installed. tools/setup_deepgemm_pythons.sh provisions bare venvs via uv (re-uses system Pythons when present). A new tools/check_wheel_deepgemm.py verifier runs in the wheel-build stage to fail closed if a .so is missing.
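
A minimal sketch of the "target Pythons just need a working interpreter" idea: the build interpreter can ask each target interpreter, through a subprocess, for the two values a per-Python compile needs (the extension suffix and the header directory). The helper names `query_target` and `output_name` are hypothetical and are not taken from tools/build_deepgemm_C.py.

```python
import json
import subprocess
import sys


def query_target(python_exe: str) -> dict:
    """Ask a target interpreter for its extension suffix and include dir.

    Only the standard library runs in the target, so it needs no torch.
    """
    probe = (
        "import json, sysconfig; "
        "print(json.dumps({"
        "'ext_suffix': sysconfig.get_config_var('EXT_SUFFIX'), "
        "'include': sysconfig.get_paths()['include']}))"
    )
    result = subprocess.run(
        [python_exe, "-c", probe],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def output_name(python_exe: str, module: str = "_C") -> str:
    """File name the extension must have to be importable by python_exe."""
    return module + query_target(python_exe)["ext_suffix"]


if __name__ == "__main__":
    # Using the running interpreter as the "target" for demonstration.
    print(output_name(sys.executable))
```

Because each interpreter reports a different suffix, the resulting files can sit side by side in one package directory without colliding.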

Not a duplicate: PR #41476 attempted to make a single .so work cross-version via the fake-abi3 trick used elsewhere in vLLM. That trick relies on TORCH_LIBRARY-based extensions that bypass Python's import loader; pybind11 modules don't, which is why #41476 was reverted (#41512). This PR takes the per-Python-.so approach instead.

Test plan

Verified locally and inside the pytorch/manylinux2_28-builder:cuda13.0 base image:

  • _C builds for 3.10/3.11/3.12/3.13/3.14 (cmake foreach via tools/build_deepgemm_C.py)
  • Each .so loads on its matching Python with torch (get_num_sms returns OK, 57 attrs)
  • cmake --install --component _deep_gemm_C bundles all 5 .sos + Python files + headers
  • Editable / single-Python build still works (env var unset → Python_EXECUTABLE)
  • tools/check_wheel_deepgemm.py flags missing .so, skips wheels w/o deepgemm
  • DEPENDS triggers rebuild on python_api.cpp change
  • Doesn't depend on uv at runtime: pip install vllm (precompiled), pip install -e . (editable, single Python), and VLLM_USE_PRECOMPILED=1 flows are uv-free
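
The "flags missing .so" check in the list above could look roughly like the sketch below. It assumes the bundled files follow the usual CPython naming (`_C.cpython-312-<platform>.so`); the function names are hypothetical and this is not the code in tools/check_wheel_deepgemm.py.

```python
import re
from pathlib import Path

# Matches names such as _C.cpython-312-x86_64-linux-gnu.so and captures "312".
_SO_RE = re.compile(r"^_C\.cpython-(\d+)[^.]*\.so$")


def found_tags(pkg_dir) -> set:
    """Collect the CPython version tags of the _C extensions in pkg_dir."""
    tags = set()
    for path in Path(pkg_dir).iterdir():
        match = _SO_RE.match(path.name)
        if match:
            tags.add(match.group(1))
    return tags


def missing_pythons(pkg_dir, required) -> list:
    """Return the required versions (e.g. "3.12") that have no matching .so."""
    have = found_tags(pkg_dir)
    return [v for v in required if v.replace(".", "") not in have]
```

A caller would fail the build when `missing_pythons(...)` returns a non-empty list, which is what "fail closed" means here.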

AI assistance was used to write this patch.

…n every CPython

DeepGEMM's `_C` is a pybind11 module and emits Python-version-specific
symbol references; the previous wheel only shipped the build interpreter's
`.so` and silently failed on every other Python (vllm-project#41476, vllm-project#41512). Build
`_C` once per CPython listed in `DEEPGEMM_PYTHON_INTERPRETERS` and bundle
them side-by-side under `vllm/third_party/deep_gemm/`.

cmake foreach loops over the env var (set by Dockerfile, falls back to
the build interpreter for editable / source builds). The compile is
delegated to `tools/build_deepgemm_C.py` which runs from the build
interpreter using its torch — target Pythons just need a working
interpreter, no torch installed. `tools/setup_deepgemm_pythons.sh`
provisions bare venvs via uv (re-uses system Pythons when possible).
A wheel verifier asserts every required `.so` is present.

Signed-off-by: mgoin <mgoin64@gmail.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify mergify Bot added the ci/build label May 2, 2026

@gemini-code-assist gemini-code-assist Bot (Contributor) left a comment

Code Review

This pull request implements a multi-Python build system for the DeepGEMM _C extension, allowing a single wheel to support multiple CPython versions (3.10-3.14). It introduces scripts for provisioning target interpreters, a compilation helper that leverages the build environment's Torch, and a verification tool for the final wheel. Feedback focuses on improving build robustness and maintainability by replacing the external regex dependency with the standard library, avoiding hardcoded version lists, ensuring CUDA_HOME is explicitly validated, and respecting the CXX environment variable for compiler selection.

Comment thread tools/check_wheel_deepgemm.py
Comment thread tools/check_wheel_deepgemm.py Outdated
Comment thread tools/build_deepgemm_C.py
Comment thread tools/build_deepgemm_C.py
Comment thread tools/build_deepgemm_C.py Outdated
…check to H100 CI

- setup_deepgemm_pythons.sh expands the matrix from `requires-python` when
  called with no args; Dockerfile no longer hardcodes/forwards the list.
- Wheel verifier moves from Dockerfile to the H100 deepgemm kernel test
  (.buildkite/test_areas/kernels.yaml); rewritten to inspect the installed
  package rather than wheel zips, derives required Pythons from pyproject.toml.

Signed-off-by: mgoin <mgoin64@gmail.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
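
As an illustration of deriving the Python matrix from `requires-python` with only the standard library, here is a small sketch. It assumes the specifier has the simple form `>=3.A,<3.B` (the example value below is hypothetical, not quoted from vLLM's pyproject.toml), and `expand_minors` is an invented name.

```python
import re


def expand_minors(spec: str) -> list:
    """Expand a '>=3.A,<3.B' specifier into ['3.A', ..., '3.(B-1)'].

    Only an inclusive lower bound and an exclusive upper bound are handled;
    anything else raises ValueError instead of guessing.
    """
    lower = re.search(r">=\s*3\.(\d+)", spec)
    upper = re.search(r"<\s*3\.(\d+)", spec)
    if lower is None or upper is None:
        raise ValueError(f"unsupported requires-python specifier: {spec!r}")
    return [f"3.{minor}" for minor in range(int(lower.group(1)), int(upper.group(1)))]


if __name__ == "__main__":
    print(expand_minors(">=3.10,<3.15"))
```

Raising on unrecognized specifiers keeps the matrix from silently shrinking if the project metadata changes shape.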

@mergify mergify Bot (Contributor) commented May 2, 2026

Hi @mgoin, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

- Use `$CXX` (default `g++`) so build environments with a different
  compiler can override.
- Fail closed with a clear message if `cuda_home` resolves to None.

Signed-off-by: mgoin <mgoin64@gmail.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
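
The two behaviours in this commit message can be sketched as follows. The helper names are hypothetical, and where the `cuda_home` value comes from is left to the caller (the sketch does not import torch).

```python
import os


def pick_compiler(env=None) -> str:
    """Return $CXX when set to a non-empty value, otherwise 'g++'."""
    env = os.environ if env is None else env
    return env.get("CXX") or "g++"


def require_cuda_home(cuda_home):
    """Fail closed with a clear message when no CUDA toolkit path is known."""
    if not cuda_home:
        raise RuntimeError(
            "CUDA_HOME could not be resolved; set CUDA_HOME to the CUDA "
            "toolkit directory before building."
        )
    return cuda_home
```

Using `or "g++"` rather than a plain default also covers the case where `CXX` is exported but empty.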

@mgoin mgoin (Member, Author) commented May 2, 2026

@claude review

Drop the explicit `import vllm.third_party.deep_gemm` (which CI's pinned
ruff and local ruff disagree on grouping) and locate the package via
`importlib.util.find_spec` instead.

Signed-off-by: mgoin <mgoin64@gmail.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
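
For reference, locating a package directory with `importlib.util.find_spec` might look like the sketch below (`package_dirs` is a hypothetical name). Note that for a dotted name, `find_spec` imports the parent packages, but not the named submodule itself.

```python
import importlib.util


def package_dirs(name: str) -> list:
    """Return the directories of package `name` without importing it.

    Returns an empty list when the name is not found or is a plain module
    rather than a package.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or spec.submodule_search_locations is None:
        return []
    return list(spec.submodule_search_locations)


if __name__ == "__main__":
    # The standard-library "json" package stands in for the real target here.
    print(package_dirs("json"))
```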
@mgoin mgoin added the ready label (ONLY add when PR is ready to merge/full CI is needed) May 2, 2026
Comment thread .buildkite/test_areas/kernels.yaml Outdated
Comment thread tools/check_wheel_deepgemm.py Outdated
Comment thread cmake/external_projects/deepgemm.cmake Outdated
- kernels.yaml: bare `python` doesn't exist in the H100 test image
  (only `python3` via update-alternatives); switch to `python3`.
- deepgemm.cmake: `DEFINED ENV{X}` is true for empty strings, which
  bypassed the Python_EXECUTABLE fallback for an empty
  DEEPGEMM_PYTHON_INTERPRETERS and silently produced a wheel without any
  `_C.so`. Treat empty as unset.

Signed-off-by: mgoin <mgoin64@gmail.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
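
The "treat empty as unset" rule is easy to get wrong in any language, so here is the same logic as a Python sketch. The function name is hypothetical, and the accepted separators (semicolons or whitespace) are an assumption rather than something the PR states.

```python
import os


def resolve_interpreters(fallback, env=None,
                         var="DEEPGEMM_PYTHON_INTERPRETERS") -> list:
    """Return the interpreter list from the environment, or [fallback].

    An unset, empty, or whitespace-only variable all count as "unset".
    """
    env = os.environ if env is None else env
    raw = env.get(var, "")
    # Assumed format: entries separated by ';' or whitespace.
    items = raw.replace(";", " ").split()
    return items or [fallback]
```

Testing for presence alone (the analogue of `DEFINED ENV{X}`) would return an empty list for an empty variable, which is the case that produced a wheel without any `_C.so`.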

@mgoin mgoin (Member, Author) commented May 2, 2026

@claude review

@claude claude Bot left a comment

Prior review feedback (python3 in CI, importlib.find_spec, empty-env-var guard) is addressed; remaining finding is a header-tracking nit on the per-Python custom command.

Extended reasoning...

Overview

This PR rebuilds DeepGEMM's pybind11 _C once per CPython listed in DEEPGEMM_PYTHON_INTERPRETERS and bundles the .sos side-by-side under vllm/third_party/deep_gemm/. It replaces the prior single-target Python_add_library with a per-interpreter add_custom_command driven by tools/build_deepgemm_C.py, adds tools/setup_deepgemm_pythons.sh to provision bare uv venvs in the Dockerfile, and adds tools/check_wheel_deepgemm.py to fail-close on missing per-Python .sos in the H100 deepgemm CI step. The change is scoped to build/CI files (cmake/external_projects/deepgemm.cmake, docker/Dockerfile, .buildkite/test_areas/kernels.yaml, three new tools/* files).

Security risks

None meaningful — this is a build-system change. The new shell/Python scripts run inside the build environment, parse pyproject.toml for a fixed regex, and shell out to g++/uv. No untrusted input crosses a trust boundary.

Level of scrutiny

CI/build code, no runtime production code paths. Moderate scrutiny for cross-Python ABI correctness was already exercised by the bug-hunting pass (3 prior findings, all addressed). The remaining nit is narrow: add_custom_command does not auto-scan headers, so header-only edits in a local DEEPGEMM_SRC_DIR workflow can leave a stale _C.so. Trigger window is small (CI builds are clean, DeepGEMM is FetchContent-pinned to a specific commit), and easy to fix later by extending DEPENDS or emitting a DEPFILE.

Other factors

The author is responsive — every prior round of feedback (gemini's five comments and my three) has a corresponding commit (Address review: python3 in CI, Address review: respect $CXX, explicit CUDA_HOME check, Auto-derive DEEPGEMM_PYTHONS from pyproject.toml). The test plan in the PR description verifies all five Python builds, the install component, the editable fallback, and the wheel checker. cmake/external_projects/deepgemm.cmake has no specific CODEOWNER; only top-level CMakeLists.txt does. Approving despite the open nit.

Comment on lines +88 to +97
add_custom_command(
  OUTPUT "${_dg_marker}"
  COMMAND "${Python_EXECUTABLE}"
          "${CMAKE_SOURCE_DIR}/tools/build_deepgemm_C.py"
          "${deepgemm_SOURCE_DIR}" "${_dg_dir}" "${_pybin}"
  COMMAND "${CMAKE_COMMAND}" -E touch "${_dg_marker}"
  DEPENDS "${CMAKE_SOURCE_DIR}/tools/build_deepgemm_C.py"
          "${deepgemm_SOURCE_DIR}/csrc/python_api.cpp"
  COMMENT "Building DeepGEMM _C for ${_pybin}"
  VERBATIM)

🟡 The new per-Python add_custom_command lists only tools/build_deepgemm_C.py and csrc/python_api.cpp in DEPENDS, but the compile pulls in headers from csrc/, deep_gemm/include/, third-party/cutlass/, and third-party/fmt/. Unlike the prior Python_add_library (which got compiler-driven depfiles), add_custom_command does no implicit header scanning, so a header-only change leaves the cached _C.so untouched on incremental rebuilds. Nit — narrow trigger window (CI does clean Docker builds, DeepGEMM is FetchContent-pinned), but it is a real regression vs. the prior behavior; consider extending DEPENDS to include header globs (note IMPLICIT_DEPENDS CXX is Make-generator-only, so it's a no-op under Ninja).

Extended reasoning...

What the bug is. The per-Python build rule in cmake/external_projects/deepgemm.cmake (lines 88–97) registers:

add_custom_command(
  OUTPUT "${_dg_marker}"
  COMMAND "${Python_EXECUTABLE}" "${CMAKE_SOURCE_DIR}/tools/build_deepgemm_C.py" ...
  COMMAND "${CMAKE_COMMAND}" -E touch "${_dg_marker}"
  DEPENDS "${CMAKE_SOURCE_DIR}/tools/build_deepgemm_C.py"
          "${deepgemm_SOURCE_DIR}/csrc/python_api.cpp"
  ...)

DEPENDS only names the build driver and the single .cpp translation unit. But tools/build_deepgemm_C.py invokes g++ with -I paths into ${deepgemm_SOURCE_DIR}/csrc, deep_gemm/include, third-party/cutlass/include, third-party/cutlass/tools/util/include, and third-party/fmt/include. None of those headers are tracked.

Why this is a regression. The pre-PR code used Python_add_library(_deep_gemm_C MODULE WITH_SOABI ...). That is a real cmake target, and cmake feeds the compile rule through the generator (-MMD -MF depfiles consumed by Ninja/Make), so any #included header transitively invalidated the .so. add_custom_command does no implicit header scanning — only the literal DEPENDS list is consulted. So the new code has strictly weaker dependency tracking than the old code.

Step-by-step proof of the silent-stale-binary case.

  1. A developer sets DEEPGEMM_SRC_DIR=/path/to/local/DeepGEMM and runs pip install -e . once. cmake builds the marker ${_dg_dir}/.built and the corresponding _C.cpython-*.so.
  2. Developer edits /path/to/local/DeepGEMM/deep_gemm/include/deep_gemm/jit_kernels/foo.hpp (or any of the cutlass/fmt headers) to change a function signature consumed by csrc/python_api.cpp.
  3. They rerun the build. cmake re-evaluates the custom command and checks the mtime of every entry in DEPENDS. Only tools/build_deepgemm_C.py and csrc/python_api.cpp are listed — both unchanged — so .built is considered up to date and the rule does not fire.
  4. The previously-built _C.so is reused as-is. It links against the old header signatures while the rest of the (newly-recompiled) project now depends on the new ones, producing silent symbol/ABI drift at runtime instead of an honest rebuild.

A second triggering scenario: a future GIT_TAG bump in deepgemm.cmake lands a header-only change in upstream DeepGEMM (no churn to csrc/python_api.cpp). After FetchContent checks out the new tree, the same logic skips the rebuild on incremental cmake invocations.

Why existing code does not prevent this. The depfile-driven implicit tracking only exists for add_library/add_executable targets. Nothing else fills the gap here: the _dg_marker touchstamp is purely a function of DEPENDS mtimes, and the install rule is keyed off the marker.
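
The staleness argument can be reproduced with a simplified model of a timestamp-based rule: the output is rebuilt only when some listed dependency is newer than the marker. This is a sketch of the general idea, not of CMake's or Ninja's actual implementation, and `is_stale` is a hypothetical name.

```python
import os


def is_stale(marker: str, depends: list) -> bool:
    """Return True when the marker is missing or older than a listed dependency.

    Files that are not in `depends` are never consulted, which is exactly
    how an edited but unlisted header goes unnoticed.
    """
    if not os.path.exists(marker):
        return True
    marker_mtime = os.path.getmtime(marker)
    return any(os.path.getmtime(dep) > marker_mtime for dep in depends)
```

With only `python_api.cpp` in the list, touching a header leaves `is_stale` false; adding the header to the list makes the same edit trigger a rebuild.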

Addressing the refutation. The refutation is right that the practical trigger window is narrow: CI does clean Docker builds (so it never hits this path), DeepGEMM is FetchContent-pinned, and most users install prebuilt wheels. That is exactly why this is filed as a nit rather than blocking. But the refutation overstates the case by saying the issue "almost never occurs" — the DEEPGEMM_SRC_DIR workflow is documented in setup_deepgemm_pythons.sh and is the obvious iteration loop for anyone debugging DeepGEMM integration locally. The "rm -rf build && rebuild" workaround is correct but only helps people who already know their .so is stale, which is the root foot-gun.

Suggested fix. Two reasonable options, both small:

  • Extend DEPENDS to include header globs:
    file(GLOB_RECURSE _dg_headers
      "${deepgemm_SOURCE_DIR}/csrc/*.h"
      "${deepgemm_SOURCE_DIR}/csrc/*.hpp"
      "${deepgemm_SOURCE_DIR}/deep_gemm/include/*.hpp"
      "${deepgemm_SOURCE_DIR}/deep_gemm/include/*.cuh")
    add_custom_command(... DEPENDS ${_dg_headers} ...)
    (Optionally add cutlass/fmt globs, but those are vendored upstream and rarely modified.)
  • Or have tools/build_deepgemm_C.py emit a .d depfile via -MD -MF and pass it back via DEPFILE (supported by Ninja, and by Makefile generators since CMake 3.20; vLLM uses Ninja).

Note: the original bug suggested IMPLICIT_DEPENDS CXX, which works only with the Makefile generator. vLLM commonly uses Ninja, so that fix would be a no-op there — explicit globs (or a DEPFILE) are the robust path.

Buildkite CI failure: g++ for the 3.10 target died with
`Python.h: No such file or directory`. uv had resolved `--python 3.10`
to a system Python whose `INCLUDEPY=/usr/include/python3.10` doesn't
exist (no -dev package). Pass `--python-preference only-managed` so uv
always uses (and downloads if needed) a managed CPython, whose include
dir always has Python.h.

Also extend deepgemm.cmake's `DEPENDS` for the per-Python custom command
to cover the headers under `csrc/` and `deep_gemm/include/`.
`add_custom_command` does no implicit header scanning, so without this a
header-only edit silently leaves the cached `_C.so` stale.

Signed-off-by: mgoin <mgoin64@gmail.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
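
The failure described in this commit message (an interpreter whose include directory lacks `Python.h`) can be detected before compiling. The sketch below shows one way to do that with the standard library; the function names are hypothetical and this check is not claimed to be part of the PR.

```python
import os
import sysconfig


def has_python_headers(include_dir) -> bool:
    """Return True when include_dir exists and contains Python.h."""
    return bool(include_dir) and os.path.isfile(
        os.path.join(include_dir, "Python.h")
    )


def current_include_dir() -> str:
    """Include directory the running interpreter reports for its headers."""
    return sysconfig.get_paths()["include"]


if __name__ == "__main__":
    inc = current_include_dir()
    print(inc, "has Python.h:", has_python_headers(inc))
```

A system Python installed without its development package typically reports a path here that does not exist, which matches the `Python.h: No such file or directory` error above.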

@ZJY0516 ZJY0516 (Member) left a comment

How much will this blow up the vLLM wheel size?

@mgoin mgoin (Member, Author) commented May 3, 2026

How much will this blow up the vLLM wheel size?

each .so is 1.4 MB so roughly 6 MB, i think this is acceptable for now
