[Build] Build bundled DeepGEMM `_C` per-Python so the wheel imports on every CPython by mgoin · Pull Request #41516 · vllm-project/vllm

mgoin · 2026-05-02T20:18:24Z

Summary

DeepGEMM's _C is a pybind11 module and emits Python-version-specific symbol references; the previous wheel only shipped the build interpreter's .so and silently failed on every other Python (#41476, #41512).

This builds _C once per CPython listed in DEEPGEMM_PYTHON_INTERPRETERS (defaults to 3.10–3.14 in the Dockerfile) and bundles them side-by-side under vllm/third_party/deep_gemm/. cmake's foreach loop falls back to Python_EXECUTABLE when the env var is unset, so editable / single-Python source builds are unchanged. The compile is delegated to tools/build_deepgemm_C.py, which always runs from the build interpreter using its torch — target Pythons just need a working interpreter, no torch installed. tools/setup_deepgemm_pythons.sh provisions bare venvs via uv (re-uses system Pythons when present). A new tools/check_wheel_deepgemm.py verifier runs in the wheel-build stage to fail closed if a .so is missing.

Not a duplicate: PR #41476 attempted to make a single .so work cross-version via the fake-abi3 trick used elsewhere in vLLM. That trick relies on TORCH_LIBRARY-based extensions that bypass Python's import loader; pybind11 modules don't, which is why #41476 was reverted (#41512). This PR takes the per-Python-.so approach instead.

Test plan

Verified locally and inside the pytorch/manylinux2_28-builder:cuda13.0 base image:

_C builds for 3.10/3.11/3.12/3.13/3.14 (cmake foreach via tools/build_deepgemm_C.py)
Each .so loads on its matching Python with torch (get_num_sms returns OK, 57 attrs)
cmake --install --component _deep_gemm_C bundles all 5 .sos + Python files + headers
Editable / single-Python build still works (env var unset → Python_EXECUTABLE)
tools/check_wheel_deepgemm.py flags missing .so, skips wheels w/o deepgemm
DEPENDS triggers rebuild on python_api.cpp change
Doesn't depend on uv at runtime: pip install vllm (precompiled), pip install -e . (editable, single Python), and VLLM_USE_PRECOMPILED=1 flows are uv-free

AI assistance was used to write this patch.

…n every CPython DeepGEMM's `_C` is a pybind11 module and emits Python-version-specific symbol references; the previous wheel only shipped the build interpreter's `.so` and silently failed on every other Python (vllm-project#41476, vllm-project#41512). Build `_C` once per CPython listed in `DEEPGEMM_PYTHON_INTERPRETERS` and bundle them side-by-side under `vllm/third_party/deep_gemm/`. cmake foreach loops over the env var (set by Dockerfile, falls back to the build interpreter for editable / source builds). The compile is delegated to `tools/build_deepgemm_C.py` which runs from the build interpreter using its torch — target Pythons just need a working interpreter, no torch installed. `tools/setup_deepgemm_pythons.sh` provisions bare venvs via uv (re-uses system Pythons when possible). A wheel verifier asserts every required `.so` is present. Signed-off-by: mgoin <mgoin64@gmail.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

gemini-code-assist

Code Review

This pull request implements a multi-Python build system for the DeepGEMM _C extension, allowing a single wheel to support multiple CPython versions (3.10-3.14). It introduces scripts for provisioning target interpreters, a compilation helper that leverages the build environment's Torch, and a verification tool for the final wheel. Feedback focuses on improving build robustness and maintainability by replacing the external regex dependency with the standard library, avoiding hardcoded version lists, ensuring CUDA_HOME is explicitly validated, and respecting the CXX environment variable for compiler selection.

…check to H100 CI - setup_deepgemm_pythons.sh expands the matrix from `requires-python` when called with no args; Dockerfile no longer hardcodes/forwards the list. - Wheel verifier moves from Dockerfile to the H100 deepgemm kernel test (.buildkite/test_areas/kernels.yaml); rewritten to inspect the installed package rather than wheel zips, derives required Pythons from pyproject.toml. Signed-off-by: mgoin <mgoin64@gmail.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mergify · 2026-05-02T20:31:24Z

Hi @mgoin, the pre-commit checks have failed. Please run:

uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

- Use `$CXX` (default `g++`) so build environments with a different compiler can override. - Fail closed with a clear message if `cuda_home` resolves to None. Signed-off-by: mgoin <mgoin64@gmail.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mgoin · 2026-05-02T20:34:39Z

@claude review

mergify · 2026-05-02T20:36:35Z

Hi @mgoin, the pre-commit checks have failed. Please run:

uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Drop the explicit `import vllm.third_party.deep_gemm` (which CI's pinned ruff and local ruff disagree on grouping) and locate the package via `importlib.util.find_spec` instead. Signed-off-by: mgoin <mgoin64@gmail.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- kernels.yaml: bare `python` doesn't exist in the H100 test image (only `python3` via update-alternatives); switch to `python3`. - deepgemm.cmake: `DEFINED ENV{X}` is true for empty strings, which bypassed the Python_EXECUTABLE fallback for an empty DEEPGEMM_PYTHON_INTERPRETERS and silently produced a wheel without any `_C.so`. Treat empty as unset. Signed-off-by: mgoin <mgoin64@gmail.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mgoin · 2026-05-02T20:54:09Z

@claude review

claude

Prior review feedback (python3 in CI, importlib.find_spec, empty-env-var guard) is addressed; remaining finding is a header-tracking nit on the per-Python custom command.

Extended reasoning...

Overview

This PR rebuilds DeepGEMM's pybind11 _C once per CPython listed in DEEPGEMM_PYTHON_INTERPRETERS and bundles the .sos side-by-side under vllm/third_party/deep_gemm/. It replaces the prior single-target Python_add_library with a per-interpreter add_custom_command driven by tools/build_deepgemm_C.py, adds tools/setup_deepgemm_pythons.sh to provision bare uv venvs in the Dockerfile, and adds tools/check_wheel_deepgemm.py to fail-close on missing per-Python .sos in the H100 deepgemm CI step. The change is scoped to build/CI files (cmake/external_projects/deepgemm.cmake, docker/Dockerfile, .buildkite/test_areas/kernels.yaml, three new tools/* files).

Security risks

None meaningful — this is a build-system change. The new shell/Python scripts run inside the build environment, parse pyproject.toml for a fixed regex, and shell out to g++/uv. No untrusted input crosses a trust boundary.

Level of scrutiny

CI/build code, no runtime production code paths. Moderate scrutiny for cross-Python ABI correctness was already exercised by the bug-hunting pass (3 prior findings, all addressed). The remaining nit is narrow: add_custom_command does not auto-scan headers, so header-only edits in a local DEEPGEMM_SRC_DIR workflow can leave a stale _C.so. Trigger window is small (CI builds are clean, DeepGEMM is FetchContent-pinned to a specific commit), and easy to fix later by extending DEPENDS or emitting a DEPFILE.

Other factors

The author is responsive — every prior round of feedback (gemini's five comments and my three) has a corresponding commit (Address review: python3 in CI, Address review: respect $CXX, explicit CUDA_HOME check, Auto-derive DEEPGEMM_PYTHONS from pyproject.toml). The test plan in the PR description verifies all five Python builds, the install component, the editable fallback, and the wheel checker. cmake/external_projects/deepgemm.cmake has no specific CODEOWNER; only top-level CMakeLists.txt does. Approving despite the open nit.

claude · 2026-05-02T21:14:16Z

+    add_custom_command(
+      OUTPUT "${_dg_marker}"
+      COMMAND "${Python_EXECUTABLE}"
+              "${CMAKE_SOURCE_DIR}/tools/build_deepgemm_C.py"
+              "${deepgemm_SOURCE_DIR}" "${_dg_dir}" "${_pybin}"
+      COMMAND "${CMAKE_COMMAND}" -E touch "${_dg_marker}"
+      DEPENDS "${CMAKE_SOURCE_DIR}/tools/build_deepgemm_C.py"
+              "${deepgemm_SOURCE_DIR}/csrc/python_api.cpp"
+      COMMENT "Building DeepGEMM _C for ${_pybin}"
+      VERBATIM)


🟡 The new per-Python add_custom_command lists only tools/build_deepgemm_C.py and csrc/python_api.cpp in DEPENDS, but the compile pulls in headers from csrc/, deep_gemm/include/, third-party/cutlass/, and third-party/fmt/. Unlike the prior Python_add_library (which got compiler-driven depfiles), add_custom_command does no implicit header scanning, so a header-only change leaves the cached _C.so untouched on incremental rebuilds. Nit — narrow trigger window (CI does clean Docker builds, DeepGEMM is FetchContent-pinned), but it is a real regression vs. the prior behavior; consider extending DEPENDS to include header globs (note IMPLICIT_DEPENDS CXX is Make-generator-only, so it's a no-op under Ninja).

Extended reasoning...

What the bug is. The per-Python build rule in cmake/external_projects/deepgemm.cmake (lines 88–97) registers:

add_custom_command( OUTPUT "${_dg_marker}" COMMAND "${Python_EXECUTABLE}" "${CMAKE_SOURCE_DIR}/tools/build_deepgemm_C.py" ... COMMAND "${CMAKE_COMMAND}" -E touch "${_dg_marker}" DEPENDS "${CMAKE_SOURCE_DIR}/tools/build_deepgemm_C.py" "${deepgemm_SOURCE_DIR}/csrc/python_api.cpp" ...)

DEPENDS only names the build driver and the single .cpp translation unit. But tools/build_deepgemm_C.py invokes g++ with -I paths into ${deepgemm_SOURCE_DIR}/csrc, deep_gemm/include, third-party/cutlass/include, third-party/cutlass/tools/util/include, and third-party/fmt/include. None of those headers are tracked.

Why this is a regression. The pre-PR code used Python_add_library(_deep_gemm_C MODULE WITH_SOABI ...). That is a real cmake target, and cmake feeds the compile rule through the generator (-MMD -MF depfiles consumed by Ninja/Make), so any #included header transitively invalidated the .so. add_custom_command does no implicit header scanning — only the literal DEPENDS list is consulted. So the new code has strictly weaker dependency tracking than the old code.

Step-by-step proof of the silent-stale-binary case.

A developer sets DEEPGEMM_SRC_DIR=/path/to/local/DeepGEMM and runs pip install -e . once. cmake builds the marker ${_dg_dir}/.built and the corresponding _C.cpython-*.so.

Developer edits /path/to/local/DeepGEMM/deep_gemm/include/deep_gemm/jit_kernels/foo.hpp (or any of the cutlass/fmt headers) to change a function signature consumed by csrc/python_api.cpp.

They rerun the build. cmake re-evaluates the custom command and checks the mtime of every entry in DEPENDS. Only tools/build_deepgemm_C.py and csrc/python_api.cpp are listed — both unchanged — so .built is considered up to date and the rule does not fire.

The previously-built _C.so is reused as-is. It links against the old header signatures while the rest of the (newly-recompiled) project now depends on the new ones, producing silent symbol/ABI drift at runtime instead of an honest rebuild.

A second triggering scenario: a future GIT_TAG bump in deepgemm.cmake lands a header-only change in upstream DeepGEMM (no churn to csrc/python_api.cpp). After FetchContent checks out the new tree, the same logic skips the rebuild on incremental cmake invocations.

Why existing code does not prevent this. The depfile-driven implicit tracking only exists for add_library/add_executable targets. Nothing else fills the gap here: the _dg_marker touchstamp is purely a function of DEPENDS mtimes, and the install rule is keyed off the marker.

Addressing the refutation. The refutation is right that the practical trigger window is narrow: CI does clean Docker builds (so it never hits this path), DeepGEMM is FetchContent-pinned, and most users install prebuilt wheels. That is exactly why this is filed as a nit rather than blocking. But the refutation overstates the case by saying the issue "almost never occurs" — the DEEPGEMM_SRC_DIR workflow is documented in setup_deepgemm_pythons.sh and is the obvious iteration loop for anyone debugging DeepGEMM integration locally. The "rm -rf build && rebuild" workaround is correct but only helps people who already know their .so is stale, which is the root foot-gun.

Suggested fix. Two reasonable options, both small:

Extend DEPENDS to include header globs:
file(GLOB_RECURSE _dg_headers "${deepgemm_SOURCE_DIR}/csrc/*.h" "${deepgemm_SOURCE_DIR}/csrc/*.hpp" "${deepgemm_SOURCE_DIR}/deep_gemm/include/*.hpp" "${deepgemm_SOURCE_DIR}/deep_gemm/include/*.cuh") add_custom_command(... DEPENDS ${_dg_headers} ...)
(Optionally add cutlass/fmt globs, but those are vendored upstream and rarely modified.)

Or have tools/build_deepgemm_C.py emit a .d depfile via -MD -MF and pass it back via DEPFILE (Ninja-only, but vLLM uses Ninja).

Note: the original bug suggested IMPLICIT_DEPENDS CXX, which works only with the Makefile generator. vLLM commonly uses Ninja, so that fix would be a no-op there — explicit globs (or a DEPFILE) are the robust path.

Buildkite CI failure: g++ for the 3.10 target died with `Python.h: No such file or directory`. uv had resolved `--python 3.10` to a system Python whose `INCLUDEPY=/usr/include/python3.10` doesn't exist (no -dev package). Pass `--python-preference only-managed` so uv always uses (and downloads if needed) a managed CPython, whose include dir always has Python.h. Also extend deepgemm.cmake's `DEPENDS` for the per-Python custom command to cover the headers under `csrc/`, `deep_gemm/include/` — `add_custom_command` does no implicit header scanning, so without this a header-only edit silently leaves the cached `_C.so` stale. Signed-off-by: mgoin <mgoin64@gmail.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ZJY0516

How much will this blow up the vLLM wheel size?

mgoin · 2026-05-03T02:31:39Z

How much will this blow up the vLLM wheel size?

each .so is 1.4 MB so roughly 6 MB, i think this is acceptable for now

claude Bot reviewed May 2, 2026

View reviewed changes

mergify Bot added the ci/build label May 2, 2026

gemini-code-assist Bot reviewed May 2, 2026

View reviewed changes

Comment thread tools/check_wheel_deepgemm.py

Comment thread tools/check_wheel_deepgemm.py Outdated

Comment thread tools/build_deepgemm_C.py

Comment thread tools/build_deepgemm_C.py

Comment thread tools/build_deepgemm_C.py Outdated

mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label May 2, 2026

claude Bot reviewed May 2, 2026

View reviewed changes

Comment thread .buildkite/test_areas/kernels.yaml Outdated

Comment thread tools/check_wheel_deepgemm.py Outdated

Comment thread cmake/external_projects/deepgemm.cmake Outdated

claude Bot reviewed May 2, 2026

View reviewed changes

mgoin mentioned this pull request May 2, 2026

[CI] Add ci-fetch-log.sh helper for Buildkite job logs #41517

Merged

2 tasks

ZJY0516 reviewed May 3, 2026

View reviewed changes

Merge branch 'main' into deepgemm-multi-python-build

ac23273

Uh oh!

Conversation

mgoin commented May 2, 2026

Summary

Test plan

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Claude Code Review

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mergify Bot commented May 2, 2026

Uh oh!

mgoin commented May 2, 2026

Uh oh!

mergify Bot commented May 2, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mgoin commented May 2, 2026

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

claude Bot May 2, 2026

Choose a reason for hiding this comment

Uh oh!

ZJY0516 left a comment

Choose a reason for hiding this comment

Uh oh!

mgoin commented May 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants