[Build] Build bundled DeepGEMM _C per-Python so the wheel imports on every CPython#41516
[Build] Build bundled DeepGEMM _C per-Python so the wheel imports on every CPython#41516mgoin wants to merge 7 commits intovllm-project:mainfrom
_C per-Python so the wheel imports on every CPython#41516Conversation
…n every CPython DeepGEMM's `_C` is a pybind11 module and emits Python-version-specific symbol references; the previous wheel only shipped the build interpreter's `.so` and silently failed on every other Python (vllm-project#41476, vllm-project#41512). Build `_C` once per CPython listed in `DEEPGEMM_PYTHON_INTERPRETERS` and bundle them side-by-side under `vllm/third_party/deep_gemm/`. cmake foreach loops over the env var (set by Dockerfile, falls back to the build interpreter for editable / source builds). The compile is delegated to `tools/build_deepgemm_C.py` which runs from the build interpreter using its torch — target Pythons just need a working interpreter, no torch installed. `tools/setup_deepgemm_pythons.sh` provisions bare venvs via uv (re-uses system Pythons when possible). A wheel verifier asserts every required `.so` is present. Signed-off-by: mgoin <mgoin64@gmail.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request implements a multi-Python build system for the DeepGEMM _C extension, allowing a single wheel to support multiple CPython versions (3.10-3.14). It introduces scripts for provisioning target interpreters, a compilation helper that leverages the build environment's Torch, and a verification tool for the final wheel. Feedback focuses on improving build robustness and maintainability by replacing the external regex dependency with the standard library, avoiding hardcoded version lists, ensuring CUDA_HOME is explicitly validated, and respecting the CXX environment variable for compiler selection.
…check to H100 CI - setup_deepgemm_pythons.sh expands the matrix from `requires-python` when called with no args; Dockerfile no longer hardcodes/forwards the list. - Wheel verifier moves from Dockerfile to the H100 deepgemm kernel test (.buildkite/test_areas/kernels.yaml); rewritten to inspect the installed package rather than wheel zips, derives required Pythons from pyproject.toml. Signed-off-by: mgoin <mgoin64@gmail.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Hi @mgoin, the pre-commit checks have failed. Please run: uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
- Use `$CXX` (default `g++`) so build environments with a different compiler can override. - Fail closed with a clear message if `cuda_home` resolves to None. Signed-off-by: mgoin <mgoin64@gmail.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@claude review |
|
Hi @mgoin, the pre-commit checks have failed. Please run: uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
Drop the explicit `import vllm.third_party.deep_gemm` (which CI's pinned ruff and local ruff disagree on grouping) and locate the package via `importlib.util.find_spec` instead. Signed-off-by: mgoin <mgoin64@gmail.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- kernels.yaml: bare `python` doesn't exist in the H100 test image
(only `python3` via update-alternatives); switch to `python3`.
- deepgemm.cmake: `DEFINED ENV{X}` is true for empty strings, which
bypassed the Python_EXECUTABLE fallback for an empty
DEEPGEMM_PYTHON_INTERPRETERS and silently produced a wheel without any
`_C.so`. Treat empty as unset.
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@claude review |
There was a problem hiding this comment.
Prior review feedback (python3 in CI, importlib.find_spec, empty-env-var guard) is addressed; remaining finding is a header-tracking nit on the per-Python custom command.
Extended reasoning...
Overview
This PR rebuilds DeepGEMM's pybind11 _C once per CPython listed in DEEPGEMM_PYTHON_INTERPRETERS and bundles the .sos side-by-side under vllm/third_party/deep_gemm/. It replaces the prior single-target Python_add_library with a per-interpreter add_custom_command driven by tools/build_deepgemm_C.py, adds tools/setup_deepgemm_pythons.sh to provision bare uv venvs in the Dockerfile, and adds tools/check_wheel_deepgemm.py to fail-close on missing per-Python .sos in the H100 deepgemm CI step. The change is scoped to build/CI files (cmake/external_projects/deepgemm.cmake, docker/Dockerfile, .buildkite/test_areas/kernels.yaml, three new tools/* files).
Security risks
None meaningful — this is a build-system change. The new shell/Python scripts run inside the build environment, parse pyproject.toml for a fixed regex, and shell out to g++/uv. No untrusted input crosses a trust boundary.
Level of scrutiny
CI/build code, no runtime production code paths. Moderate scrutiny for cross-Python ABI correctness was already exercised by the bug-hunting pass (3 prior findings, all addressed). The remaining nit is narrow: add_custom_command does not auto-scan headers, so header-only edits in a local DEEPGEMM_SRC_DIR workflow can leave a stale _C.so. Trigger window is small (CI builds are clean, DeepGEMM is FetchContent-pinned to a specific commit), and easy to fix later by extending DEPENDS or emitting a DEPFILE.
Other factors
The author is responsive — every prior round of feedback (gemini's five comments and my three) has a corresponding commit (Address review: python3 in CI, Address review: respect $CXX, explicit CUDA_HOME check, Auto-derive DEEPGEMM_PYTHONS from pyproject.toml). The test plan in the PR description verifies all five Python builds, the install component, the editable fallback, and the wheel checker. cmake/external_projects/deepgemm.cmake has no specific CODEOWNER; only top-level CMakeLists.txt does. Approving despite the open nit.
| add_custom_command( | ||
| OUTPUT "${_dg_marker}" | ||
| COMMAND "${Python_EXECUTABLE}" | ||
| "${CMAKE_SOURCE_DIR}/tools/build_deepgemm_C.py" | ||
| "${deepgemm_SOURCE_DIR}" "${_dg_dir}" "${_pybin}" | ||
| COMMAND "${CMAKE_COMMAND}" -E touch "${_dg_marker}" | ||
| DEPENDS "${CMAKE_SOURCE_DIR}/tools/build_deepgemm_C.py" | ||
| "${deepgemm_SOURCE_DIR}/csrc/python_api.cpp" | ||
| COMMENT "Building DeepGEMM _C for ${_pybin}" | ||
| VERBATIM) |
There was a problem hiding this comment.
🟡 The new per-Python add_custom_command lists only tools/build_deepgemm_C.py and csrc/python_api.cpp in DEPENDS, but the compile pulls in headers from csrc/, deep_gemm/include/, third-party/cutlass/, and third-party/fmt/. Unlike the prior Python_add_library (which got compiler-driven depfiles), add_custom_command does no implicit header scanning, so a header-only change leaves the cached _C.so untouched on incremental rebuilds. Nit — narrow trigger window (CI does clean Docker builds, DeepGEMM is FetchContent-pinned), but it is a real regression vs. the prior behavior; consider extending DEPENDS to include header globs (note IMPLICIT_DEPENDS CXX is Make-generator-only, so it's a no-op under Ninja).
Extended reasoning...
What the bug is. The per-Python build rule in cmake/external_projects/deepgemm.cmake (lines 88–97) registers:
add_custom_command(
OUTPUT "${_dg_marker}"
COMMAND "${Python_EXECUTABLE}" "${CMAKE_SOURCE_DIR}/tools/build_deepgemm_C.py" ...
COMMAND "${CMAKE_COMMAND}" -E touch "${_dg_marker}"
DEPENDS "${CMAKE_SOURCE_DIR}/tools/build_deepgemm_C.py"
"${deepgemm_SOURCE_DIR}/csrc/python_api.cpp"
...)DEPENDS only names the build driver and the single .cpp translation unit. But tools/build_deepgemm_C.py invokes g++ with -I paths into ${deepgemm_SOURCE_DIR}/csrc, deep_gemm/include, third-party/cutlass/include, third-party/cutlass/tools/util/include, and third-party/fmt/include. None of those headers are tracked.
Why this is a regression. The pre-PR code used Python_add_library(_deep_gemm_C MODULE WITH_SOABI ...). That is a real cmake target, and cmake feeds the compile rule through the generator (-MMD -MF depfiles consumed by Ninja/Make), so any #included header transitively invalidated the .so. add_custom_command does no implicit header scanning — only the literal DEPENDS list is consulted. So the new code has strictly weaker dependency tracking than the old code.
Step-by-step proof of the silent-stale-binary case.
- A developer sets
DEEPGEMM_SRC_DIR=/path/to/local/DeepGEMMand runspip install -e .once. cmake builds the marker${_dg_dir}/.builtand the corresponding_C.cpython-*.so. - Developer edits
/path/to/local/DeepGEMM/deep_gemm/include/deep_gemm/jit_kernels/foo.hpp(or any of the cutlass/fmt headers) to change a function signature consumed bycsrc/python_api.cpp. - They rerun the build. cmake re-evaluates the custom command and checks the mtime of every entry in
DEPENDS. Onlytools/build_deepgemm_C.pyandcsrc/python_api.cppare listed — both unchanged — so.builtis considered up to date and the rule does not fire. - The previously-built
_C.sois reused as-is. It links against the old header signatures while the rest of the (newly-recompiled) project now depends on the new ones, producing silent symbol/ABI drift at runtime instead of an honest rebuild.
A second triggering scenario: a future GIT_TAG bump in deepgemm.cmake lands a header-only change in upstream DeepGEMM (no churn to csrc/python_api.cpp). After FetchContent checks out the new tree, the same logic skips the rebuild on incremental cmake invocations.
Why existing code does not prevent this. The depfile-driven implicit tracking only exists for add_library/add_executable targets. Nothing else fills the gap here: the _dg_marker touchstamp is purely a function of DEPENDS mtimes, and the install rule is keyed off the marker.
Addressing the refutation. The refutation is right that the practical trigger window is narrow: CI does clean Docker builds (so it never hits this path), DeepGEMM is FetchContent-pinned, and most users install prebuilt wheels. That is exactly why this is filed as a nit rather than blocking. But the refutation overstates the case by saying the issue "almost never occurs" — the DEEPGEMM_SRC_DIR workflow is documented in setup_deepgemm_pythons.sh and is the obvious iteration loop for anyone debugging DeepGEMM integration locally. The "rm -rf build && rebuild" workaround is correct but only helps people who already know their .so is stale, which is the root foot-gun.
Suggested fix. Two reasonable options, both small:
- Extend
DEPENDSto include header globs:(Optionally add cutlass/fmt globs, but those are vendored upstream and rarely modified.)file(GLOB_RECURSE _dg_headers "${deepgemm_SOURCE_DIR}/csrc/*.h" "${deepgemm_SOURCE_DIR}/csrc/*.hpp" "${deepgemm_SOURCE_DIR}/deep_gemm/include/*.hpp" "${deepgemm_SOURCE_DIR}/deep_gemm/include/*.cuh") add_custom_command(... DEPENDS ${_dg_headers} ...)
- Or have
tools/build_deepgemm_C.pyemit a.ddepfile via-MD -MFand pass it back viaDEPFILE(Ninja-only, but vLLM uses Ninja).
Note: the original bug suggested IMPLICIT_DEPENDS CXX, which works only with the Makefile generator. vLLM commonly uses Ninja, so that fix would be a no-op there — explicit globs (or a DEPFILE) are the robust path.
Buildkite CI failure: g++ for the 3.10 target died with `Python.h: No such file or directory`. uv had resolved `--python 3.10` to a system Python whose `INCLUDEPY=/usr/include/python3.10` doesn't exist (no -dev package). Pass `--python-preference only-managed` so uv always uses (and downloads if needed) a managed CPython, whose include dir always has Python.h. Also extend deepgemm.cmake's `DEPENDS` for the per-Python custom command to cover the headers under `csrc/`, `deep_gemm/include/` — `add_custom_command` does no implicit header scanning, so without this a header-only edit silently leaves the cached `_C.so` stale. Signed-off-by: mgoin <mgoin64@gmail.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZJY0516
left a comment
There was a problem hiding this comment.
How much will this blow up the vLLM wheel size?
each .so is 1.4 MB so roughly 6 MB, i think this is acceptable for now |
Summary
DeepGEMM's
_Cis a pybind11 module and emits Python-version-specific symbol references; the previous wheel only shipped the build interpreter's.soand silently failed on every other Python (#41476, #41512).This builds
_Conce per CPython listed inDEEPGEMM_PYTHON_INTERPRETERS(defaults to 3.10–3.14 in the Dockerfile) and bundles them side-by-side undervllm/third_party/deep_gemm/. cmake's foreach loop falls back toPython_EXECUTABLEwhen the env var is unset, so editable / single-Python source builds are unchanged. The compile is delegated totools/build_deepgemm_C.py, which always runs from the build interpreter using its torch — target Pythons just need a working interpreter, no torch installed.tools/setup_deepgemm_pythons.shprovisions bare venvs via uv (re-uses system Pythons when present). A newtools/check_wheel_deepgemm.pyverifier runs in the wheel-build stage to fail closed if a.sois missing.Not a duplicate: PR #41476 attempted to make a single
.sowork cross-version via the fake-abi3 trick used elsewhere in vLLM. That trick relies on TORCH_LIBRARY-based extensions that bypass Python's import loader; pybind11 modules don't, which is why #41476 was reverted (#41512). This PR takes the per-Python-.soapproach instead.Test plan
Verified locally and inside the
pytorch/manylinux2_28-builder:cuda13.0base image:_Cbuilds for 3.10/3.11/3.12/3.13/3.14 (cmake foreach viatools/build_deepgemm_C.py).soloads on its matching Python with torch (get_num_smsreturns OK, 57 attrs)cmake --install --component _deep_gemm_Cbundles all 5.sos + Python files + headersPython_EXECUTABLE)tools/check_wheel_deepgemm.pyflags missing.so, skips wheels w/o deepgemmpython_api.cppchangepip install vllm(precompiled),pip install -e .(editable, single Python), andVLLM_USE_PRECOMPILED=1flows are uv-freeAI assistance was used to write this patch.