-
-
Notifications
You must be signed in to change notification settings - Fork 16.6k
[Build] Build bundled DeepGEMM _C per-Python so the wheel imports on every CPython
#41516
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
mgoin
wants to merge
7
commits into
vllm-project:main
Choose a base branch
from
mgoin:deepgemm-multi-python-build
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+254
−43
Open
Changes from all commits
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
c0b910a
[Build] Build bundled DeepGEMM `_C` per-Python so the wheel imports o…
mgoin 308f33d
[Build] Auto-derive DEEPGEMM_PYTHONS from pyproject.toml; move wheel …
mgoin 84aa4ad
[Build] Address review: respect $CXX, explicit CUDA_HOME check
mgoin 0eba3a4
[Build] Fix import ordering for ruff 0.14.0 in CI
mgoin f40d2a7
[Build] Address review: python3 in CI; treat empty env var as unset
mgoin ee669bd
[Build] Force uv-managed Python so per-Python venvs ship dev headers
mgoin ac23273
Merge branch 'main' into deepgemm-multi-python-build
mgoin File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,87 @@ | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| # SPDX-FileCopyrightText: Copyright contributors to the vLLM project | ||
| """Build DeepGEMM's `_C` pybind11 extension for a target Python. | ||
|
|
||
| Driven from `cmake/external_projects/deepgemm.cmake`. The driver is the | ||
| build interpreter (which has torch); the *target* Python is only used for | ||
| its header path and SOABI. This avoids needing torch installed in N venvs | ||
| to produce N matching `.so` files. | ||
|
|
||
| Usage: python build_deepgemm_C.py <DEEPGEMM_SRC_DIR> <OUTPUT_DIR> <TARGET_PY> | ||
| """ | ||
|
|
||
| import json | ||
| import os | ||
| import subprocess | ||
| import sys | ||
|
mgoin marked this conversation as resolved.
|
||
| from pathlib import Path | ||
|
|
||
| import torch | ||
| from torch.utils import cpp_extension | ||
|
|
||
| if len(sys.argv) != 4: | ||
| sys.exit(f"usage: {sys.argv[0]} <SRC> <OUT> <TARGET_PY>") | ||
|
|
||
| src = Path(sys.argv[1]).resolve() | ||
| out = Path(sys.argv[2]).resolve() | ||
| target_py = sys.argv[3] | ||
| out.mkdir(parents=True, exist_ok=True) | ||
|
|
||
| info = json.loads( | ||
| subprocess.check_output( | ||
| [ | ||
| target_py, | ||
| "-c", | ||
| "import sysconfig, json; " | ||
| "print(json.dumps({k: sysconfig.get_config_var(k) " | ||
| "for k in ('EXT_SUFFIX', 'INCLUDEPY')}))", | ||
| ] | ||
| ).decode() | ||
| ) | ||
|
|
||
| cuda_home = cpp_extension.CUDA_HOME | ||
|
mgoin marked this conversation as resolved.
|
||
| if cuda_home is None: | ||
| sys.exit("CUDA_HOME not found; cannot build DeepGEMM _C") | ||
| # CCCL lives outside the standard CUDAToolkit search, mirroring DeepGEMM's | ||
| # own setup.py. | ||
| includes = [ | ||
| info["INCLUDEPY"], | ||
| f"{cuda_home}/include", | ||
| f"{cuda_home}/include/cccl", | ||
| str(src / "csrc"), | ||
| str(src / "deep_gemm/include"), | ||
| str(src / "third-party/cutlass/include"), | ||
| str(src / "third-party/cutlass/tools/util/include"), | ||
| str(src / "third-party/fmt/include"), | ||
| *cpp_extension.include_paths(device_type="cuda"), | ||
| ] | ||
|
|
||
| cmd = [ | ||
| os.environ.get("CXX", "g++"), | ||
| "-shared", | ||
| "-fPIC", | ||
| "-std=c++17", | ||
| "-O3", | ||
| "-g0", | ||
| "-Wno-psabi", | ||
| "-Wno-deprecated-declarations", | ||
| "-DTORCH_API_INCLUDE_EXTENSION_H", | ||
| "-DTORCH_EXTENSION_NAME=_C", | ||
| f"-D_GLIBCXX_USE_CXX11_ABI={int(torch.compiled_with_cxx11_abi())}", | ||
| *(f"-I{p}" for p in includes), | ||
| str(src / "csrc/python_api.cpp"), | ||
| *(f"-L{p}" for p in cpp_extension.library_paths(device_type="cuda")), | ||
| f"-L{cuda_home}/lib64", | ||
| "-ltorch", | ||
| "-ltorch_python", | ||
| "-ltorch_cpu", | ||
| "-ltorch_cuda", | ||
| "-lc10", | ||
| "-lc10_cuda", | ||
| "-lcudart", | ||
| "-lnvrtc", | ||
| "-o", | ||
| str(out / f"_C{info['EXT_SUFFIX']}"), | ||
| ] | ||
| print("[build_deepgemm_C] " + " ".join(cmd), flush=True) | ||
| subprocess.check_call(cmd) | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,41 @@ | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| # SPDX-FileCopyrightText: Copyright contributors to the vLLM project | ||
|
|
||
| """Assert the installed vLLM has a `_C.cpython-X.Y-*.so` for every CPython | ||
| covered by `requires-python`. Fails closed if a Python's `.so` is missing | ||
| from the wheel — i.e. the regression that surfaced in #41476/#41512. | ||
|
|
||
| Run from a CI test job after vLLM is installed, e.g. the H100 deepgemm | ||
| kernel tests in .buildkite/test_areas/kernels.yaml. | ||
| """ | ||
|
|
||
| import importlib.util | ||
| import os | ||
| import sys | ||
| from pathlib import Path | ||
|
|
||
| import regex as re | ||
|
mgoin marked this conversation as resolved.
|
||
| import tomllib | ||
|
|
||
| SO_RE = re.compile(r"^_C\.cpython-(\d)(\d+)-") | ||
|
|
||
|
|
||
| def required_pythons() -> list[str]: | ||
| pyproject = Path(__file__).resolve().parent.parent / "pyproject.toml" | ||
| spec = tomllib.loads(pyproject.read_text())["project"]["requires-python"] | ||
| m = re.match(r">=3\.(\d+),<3\.(\d+)", spec) | ||
| if not m: | ||
| sys.exit(f"unexpected requires-python format: {spec!r}") | ||
| return [f"3.{v}" for v in range(int(m[1]), int(m[2]))] | ||
|
|
||
|
|
||
| spec = importlib.util.find_spec("vllm.third_party.deep_gemm") | ||
| if spec is None or spec.origin is None: | ||
| sys.exit("vllm.third_party.deep_gemm not importable; is vllm installed?") | ||
| pkg_dir = Path(spec.origin).parent | ||
|
|
||
| found = {f"{m[1]}.{m[2]}" for f in os.listdir(pkg_dir) if (m := SO_RE.match(f))} | ||
| required = required_pythons() | ||
| missing = [v for v in required if v not in found] | ||
| print(f"deepgemm _C: found {sorted(found)}, required {required}, missing {missing}") | ||
| sys.exit(1 if missing else 0) | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,49 @@ | ||
| #!/usr/bin/env bash | ||
| # Provision bare Python interpreters for the DeepGEMM `_C` per-Python build | ||
| # and print a colon-separated list of their paths to stdout. | ||
| # | ||
| # Each target Python only needs a working interpreter — torch is not | ||
| # installed since `tools/build_deepgemm_C.py` runs from the build interpreter. | ||
| # uv re-uses any matching system Python and downloads a managed build | ||
| # otherwise. | ||
| # | ||
| # Usage: | ||
| # export DEEPGEMM_PYTHON_INTERPRETERS=$(tools/setup_deepgemm_pythons.sh) | ||
| # python setup.py bdist_wheel --dist-dir=dist --py-limited-api=cp38 | ||
| # | ||
| # With no args, expands to every CPython covered by `requires-python` in | ||
| # pyproject.toml. Pass explicit versions (e.g. `3.10 3.11`) to override. | ||
| # | ||
| # Skip this script if you don't have uv: set DEEPGEMM_PYTHON_INTERPRETERS | ||
| # directly to existing interpreter paths. Editable / single-Python builds | ||
| # don't need the env var at all (cmake falls back to the build interpreter). | ||
| # | ||
| # Optional: DEEPGEMM_VENV_PREFIX (default: /tmp/dgenv). | ||
| set -euo pipefail | ||
|
|
||
| if [ "$#" -eq 0 ]; then | ||
| # Derive the matrix from `requires-python = ">=3.X,<3.Y"` in pyproject.toml. | ||
| pyproject="$(dirname "$0")/../pyproject.toml" | ||
| spec=$(grep -E '^requires-python' "$pyproject" \ | ||
| | grep -oE '>=3\.[0-9]+,<3\.[0-9]+') | ||
| lo=${spec#>=3.}; lo=${lo%%,*} | ||
| hi=${spec##*<3.} | ||
| set -- $(seq "$lo" $((hi - 1)) | sed 's/^/3./') | ||
| fi | ||
|
|
||
| prefix="${DEEPGEMM_VENV_PREFIX:-/tmp/dgenv}" | ||
| mkdir -p "$prefix" | ||
|
|
||
| paths="" | ||
| for V in "$@"; do | ||
| venv="$prefix/$V" | ||
| # Force a managed (uv-downloaded) Python so dev headers are bundled. | ||
| # System Pythons on the build base may lack headers (manylinux's | ||
| # /opt/python/cpXY-cpXY are off PATH; an apt-installed python3.X often | ||
| # has no -dev), and the per-Python build needs Python.h. | ||
| [ -x "$venv/bin/python" ] || \ | ||
| uv venv --python "$V" "$venv" --python-preference only-managed --seed \ | ||
| >/dev/null | ||
| paths="$paths:$venv/bin/python" | ||
| done | ||
| echo "${paths#:}" |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 The new per-Python
add_custom_commandlists onlytools/build_deepgemm_C.pyandcsrc/python_api.cppinDEPENDS, but the compile pulls in headers fromcsrc/,deep_gemm/include/,third-party/cutlass/, andthird-party/fmt/. Unlike the priorPython_add_library(which got compiler-driven depfiles),add_custom_commanddoes no implicit header scanning, so a header-only change leaves the cached_C.sountouched on incremental rebuilds. Nit — narrow trigger window (CI does clean Docker builds, DeepGEMM is FetchContent-pinned), but it is a real regression vs. the prior behavior; consider extendingDEPENDSto include header globs (noteIMPLICIT_DEPENDS CXXis Make-generator-only, so it's a no-op under Ninja).Extended reasoning...
What the bug is. The per-Python build rule in
cmake/external_projects/deepgemm.cmake(lines 88–97) registers:DEPENDSonly names the build driver and the single.cpptranslation unit. Buttools/build_deepgemm_C.pyinvokesg++with-Ipaths into${deepgemm_SOURCE_DIR}/csrc,deep_gemm/include,third-party/cutlass/include,third-party/cutlass/tools/util/include, andthird-party/fmt/include. None of those headers are tracked.Why this is a regression. The pre-PR code used
Python_add_library(_deep_gemm_C MODULE WITH_SOABI ...). That is a real cmake target, and cmake feeds the compile rule through the generator (-MMD -MFdepfiles consumed by Ninja/Make), so any#included header transitively invalidated the.so.add_custom_commanddoes no implicit header scanning — only the literalDEPENDSlist is consulted. So the new code has strictly weaker dependency tracking than the old code.Step-by-step proof of the silent-stale-binary case.
DEEPGEMM_SRC_DIR=/path/to/local/DeepGEMMand runspip install -e .once. cmake builds the marker${_dg_dir}/.builtand the corresponding_C.cpython-*.so./path/to/local/DeepGEMM/deep_gemm/include/deep_gemm/jit_kernels/foo.hpp(or any of the cutlass/fmt headers) to change a function signature consumed bycsrc/python_api.cpp.DEPENDS. Onlytools/build_deepgemm_C.pyandcsrc/python_api.cppare listed — both unchanged — so.builtis considered up to date and the rule does not fire._C.sois reused as-is. It links against the old header signatures while the rest of the (newly-recompiled) project now depends on the new ones, producing silent symbol/ABI drift at runtime instead of an honest rebuild.A second triggering scenario: a future
GIT_TAGbump indeepgemm.cmakelands a header-only change in upstream DeepGEMM (no churn tocsrc/python_api.cpp). AfterFetchContentchecks out the new tree, the same logic skips the rebuild on incremental cmake invocations.Why existing code does not prevent this. The depfile-driven implicit tracking only exists for
add_library/add_executabletargets. Nothing else fills the gap here: the_dg_markertouchstamp is purely a function ofDEPENDSmtimes, and the install rule is keyed off the marker.Addressing the refutation. The refutation is right that the practical trigger window is narrow: CI does clean Docker builds (so it never hits this path), DeepGEMM is FetchContent-pinned, and most users install prebuilt wheels. That is exactly why this is filed as a nit rather than blocking. But the refutation overstates the case by saying the issue "almost never occurs" — the
DEEPGEMM_SRC_DIRworkflow is documented insetup_deepgemm_pythons.shand is the obvious iteration loop for anyone debugging DeepGEMM integration locally. The "rm -rf build && rebuild" workaround is correct but only helps people who already know their.sois stale, which is the root foot-gun.Suggested fix. Two reasonable options, both small:
DEPENDSto include header globs:tools/build_deepgemm_C.pyemit a.ddepfile via-MD -MFand pass it back viaDEPFILE(Ninja-only, but vLLM uses Ninja).Note: the original bug suggested
IMPLICIT_DEPENDS CXX, which works only with the Makefile generator. vLLM commonly uses Ninja, so that fix would be a no-op there — explicit globs (or aDEPFILE) are the robust path.