[CI/Build]: make it possible to build with a free-threaded interpreter #29241
vllm-bot merged 2 commits into vllm-project:main
Conversation
Force-pushed from 53ea5d3 to 33907a1
Code Review
This pull request aims to enable building vLLM with a free-threaded Python interpreter by disabling the Limited C API and Stable ABI, which are not supported in free-threaded builds. The changes correctly modify setup.py to conditionally disable py_limited_api. However, there is a critical issue in cmake/utils.cmake where the check for a free-threaded interpreter will not work as intended due to how CMake handles boolean strings, which could break existing builds. I've provided a suggestion to fix this.
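The CMake pitfall the review alludes to can be illustrated: a build script typically probes the interpreter with a one-line command, and the captured output is the *string* "None" (older with-GIL Pythons that predate the config variable), "0", or "1" — and CMake's `if()` does not treat the literal string "None" as a false constant, so a naive truthiness test misfires on older interpreters. The sketch below is a hypothetical illustration (the variable names are made up, not from the PR); comparing against the exact string "1" is the robust check:

```python
import subprocess
import sys

# What a CMake execute_process() probe would capture: "None", "0", or "1".
probe = subprocess.run(
    [sys.executable, "-c",
     "import sysconfig; print(sysconfig.get_config_var('Py_GIL_DISABLED'))"],
    capture_output=True, text=True, check=True,
)
value = probe.stdout.strip()

# Robust check: only the exact string "1" means a free-threaded build.
is_freethreaded = (value == "1")
print(value, is_freethreaded)
```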
Force-pushed from 33907a1 to 67553b4
For Python 3.13/3.14, free-threaded Python does not support using the
Limited C API and the Stable ABI.
Without this change, the build ends with:
```
In file included from /path/to/vllm/csrc/core/registration.h:3,
from /path/to/vllm/csrc/cpu/torch_bindings.cpp:3:
/..../include/python3.14t/Python.h:51:4: error: #error "The limited API is not currently supported in the free-threaded build"
```
Signed-off-by: Ralf Gommers <ralf.gommers@gmail.com>
Force-pushed from 67553b4 to c082838
/gemini review
Code Review
This pull request introduces changes to enable building vLLM with a free-threaded Python interpreter, a feature becoming available in Python 3.13/3.14. The author correctly identifies that the Limited C API and Stable ABI are not supported in free-threaded builds. The changes disable these features by modifying both setup.py for setuptools and cmake/utils.cmake for the CMake build process. The detection of a free-threaded interpreter is implemented correctly in both places using sysconfig.get_config_var("Py_GIL_DISABLED"). The changes are clear, well-motivated, and appear correct. I have not found any issues of high or critical severity.
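That detection can be sketched as a small helper; the function name mirrors the `is_freethreaded()` call visible in the diff below, but the docstring and exact shape here are an illustrative assumption, not the PR's verbatim code:

```python
import sysconfig

def is_freethreaded() -> bool:
    """Return True when running on a free-threaded (no-GIL) CPython build.

    Py_GIL_DISABLED is 1 on free-threaded builds (3.13t/3.14t), 0 on
    regular 3.13+ builds, and None on older Pythons predating the flag;
    bool() maps all three cases correctly.
    """
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

print(is_freethreaded())
```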
```diff
 class CMakeExtension(Extension):
     def __init__(self, name: str, cmake_lists_dir: str = ".", **kwa) -> None:
-        super().__init__(name, sources=[], py_limited_api=True, **kwa)
+        super().__init__(name, sources=[], py_limited_api=not is_freethreaded(), **kwa)
```
Dumb question: is py_limited_api=False required when using free-threaded Python?
Thanks for the review @ApostaC. Yes, it should be set to False, otherwise setuptools raises an exception. The limited API is not supported yet with free-threading. There is very active work on adding that support for 3.15 (either PEP 803 or PEP 809 will add it, and both require PEP 793), but that's a new ABI which will be compatible with both free-threaded and with-GIL interpreters. Using that in the future will require both a new setuptools version and some source-level changes in vLLM to use PyModExport (PEP 793). So until that's all done, the limited API has to be avoided here.
Thanks for triggering CI @DarkLight1337. I looked at the 4 failing jobs as well as the two "soft-failed" ones visible only in Buildkite - none seem related. I don't see the same failures in other recent PRs, but these jobs don't seem to be running in those PRs - I guess that depends on which files one touches? If there's any action I should take here, please let me know.
The tests are also failing on main, so we can merge
Does it only support compilation on the CPU?
Error log:
```
[132/136] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim64_bf16_sm80.cu.o
[133/136] Building CUDA object CMakeFiles/_C.dir/csrc/attention/paged_attention_v1.cu.o
ninja: build stopped: subcommand failed.
[stderr] /home/py_venv/venv-vllm/lib/python3.14t/site-packages/setuptools_scm/_integration/version_inference.py:51: UserWarning: version of None already set
  warnings.warn(self.message)
CMake Warning at /home/py_venv/venv-vllm/lib/python3.14t/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:140 (message):
  Failed to compute shorthash for libnvrtc.so
Call Stack (most recent call first):
  /home/py_venv/venv-vllm/lib/python3.14t/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:86 (include)
  /home/py_venv/venv-vllm/lib/python3.14t/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
  CMakeLists.txt:91 (find_package)
```
Great, thanks @DarkLight1337. @HSIEHCHIACHI for CUDA the same change is needed in
@rgommers Thanks. It has been solved!
vllm-project#29241
Signed-off-by: Ralf Gommers <ralf.gommers@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

vllm-project#29241
Signed-off-by: Ralf Gommers <ralf.gommers@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
Purpose
For Python 3.13/3.14, free-threaded Python does not support using the Limited C API and the Stable ABI. The purpose of this PR is to ensure that vLLM can be built from source under a free-threaded interpreter. It doesn't change anything for default (with-GIL) Python interpreters.
See gh-28762 for more context on getting vLLM to work with free-threaded Python.
Note that this same change is needed in vllm_flash_attn; PR at vllm-project/flash-attention#112. The order in which these two PRs are merged doesn't matter.
Test Plan
This was tested locally. It's too early to propose adding CI, given the dependencies that don't provide free-threaded wheels yet or need to be built from source (xref gh-28762).
Test Result
Without this change, the build ends with:
With this change, the build succeeds and a `cp313t` or `cp314t` wheel is built as expected (tested locally for CPU and CUDA on Linux x86-64). Here is a build log for CPU as an example:
Build log - editable build for CPU in a cp314t environment
Note: Python 3.14 isn't yet supported by vLLM (but more interesting for free-threading), so it requires this tiny patch locally:
Patch for allowing a 3.14 interpreter
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.