
[CI/Build]: make it possible to build with a free-threaded interpreter #29241

Merged
vllm-bot merged 2 commits into vllm-project:main from rgommers:bld-freethreading
Nov 28, 2025

Conversation

@rgommers
Contributor

@rgommers rgommers commented Nov 22, 2025

Purpose

For Python 3.13/3.14, free-threaded Python does not support using the Limited C API and the Stable ABI. The purpose of this PR is to ensure that vLLM can be built from source under a free-threaded interpreter. It doesn't change anything for default (with-GIL) Python interpreters.

See gh-28762 for more context on getting vLLM to work with free-threaded Python.

Note that this same change is needed in vllm_flash_attn; PR at vllm-project/flash-attention#112. The order in which these two PRs are merged doesn't matter.
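For illustration, the interpreter check this change relies on can be sketched in a few lines of Python. This is a sketch rather than the exact vLLM code: the `is_freethreaded()` helper name mirrors the one visible in the setup.py diff later in this thread, and `sysconfig.get_config_var("Py_GIL_DISABLED")` is the standard way to detect a free-threaded build.

```python
import sysconfig

def is_freethreaded() -> bool:
    # Py_GIL_DISABLED is 1 on free-threaded (cp313t/cp314t) interpreters,
    # and 0 or unset on default with-GIL interpreters.
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# Extensions can then opt out of the Limited API / Stable ABI only when
# needed, e.g. py_limited_api=not is_freethreaded() in setup.py.
print(is_freethreaded())
```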

Test Plan

This was tested locally. It's too early to propose adding CI, given that several dependencies either don't provide free-threaded wheels yet or need to be built from source (xref gh-28762).

Test Result

Without this change, the build ends with:

  In file included from /path/to/vllm/csrc/core/registration.h:3,
                   from /path/to/vllm/csrc/cpu/torch_bindings.cpp:3:
  /..../include/python3.14t/Python.h:51:4: error: #error "The limited API is not currently supported in the free-threaded build"

With this change, the build succeeds and a cp313t or cp314t wheel is built as expected (tested locally for CPU and CUDA on Linux x86-64).

Here is a build log for CPU as an example:

Build log - editable build for CPU in a cp314t environment
$ VLLM_TARGET_DEVICE=cpu python -m pip install -e . -v --no-build-isolation --no-deps
Using pip 25.3 from /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/lib/python3.14t/site-packages/pip (python 3.14)
Obtaining file:///home/rgommers/code/tmp/pixidev-vllm/vllm/vllm
  Running command Checking if build backend supports build_editable
  Checking if build backend supports build_editable ... done
  Running command Preparing editable metadata (pyproject.toml)
  /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/lib/python3.14t/site-packages/setuptools_scm/_integration/version_inference.py:51: UserWarning: version of None already set
    warnings.warn(self.message)
  running dist_info
  creating /tmp/pip-modern-metadata-v__mr59k/vllm.egg-info
  writing /tmp/pip-modern-metadata-v__mr59k/vllm.egg-info/PKG-INFO
  writing dependency_links to /tmp/pip-modern-metadata-v__mr59k/vllm.egg-info/dependency_links.txt
  writing entry points to /tmp/pip-modern-metadata-v__mr59k/vllm.egg-info/entry_points.txt
  writing requirements to /tmp/pip-modern-metadata-v__mr59k/vllm.egg-info/requires.txt
  writing top-level names to /tmp/pip-modern-metadata-v__mr59k/vllm.egg-info/top_level.txt
  writing manifest file '/tmp/pip-modern-metadata-v__mr59k/vllm.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  adding license file 'LICENSE'
  writing manifest file '/tmp/pip-modern-metadata-v__mr59k/vllm.egg-info/SOURCES.txt'
  creating '/tmp/pip-modern-metadata-v__mr59k/vllm-0.11.2.dev102+g07a5d100d.cpu.dist-info'
  Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: vllm
  Running command Building editable for vllm (pyproject.toml)
  /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/lib/python3.14t/site-packages/setuptools_scm/_integration/version_inference.py:51: UserWarning: version of None already set
    warnings.warn(self.message)
  running editable_wheel
  creating /tmp/pip-ephem-wheel-cache-_ix2supc/wheels/f4/cb/2d/262a4ed4a28c45ec26614870316e9a47ce273e9b69028c3a42/tmpo_9xz7zh/.tmp-rzc6v44i/vllm.egg-info
  writing /tmp/pip-ephem-wheel-cache-_ix2supc/wheels/f4/cb/2d/262a4ed4a28c45ec26614870316e9a47ce273e9b69028c3a42/tmpo_9xz7zh/.tmp-rzc6v44i/vllm.egg-info/PKG-INFO
  writing dependency_links to /tmp/pip-ephem-wheel-cache-_ix2supc/wheels/f4/cb/2d/262a4ed4a28c45ec26614870316e9a47ce273e9b69028c3a42/tmpo_9xz7zh/.tmp-rzc6v44i/vllm.egg-info/dependency_links.txt
  writing entry points to /tmp/pip-ephem-wheel-cache-_ix2supc/wheels/f4/cb/2d/262a4ed4a28c45ec26614870316e9a47ce273e9b69028c3a42/tmpo_9xz7zh/.tmp-rzc6v44i/vllm.egg-info/entry_points.txt
  writing requirements to /tmp/pip-ephem-wheel-cache-_ix2supc/wheels/f4/cb/2d/262a4ed4a28c45ec26614870316e9a47ce273e9b69028c3a42/tmpo_9xz7zh/.tmp-rzc6v44i/vllm.egg-info/requires.txt
  writing top-level names to /tmp/pip-ephem-wheel-cache-_ix2supc/wheels/f4/cb/2d/262a4ed4a28c45ec26614870316e9a47ce273e9b69028c3a42/tmpo_9xz7zh/.tmp-rzc6v44i/vllm.egg-info/top_level.txt
  writing manifest file '/tmp/pip-ephem-wheel-cache-_ix2supc/wheels/f4/cb/2d/262a4ed4a28c45ec26614870316e9a47ce273e9b69028c3a42/tmpo_9xz7zh/.tmp-rzc6v44i/vllm.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  adding license file 'LICENSE'
  writing manifest file '/tmp/pip-ephem-wheel-cache-_ix2supc/wheels/f4/cb/2d/262a4ed4a28c45ec26614870316e9a47ce273e9b69028c3a42/tmpo_9xz7zh/.tmp-rzc6v44i/vllm.egg-info/SOURCES.txt'
  creating '/tmp/pip-ephem-wheel-cache-_ix2supc/wheels/f4/cb/2d/262a4ed4a28c45ec26614870316e9a47ce273e9b69028c3a42/tmpo_9xz7zh/.tmp-rzc6v44i/vllm-0.11.2.dev102+g07a5d100d.cpu.dist-info'
  creating /tmp/pip-ephem-wheel-cache-_ix2supc/wheels/f4/cb/2d/262a4ed4a28c45ec26614870316e9a47ce273e9b69028c3a42/tmpo_9xz7zh/.tmp-rzc6v44i/vllm-0.11.2.dev102+g07a5d100d.cpu.dist-info/WHEEL
  running build_py
  running build_ext
  -- The CXX compiler identification is GNU 14.3.0
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Check for working CXX compiler: /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/bin/ccache - skipped
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  -- Build type: Release
  -- Target device: cpu
  -- Found Python: /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/bin/python (found version "3.14.0") found components: Interpreter Development.Module Development.SABIModule
  -- Found python matching: /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/bin/python.
  CMake Warning at /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/lib/python3.14t/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
    static library kineto_LIBRARY-NOTFOUND not found.
  Call Stack (most recent call first):
    /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/lib/python3.14t/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:125 (append_torchlib_if_found)
    CMakeLists.txt:91 (find_package)


  -- Found Torch: /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/lib/python3.14t/site-packages/torch/lib/libtorch.so
  CMake Warning at cmake/cpu_extension.cmake:127 (message):
    Disable AVX512-BF16 ISA support, no avx512_bf16 found in local CPU flags.
    If cross-compilation is required, please set env VLLM_CPU_AVX512BF16=1.
  Call Stack (most recent call first):
    CMakeLists.txt:111 (include)


  CMake Warning at cmake/cpu_extension.cmake:142 (message):
    Disable AVX512-VNNI ISA support, no avx512_vnni found in local CPU flags.
    If cross-compilation is required, please set env VLLM_CPU_AVX512VNNI=1.
  Call Stack (most recent call first):
    CMakeLists.txt:111 (include)


  CMake Warning at cmake/cpu_extension.cmake:158 (message):
    Disable AMX_BF16 ISA support, no amx_bf16 found in local CPU flags.  If
    cross-compilation is required, please set env VLLM_CPU_AMXBF16=1.
  Call Stack (most recent call first):
    CMakeLists.txt:111 (include)


  -- Downloading oneDNN from GitHub
  -- The C compiler identification is GNU 14.3.0
  -- Detecting C compiler ABI info
  -- Detecting C compiler ABI info - done
  -- Check for working C compiler: /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/bin/ccache - skipped
  -- Detecting C compile features
  -- Detecting C compile features - done
  -- DNNL_TARGET_ARCH: X64
  -- DNNL compat: set DNNL_VERBOSE to ONEDNN_VERBOSE with value `OFF`
  -- DNNL compat: set DNNL_ENABLE_MAX_CPU_ISA to ONEDNN_ENABLE_MAX_CPU_ISA with value `OFF`
  -- DNNL compat: set DNNL_ENABLE_CPU_ISA_HINTS to ONEDNN_ENABLE_CPU_ISA_HINTS with value `OFF`
  -- DNNL compat: set DNNL_BUILD_DOC to ONEDNN_BUILD_DOC with value `OFF`
  -- DNNL compat: set DNNL_BUILD_EXAMPLES to ONEDNN_BUILD_EXAMPLES with value `OFF`
  -- DNNL compat: set DNNL_BUILD_TESTS to ONEDNN_BUILD_TESTS with value `OFF`
  -- DNNL compat: set DNNL_ENABLE_JIT_PROFILING to ONEDNN_ENABLE_JIT_PROFILING with value `OFF`
  -- DNNL compat: set DNNL_ENABLE_ITT_TASKS to ONEDNN_ENABLE_ITT_TASKS with value `OFF`
  -- DNNL compat: set DNNL_AARCH64_USE_ACL to ONEDNN_AARCH64_USE_ACL with value `OFF`
  -- DNNL compat: set DNNL_LIBRARY_TYPE to ONEDNN_LIBRARY_TYPE with value `STATIC`
  -- DNNL compat: set DNNL_ENABLE_WORKLOAD to ONEDNN_ENABLE_WORKLOAD with value `INFERENCE`
  -- DNNL compat: set DNNL_ENABLE_PRIMITIVE to ONEDNN_ENABLE_PRIMITIVE with value `MATMUL;REORDER`
  -- DNNL_LIBRARY_NAME: dnnl
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
  -- Looking for pthread_create in pthreads
  -- Looking for pthread_create in pthreads - not found
  -- Looking for pthread_create in pthread
  -- Looking for pthread_create in pthread - found
  -- Found Threads: TRUE
  -- Found OpenMP_C: -fopenmp (found version "4.5")
  -- Found OpenMP_CXX: -fopenmp (found version "4.5")
  -- Found OpenMP: TRUE (found version "4.5")
  -- Found Git: /usr/bin/git (found version "2.49.0")
  -- Enabled testing coverage: CI
  -- Enabled workload: INFERENCE
  -- Enabled primitives: MATMUL;REORDER
  -- Enabled primitive CPU ISA: ALL
  -- Enabled primitive GPU ISA: ALL
  -- Enabled GeMM kernels ISA: ALL
  -- Primitive cache is enabled
  -- CPU extension compile flags: -mf16c;-fopenmp;-DVLLM_CPU_EXTENSION;-mavx512f;-mavx512vl;-mavx512bw;-mavx512dq
  -- CPU extension source files: csrc/cpu/dnnl_kernels.cpp;csrc/cpu/shm.cpp;csrc/cpu/cpu_wna16.cpp;csrc/cpu/activation.cpp;csrc/cpu/utils.cpp;csrc/cpu/layernorm.cpp;csrc/cpu/mla_decode.cpp;csrc/cpu/pos_encoding.cpp;csrc/moe/dynamic_4bit_int_moe_cpu.cpp;csrc/cpu/cpu_attn.cpp;csrc/cpu/scratchpad_manager.cpp;csrc/cpu/torch_bindings.cpp
  -- Enabling C extension.
  -- Configuring done (2.8s)
  -- Generating done (0.1s)
  -- Build files have been written to: /tmp/tmp3kcxe3na.build-temp
   [1/436] Building CXX object /home/rgommers/code/tmp/pixidev-vllm/vllm/vllm/.deps/onednn-build/src/common/CMakeFiles/dnnl_common.dir/bfloat16.cpp.o
  [2/436] Building CXX object /home/rgommers/code/tmp/pixidev-vllm/vllm/vllm/.deps/onednn-build/src/common/CMakeFiles/dnnl_common.dir/dnnl_debug_autogenerated.cpp.o
...
[435/436] Building CXX object CMakeFiles/_C.dir/csrc/cpu/cpu_attn.cpp.o
  [436/436] Linking CXX shared module _C.cpython-314t-x86_64-linux-gnu.so
  -- Install configuration: "Release"
  -- Installing: /tmp/tmpm606u7y7.build-lib/vllm/_C.cpython-314t-x86_64-linux-gnu.so
  -- Set non-toolchain portion of runtime path of "/tmp/tmpm606u7y7.build-lib/vllm/_C.cpython-314t-x86_64-linux-gnu.so" to ""
  copying /tmp/tmpm606u7y7.build-lib/vllm/_C.cpython-314t-x86_64-linux-gnu.so -> vllm
  running egg_info
  creating /tmp/tmp3kcxe3na.build-temp/vllm.egg-info
  writing /tmp/tmp3kcxe3na.build-temp/vllm.egg-info/PKG-INFO
  writing dependency_links to /tmp/tmp3kcxe3na.build-temp/vllm.egg-info/dependency_links.txt
  writing entry points to /tmp/tmp3kcxe3na.build-temp/vllm.egg-info/entry_points.txt
  writing requirements to /tmp/tmp3kcxe3na.build-temp/vllm.egg-info/requires.txt
  writing top-level names to /tmp/tmp3kcxe3na.build-temp/vllm.egg-info/top_level.txt
  writing manifest file '/tmp/tmp3kcxe3na.build-temp/vllm.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  adding license file 'LICENSE'
  writing manifest file '/tmp/tmp3kcxe3na.build-temp/vllm.egg-info/SOURCES.txt'
  Editable install will be performed using a meta path finder.

  Options like `package-data`, `include/exclude-package-data` or
  `packages.find.exclude/include` may have no effect.

  adding '__editable___vllm_0_11_2_dev102_g07a5d100d_cpu_finder.py'
  adding '__editable__.vllm-0.11.2.dev102+g07a5d100d.cpu.pth'
  creating '/tmp/pip-ephem-wheel-cache-_ix2supc/wheels/f4/cb/2d/262a4ed4a28c45ec26614870316e9a47ce273e9b69028c3a42/tmpo_9xz7zh/.tmp-rzc6v44i/vllm-0.11.2.dev102+g07a5d100d.cpu-0.editable-cp314-cp314t-linux_x86_64.whl' and adding '/tmp/tmpzkmwfurivllm-0.11.2.dev102+g07a5d100d.cpu-0.editable-cp314-cp314t-linux_x86_64.whl' to it
  adding 'vllm-0.11.2.dev102+g07a5d100d.cpu.dist-info/licenses/LICENSE'
  adding 'vllm-0.11.2.dev102+g07a5d100d.cpu.dist-info/METADATA'
  adding 'vllm-0.11.2.dev102+g07a5d100d.cpu.dist-info/WHEEL'
  adding 'vllm-0.11.2.dev102+g07a5d100d.cpu.dist-info/entry_points.txt'
  adding 'vllm-0.11.2.dev102+g07a5d100d.cpu.dist-info/top_level.txt'
  adding 'vllm-0.11.2.dev102+g07a5d100d.cpu.dist-info/RECORD'
  /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/lib/python3.14t/site-packages/setuptools/command/editable_wheel.py:351: InformationOnly: Editable installation.
  !!

          ********************************************************************************
          Please be careful with folders in your working directory with the same
          name as your package as they may take precedence during imports.
          ********************************************************************************

  !!
    with strategy, WheelFile(wheel_path, "w") as wheel_obj:
  Building editable for vllm (pyproject.toml) ... done
  Created wheel for vllm: filename=vllm-0.11.2.dev102+g07a5d100d.cpu-0.editable-cp314-cp314t-linux_x86_64.whl size=14520 sha256=5963f6f48f960b30a49ca42ad3f856672444ad6c89664cd8e113cd593d733628
  Stored in directory: /tmp/pip-ephem-wheel-cache-_ix2supc/wheels/f4/cb/2d/262a4ed4a28c45ec26614870316e9a47ce273e9b69028c3a42
Successfully built vllm
Installing collected packages: vllm
  Attempting uninstall: vllm
    Found existing installation: vllm 0.11.2.dev102+g45888cf12.cpu
    Uninstalling vllm-0.11.2.dev102+g45888cf12.cpu:
      Removing file or directory /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/bin/vllm
      Removing file or directory /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/lib/python3.14t/site-packages/__editable__.vllm-0.11.2.dev102+g45888cf12.cpu.pth
      Removing file or directory /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/lib/python3.14t/site-packages/__editable___vllm_0_11_2_dev102_g45888cf12_cpu_finder.py
      Removing file or directory /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/lib/python3.14t/site-packages/__pycache__/__editable___vllm_0_11_2_dev102_g45888cf12_cpu_finder.cpython-314.pyc
      Removing file or directory /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/lib/python3.14t/site-packages/vllm-0.11.2.dev102+g45888cf12.cpu.dist-info/
      Successfully uninstalled vllm-0.11.2.dev102+g45888cf12.cpu
  changing mode of /home/rgommers/code/tmp/pixidev-vllm/.pixi/envs/cpu/bin/vllm to 755
Successfully installed vllm-0.11.2.dev102+g07a5d100d.cpu

Note: Python 3.14 isn't yet supported by vLLM (even though it's the more interesting version for free-threading), so building requires this tiny patch locally:

Patch for allowing a 3.14 interpreter
diff --git a/CMakeLists.txt b/CMakeLists.txt
index a4cf51d17..151332651 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -34,7 +34,7 @@ install(CODE "set(CMAKE_INSTALL_LOCAL_ONLY TRUE)" ALL_COMPONENTS)
 # Supported python versions.  These versions will be searched in order, the
 # first match will be selected.  These should be kept in sync with setup.py.
 #
-set(PYTHON_SUPPORTED_VERSIONS "3.10" "3.11" "3.12" "3.13")
+set(PYTHON_SUPPORTED_VERSIONS "3.10" "3.11" "3.12" "3.13" "3.14")
 
 # Supported AMD GPU architectures.
 set(HIP_SUPPORTED_ARCHS "gfx906;gfx908;gfx90a;gfx942;gfx950;gfx1030;gfx1100;gfx1101;gfx1200;gfx1201;gfx1150;gfx1151")
diff --git a/pyproject.toml b/pyproject.toml
index a250ab656..69d691ae4 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -30,7 +30,7 @@ classifiers = [
     "Topic :: Scientific/Engineering :: Artificial Intelligence",
     "Topic :: Scientific/Engineering :: Information Analysis",
 ]
-requires-python = ">=3.10,<3.14"
+requires-python = ">=3.10,<3.15"
 dynamic = [ "version", "dependencies", "optional-dependencies"]


@mergify
Contributor

mergify Bot commented Nov 22, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @rgommers.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request aims to enable building vLLM with a free-threaded Python interpreter by disabling the Limited C API and Stable ABI, which are not supported in free-threaded builds. The changes correctly modify setup.py to conditionally disable py_limited_api. However, there is a critical issue in cmake/utils.cmake where the check for a free-threaded interpreter will not work as intended due to how CMake handles boolean strings, which could break existing builds. I've provided a suggestion to fix this.

Comment thread cmake/utils.cmake
For Python 3.13/3.14, free-threaded Python does not support using the
Limited C API and the Stable ABI.

Without this change, the build ends with:
```
  In file included from /path/to/vllm/csrc/core/registration.h:3,
                   from /path/to/vllm/csrc/cpu/torch_bindings.cpp:3:
  /..../include/python3.14t/Python.h:51:4: error: #error "The limited API is not currently supported in the free-threaded build"
```

Signed-off-by: Ralf Gommers <ralf.gommers@gmail.com>
@rgommers
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces changes to enable building vLLM with a free-threaded Python interpreter, a feature becoming available in Python 3.13/3.14. The author correctly identifies that the Limited C API and Stable ABI are not supported in free-threaded builds. The changes disable these features by modifying both setup.py for setuptools and cmake/utils.cmake for the CMake build process. The detection of a free-threaded interpreter is implemented correctly in both places using sysconfig.get_config_var("Py_GIL_DISABLED"). The changes are clear, well-motivated, and appear correct. I have not found any issues of high or critical severity.

Collaborator

@ApostaC ApostaC left a comment


Otherwise LGTM

Comment thread setup.py
```
 class CMakeExtension(Extension):
     def __init__(self, name: str, cmake_lists_dir: str = ".", **kwa) -> None:
-        super().__init__(name, sources=[], py_limited_api=True, **kwa)
+        super().__init__(name, sources=[], py_limited_api=not is_freethreaded(), **kwa)
```
Collaborator


dumb question: is py_limited_api=False required when using the free-threaded python ?

Contributor Author


Thanks for the review @ApostaC. Yes, it should be set to False, otherwise setuptools raises an exception. The limited API is not supported yet with free-threading. There is very active work on adding that support for 3.15 (either PEP 803 or PEP 809 will add it, and both require PEP 793), but that's a new ABI which will be compatible with both free-threaded and with-GIL interpreters. Using that in the future will require both a new setuptools version and some source-level changes in vLLM to use PyModExport (PEP 793). So until that's all done, the limited API has to be avoided here.
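As a runtime complement to the build-time discussion above: on a free-threaded build the GIL can still be re-enabled at runtime, for instance when an extension module doesn't declare free-threading support. A minimal sketch of checking both, assuming CPython — `sys._is_gil_enabled()` is a private 3.13+ API, so the fallback covers older versions:

```python
import sys
import sysconfig

def gil_status() -> str:
    # Build-time check: was this interpreter compiled free-threaded?
    if not sysconfig.get_config_var("Py_GIL_DISABLED"):
        return "with-GIL build"
    # Runtime check: the GIL may have been re-enabled on a free-threaded build.
    gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()
    return ("free-threaded build, GIL enabled" if gil_enabled
            else "free-threaded build, GIL disabled")

print(gil_status())
```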

Collaborator

@ApostaC ApostaC left a comment


LGTM!

@ApostaC ApostaC enabled auto-merge (squash) November 24, 2025 18:17
@github-actions github-actions Bot added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 24, 2025
@rgommers
Contributor Author

Thanks for triggering CI @DarkLight1337. I looked at the 4 failing jobs as well as the two "soft-failed" ones visible only in Buildkite - none seem related. The moe-test-1/2 ones fail with a CUDA out of memory error, test_olmoe_lora_tp4 is about sharding, and there's a lazy import failure.

I don't see the same failures in other recent PRs, but these jobs don't seem to be running in those PRs - I guess that depends on which files one touches? If there's any action I should take here, please let me know.

@DarkLight1337
Member

The tests are also failing on main, so we can merge

@vllm-bot vllm-bot merged commit 7c1ed45 into vllm-project:main Nov 28, 2025
85 of 90 checks passed
@HSIEHCHIACHI

Does it only support compilation on the CPU?

@HSIEHCHIACHI

HSIEHCHIACHI commented Nov 29, 2025

Error log
  [132/136] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim64_bf16_sm80.cu.o
  [133/136] Building CUDA object CMakeFiles/_C.dir/csrc/attention/paged_attention_v1.cu.o
  ninja: build stopped: subcommand failed.
  [stderr]
  /home/py_venv/venv-vllm/lib/python3.14t/site-packages/setuptools_scm/_integration/version_inference.py:51: UserWarning: version of None already set
    warnings.warn(self.message)
  CMake Warning at /home/py_venv/venv-vllm/lib/python3.14t/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:140 (message):
    Failed to compute shorthash for libnvrtc.so
  Call Stack (most recent call first):
    /home/py_venv/venv-vllm/lib/python3.14t/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:86 (include)
    /home/py_venv/venv-vllm/lib/python3.14t/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
    CMakeLists.txt:91 (find_package)
  CMake Warning at
  /home/py_venv/venv-vllm/lib/python3.14t/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:323
  (message):
    pytorch is not compatible with `CMAKE_CUDA_ARCHITECTURES` and will ignore
    its value.  Please configure `TORCH_CUDA_ARCH_LIST` instead.
  Call Stack (most recent call first):
    /home/py_venv/venv-vllm/lib/python3.14t/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:86
  (include)
    /home/py_venv/venv-vllm/lib/python3.14t/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68
  (find_package)
    CMakeLists.txt:91 (find_package)


  CMake Warning at
  /home/py_venv/venv-vllm/lib/python3.14t/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22
  (message):
    static library kineto_LIBRARY-NOTFOUND not found.
  Call Stack (most recent call first):
    /home/py_venv/venv-vllm/lib/python3.14t/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:125
  (append_torchlib_if_found)
    CMakeLists.txt:91 (find_package)


  CMake Warning (dev) at
  /home/py_venv/venv-vllm/lib/python3.14t/site-packages/cmake/data/share/cmake-4.2/Modules/FetchContent.cmake:1963
  (message):
    Calling FetchContent_Populate(qutlass) is deprecated, call
    FetchContent_MakeAvailable(qutlass) instead.  Policy CMP0169 can be set to
    OLD to allow FetchContent_Populate(qutlass) to be called directly for now,
    but the ability to call it with declared details will be removed completely
    in a future version.
  Call Stack (most recent call first):
    cmake/external_projects/qutlass.cmake:27 (FetchContent_Populate)
    CMakeLists.txt:1044 (include)
  This warning is for project developers.  Use -Wno-dev to suppress it.

  Traceback (most recent call last):
    File "<string>", line 11, in <module>
      wheel_filename = backend.build_editable("/home/.cache/uv/builds-v0/.tmpRtv04E", {},
  "/home/.cache/uv/builds-v0/.tmppyqWae/metadata_directory/vllm-0.1.dev11736+g6db4f98bd.d20251129.cu120.dist-info")
    File "/home/py_venv/venv-vllm/lib/python3.14t/site-packages/setuptools/build_meta.py", line 468,
  in build_editable
      return self._build_with_temp_dir(
             ~~~~~~~~~~~~~~~~~~~~~~~~~^
          cmd, ".whl", wheel_directory, config_settings
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      )
      ^
    File "/home/py_venv/venv-vllm/lib/python3.14t/site-packages/setuptools/build_meta.py", line 404,
  in _build_with_temp_dir
      self.run_setup()
      ~~~~~~~~~~~~~~^^
    File "/home/py_venv/venv-vllm/lib/python3.14t/site-packages/setuptools/build_meta.py", line 317,
  in run_setup
      exec(code, locals())
      ~~~~^^^^^^^^^^^^^^^^
    File "<string>", line 698, in <module>
    File "/home/py_venv/venv-vllm/lib/python3.14t/site-packages/setuptools/__init__.py", line 115, in
  setup
      return distutils.core.setup(**attrs)
             ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
    File "/home/py_venv/venv-vllm/lib/python3.14t/site-packages/setuptools/_distutils/core.py", line
  186, in setup
      return run_commands(dist)
    File "/home/py_venv/venv-vllm/lib/python3.14t/site-packages/setuptools/_distutils/core.py", line
  202, in run_commands
      dist.run_commands()
      ~~~~~~~~~~~~~~~~~^^
    File "/home/py_venv/venv-vllm/lib/python3.14t/site-packages/setuptools/_distutils/dist.py", line
  1002, in run_commands
      self.run_command(cmd)
      ~~~~~~~~~~~~~~~~^^^^^
    File "/home/py_venv/venv-vllm/lib/python3.14t/site-packages/setuptools/dist.py", line 1102, in
  run_command
      super().run_command(command)
      ~~~~~~~~~~~~~~~~~~~^^^^^^^^^
    File "/home/py_venv/venv-vllm/lib/python3.14t/site-packages/setuptools/_distutils/dist.py", line
  1021, in run_command
      cmd_obj.run()
      ~~~~~~~~~~~^^
    File "/home/py_venv/venv-vllm/lib/python3.14t/site-packages/setuptools/command/editable_wheel.py",
  line 139, in run
      self._create_wheel_file(bdist_wheel)
      ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
    File "/home/py_venv/venv-vllm/lib/python3.14t/site-packages/setuptools/command/editable_wheel.py",
  line 349, in _create_wheel_file
      files, mapping = self._run_build_commands(dist_name, unpacked, lib, tmp)
                       ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/py_venv/venv-vllm/lib/python3.14t/site-packages/setuptools/command/editable_wheel.py",
  line 272, in _run_build_commands
      self._run_build_subcommands()
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
    File "/home/py_venv/venv-vllm/lib/python3.14t/site-packages/setuptools/command/editable_wheel.py",
  line 299, in _run_build_subcommands
      self.run_command(name)
      ~~~~~~~~~~~~~~~~^^^^^^
    File "/home/py_venv/venv-vllm/lib/python3.14t/site-packages/setuptools/_distutils/cmd.py", line 357,
  in run_command
      self.distribution.run_command(command)
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
    File "/home/py_venv/venv-vllm/lib/python3.14t/site-packages/setuptools/dist.py", line 1102, in
  run_command
      super().run_command(command)
      ~~~~~~~~~~~~~~~~~~~^^^^^^^^^
    File "/home/py_venv/venv-vllm/lib/python3.14t/site-packages/setuptools/_distutils/dist.py", line
  1021, in run_command
      cmd_obj.run()
      ~~~~~~~~~~~^^
    File "<string>", line 277, in run
    File "/home/py_venv/venv-vllm/lib/python3.14t/site-packages/setuptools/command/build_ext.py", line
  96, in run
      _build_ext.run(self)
      ~~~~~~~~~~~~~~^^^^^^
    File "/home/py_venv/venv-vllm/lib/python3.14t/site-packages/setuptools/_distutils/command/build_ext.py",
  line 368, in run
      self.build_extensions()
      ~~~~~~~~~~~~~~~~~~~~~^^
    File "<string>", line 246, in build_extensions
    File
  "/home/.local/share/uv/python/cpython-3.14.0+freethreaded-linux-x86_64-gnu/lib/python3.14t/subprocess.py",
  line 419, in check_call
      raise CalledProcessError(retcode, cmd)
  subprocess.CalledProcessError: Command '['cmake', '--build', '.', '-j=152', '--target=_moe_C',
  '--target=cumem_allocator', '--target=triton_kernels', '--target=_vllm_fa2_C', '--target=_C']' returned
  non-zero exit status 1.
  An error occurred when building editable wheel for vllm.
  See debugging tips in: https://setuptools.pypa.io/en/latest/userguide/development_mode.html#debugging-tips

  hint: This usually indicates a problem with the package or the build environment.
An error occurs when installing paged_attention. How can I solve it?

@rgommers
Contributor Author

Great, thanks @DarkLight1337.

@HSIEHCHIACHI for CUDA the same change is needed in flash-attention as I noted in the PR description:

Note that this same change is needed in vllm_flash_attn; PR at vllm-project/flash-attention#112. The order in which these two PRs are merged doesn't matter.

@HSIEHCHIACHI

@rgommers Thanks. It has been solved!

kitaekatt pushed a commit to kitaekatt/vllm that referenced this pull request Dec 1, 2025
[CI/Build]: make it possible to build with a free-threaded interpreter (vllm-project#29241)

Signed-off-by: Ralf Gommers <ralf.gommers@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
amd-hhashemi pushed a commit to amd-hhashemi/vllm that referenced this pull request Dec 2, 2025
[CI/Build]: make it possible to build with a free-threaded interpreter (vllm-project#29241)

Signed-off-by: Ralf Gommers <ralf.gommers@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
@rgommers rgommers deleted the bld-freethreading branch December 4, 2025 13:15

Labels

ci/build ready ONLY add when PR is ready to merge/full CI is needed
