
Add NotImplementedError to v1 cpu runner #19527

Closed
fred2167 wants to merge 2 commits into vllm-project:main from fred2167:v1-cpu-not-implemented

Conversation

@fred2167 commented Jun 12, 2025

add v1 cpu runner not implemented error

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

This is my first commit to the repo; the purpose is to familiarize myself with the codebase through a minimal change.

This PR only aims to make the error more verbose when running the example on a CPU machine, so that execution doesn't fall through to the GPU implementation:

VLLM_USE_V1=1 python examples/offline_inference/basic/basic.py        
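As an editorial aside, the change described above amounts to raising an explicit error instead of letting CPU execution fall into GPU-only code paths. A minimal sketch of that pattern is below; the class and method names are illustrative only, not the actual vLLM code:

```python
# Hypothetical sketch of the guard style this PR adds: a CPU model-runner
# method that has no CPU implementation fails loudly with a clear message
# instead of silently reusing the GPU code path.
class CPUModelRunner:
    def capture_model(self):
        # CUDA-graph capture has no CPU equivalent, so refuse explicitly.
        raise NotImplementedError(
            "capture_model is not implemented for the V1 CPU runner; "
            "CUDA graphs are a GPU-only feature.")


runner = CPUModelRunner()
try:
    runner.capture_model()
except NotImplementedError as e:
    print(f"NotImplementedError: {e}")
```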

Test Plan

Unit test and example

Test Result

pytest tests/v1/worker/test_cpu_model_runner.py
=========================================================================== warnings summary ============================================================================
../../.venv/lib/python3.12/site-packages/schemathesis/generation/coverage.py:305
  /Users/fredchan/.venv/lib/python3.12/site-packages/schemathesis/generation/coverage.py:305: DeprecationWarning: jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.
    ref_error: type[Exception] = jsonschema.RefResolutionError,

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
===================================================================== 1 passed, 1 warning in 12.78s =====================================================================

add v1 cpu runner not implemented error

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the v1 label Jun 12, 2025
@fred2167 fred2167 force-pushed the v1-cpu-not-implemented branch from 609d77c to feb00d5 on June 12, 2025 03:41
@bigPYJ1151 (Member)

Any issues when using CPU V1?

The CPU V1 model runner inherits from the GPU V1 model runner and most member functions can be reused directly. So it's not required to throw unimplemented errors.

Just checked python examples/offline_inference/basic/basic.py and it worked with the V1 engine by default.

Signed-off-by: Fred Chan <fred2167@gmail.com>
@fred2167 fred2167 force-pushed the v1-cpu-not-implemented branch from feb00d5 to 7cfc9e1 on June 12, 2025 03:54
@houseroad (Collaborator)

Yeah, I am also a bit confused, since the CPU backend also works on the macOS CPU side. :-)

@fred2167 (Author)

Any issues when using CPU V1?

I'm running on an M1 Mac, so I guess I don't have the Intel package used by the SDPA backend. Any recommendation for Mac?

The CPU V1 model runner inherits from the GPU V1 model runner and most member functions can be reused directly. So it's not required to throw unimplemented errors.

Does the Intel package also support GPU? If not, would it be better to move CPU-specific logic into the CPU model runner?

Just checked python examples/offline_inference/basic/basic.py and it worked with the V1 engine by default.

This is the output I'm getting by running:

 VLLM_USE_V1=1 python examples/offline_inference/basic/basic.py
ERROR 06-12 14:45:32 [core.py:517]   File "/Users/fredchan/code/original-vllm/vllm/attention/layer.py", line 402, in unified_attention
ERROR 06-12 14:45:32 [core.py:517]     output = self.impl.forward(self, query, key, value, kv_cache,
ERROR 06-12 14:45:32 [core.py:517]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-12 14:45:32 [core.py:517]   File "/Users/fredchan/code/original-vllm/vllm/attention/backends/torch_sdpa.py", line 559, in forward
ERROR 06-12 14:45:32 [core.py:517]     import intel_extension_for_pytorch.llm.modules as ipex_modules
ERROR 06-12 14:45:32 [core.py:517] ModuleNotFoundError: No module named 'intel_extension_for_pytorch'
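The traceback above comes from an unconditional import of intel_extension_for_pytorch deep inside the SDPA backend's forward pass. One way to fail earlier with a clearer message is to probe for the module up front; this is only a sketch of the idea, not vLLM's actual import handling:

```python
# Probe for intel_extension_for_pytorch without importing it, so a missing
# dependency can be reported before model execution starts.
import importlib.util


def ipex_available() -> bool:
    """Return True if intel_extension_for_pytorch is importable."""
    return importlib.util.find_spec("intel_extension_for_pytorch") is not None


if not ipex_available():
    print("intel_extension_for_pytorch not found: "
          "V1 CPU chunked prefill is unavailable on this platform")
```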

@fred2167 (Author)

Yeah, I am also a bit confused, since the CPU backend also works on the macOS CPU side. :-)

Interesting, are you able to run the example on a Mac?

@bigPYJ1151 (Member)

@fred2167 The V1 engine requires chunked-prefill support, which is not yet supported on macOS. For CPU, only x86 supports this, via the intel_extension_for_pytorch package. By default the V0 engine will be used on macOS.

The CUDA backend will not use the SDPA backend; it is only used by the CPU backend.
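As an editorial aside, the platform constraint described above (V1 CPU support requires x86 plus intel_extension_for_pytorch, so Apple-silicon Macs fall back to V0) can be sketched as a small check. The helper below is hypothetical, not vLLM's real engine-selection code:

```python
# Hypothetical gate: V1 CPU chunked prefill needs an x86 machine with
# intel_extension_for_pytorch installed; otherwise fall back to V0.
import importlib.util
import platform


def supports_v1_cpu() -> bool:
    is_x86 = platform.machine().lower() in ("x86_64", "amd64")
    has_ipex = importlib.util.find_spec(
        "intel_extension_for_pytorch") is not None
    return is_x86 and has_ipex


engine = "V1" if supports_v1_cpu() else "V0"
print(f"Selected engine: {engine}")
```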

@houseroad (Collaborator)

https://gist.github.com/houseroad/9fc43ba08c192c7c91914f2f1af539fb — I tried something like this yesterday :-)

@fred2167 (Author)

@fred2167 The V1 engine requires chunked-prefill support, which is not yet supported on macOS. For CPU, only x86 supports this, via the intel_extension_for_pytorch package. By default the V0 engine will be used on macOS.

The CUDA backend will not use the SDPA backend; it is only used by the CPU backend.

Makes sense. I don't think it's a small lift for V1 to support Mac. It may be worth updating the docs to reflect this, given someone had a similar issue on Slack.

@fred2167 (Author)

https://gist.github.com/houseroad/9fc43ba08c192c7c91914f2f1af539fb — I tried something like this yesterday :-)

Is this on the V0 or V1 engine?

@houseroad (Collaborator)

V0 works. V1 failed.

@houseroad (Collaborator)

I thought my run was on V1, but it was actually on V0. So this PR may make sense.

@hmellor (Member) commented Mar 4, 2026

The underlying issue was addressed by #19121

@hmellor hmellor closed this Mar 4, 2026
@mergify mergify bot added the cpu Related to CPU backends label Mar 4, 2026