In-Tree AMD Zen CPU Backend via zentorch [1/N] #35970
tlrmchlsmth merged 11 commits into vllm-project:main
Conversation
Code Review
This pull request introduces support for AMD Zen CPUs via zentorch, including new Docker build targets, platform detection logic, and GEMM dispatching through zentorch. The changes are well-structured and include comprehensive tests. My review focuses on a few key areas for improvement: reducing Docker image size by cleaning up build dependencies, eliminating code duplication between setup.py and the runtime platform detection, and correcting the configuration for torch.compile's caching mechanism to prevent potential cache corruption. These changes will improve maintainability and ensure correctness.
Hi @amd-lalithnc, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com> Co-authored-by: Chinmay-Kulkarni-AMD <Chinmay.Kulkarni@amd.com> Change-Id: I51b2aa53b66937c6c388b50db09072ba83ec44de
Force-pushed from 6b0a20a to a23c37c
Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com> Change-Id: I223526ad52e11a94787f1e7770a63924ec2e8bb1
Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com> Change-Id: I09b760c50c62dbafcc910e46664754b56a8dbd57
tlrmchlsmth
left a comment
I think we can simplify the selection of the zentorch dependency
My suggestion would be to drop the auto-detection from setup.py, removing the _is_amd_zen_cpu() import, the VLLM_ZENTORCH_INSTALL env var, and the conditional requirements.append("zentorch"), and keep only the "zen" extra. This cleans up logic across setup.py and envs.py, removes a compile factor that shouldn't exist, and makes the dependency management explicit and predictable.
docker/Dockerfile.cpu
Outdated
######################### AMD SOURCE BUILD IMAGE #########################
FROM vllm-openai AS vllm-openai-amd-source
What is this needed for? Let's remove it to keep the upstream build simple
Removed the PyPI-based target, as the build from source is a customer requirement.
How long does it take to build from source?
Let's install zentorch from pypi with this image in order to reduce resources used in vLLM's CI and to make it quicker and easier for folks to build the image. We don't need to build zentorch from source in upstream vLLM, and like I said, better to keep it simpler and easier.
Hi @tlrmchlsmth, the source build takes a few minutes (it depends on the machine configuration, though). Overall the build time is much less than the vLLM-from-source build time. Also, going forward, when we support running the vLLM + zentorch CI on supported AMD CPU instances, we would like the Dockerfile to support this target. Should we also try this with the PyPI-based flow, or is a source-based build OK for this?
Hi @tlrmchlsmth, we have removed the source build in the latest update; we are only doing a PyPI-based build in the Docker image as well. We can bring back the source build in future PRs if there are specific requests for it.
vllm/model_executor/layers/utils.py
Outdated
layer.cpu_linear = (
    lambda x, weight, bias: torch.ops.zentorch.zentorch_linear_unary(
        x, zen_weight, bias, is_weight_prepacked=is_prepacked
    )
)
This captures zen_weight and is_prepacked by reference, but these are reassigned in the enclosing scope. It looks like it does the right thing for now, but it is fragile. Let's capture by value.
Suggested change:
layer.cpu_linear = (
    lambda x, weight, bias, _w=zen_weight, _p=is_prepacked:
        torch.ops.zentorch.zentorch_linear_unary(x, _w, bias, is_weight_prepacked=_p)
)
Hi @tlrmchlsmth, we had tried this before submitting the PR, but it fails in the aot_compile flow of PyTorch due to deviation from the op schema. The AOTAutogradCache expects the op schema (x, weight, bias, is_weight_prepacked) to be maintained, whereas with the additional arguments passing the modified weight by value it fails with the following error:
TypeError: DefaultsSource.__init__() takes from 3 to 4 positional arguments but 6 were given
Another related change, passing only the weight (and as a result the bias and is_weight_prepacked) by value, resulted in an FX graph propagation error with Dynamo (most likely due to the schema mismatch again).
I am guessing this is also why the VLLM_CPU_SGL_KERNEL path currently passes the packed weight and bias by reference.
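For context, the capture-by-reference pitfall discussed in this thread is ordinary Python late binding, independent of torch. A minimal torch-free illustration (all names here are ours, not from the PR):

```python
def make_closures():
    """Build two lists of zero-arg callables over a loop variable."""
    fns_ref, fns_val = [], []
    for i in range(3):
        # Captures i by reference: every lambda sees the final value of i.
        fns_ref.append(lambda: i)
        # Captures the current value of i via a default argument.
        fns_val.append(lambda _i=i: _i)
    return fns_ref, fns_val

ref, val = make_closures()
print([f() for f in ref])  # [2, 2, 2]
print([f() for f in val])  # [0, 1, 2]
```

The default-argument trick is the standard capture-by-value idiom, which is exactly what the review suggestion above applies; the thread explains why it conflicts with the op schema expected by AOTAutogradCache in this particular case.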
ah, OK then. Thanks for the explanation.
cc @ProExpertProg in case you have any ideas (but we should handle any improvements in a future PR)
I think getting a more minimal torch repro would be helpful, not sure I understand exactly what's going on
Can we add a vLLM issue link here, so the torch team can investigate?
vllm/platforms/zen_cpu.py
Outdated
cls._patch_fxgraphcache_pickle()

@classmethod
def _patch_fxgraphcache_pickle(cls):
    """Backport mainline ValueError fix to FxGraphCachePickler.dumps()."""
    import pickle

    from torch._inductor.codecache import FxGraphCachePickler

    original_dumps = FxGraphCachePickler.dumps
    if hasattr(original_dumps, "_zen_patched"):
        return

    def patched_dumps_method(self, obj):
        import logging as _logging

        from torch._inductor.codecache import BypassFxGraphCache

        _logger = _logging.getLogger("torch._inductor.codecache")
        try:
            self.dump(obj)
            return self._stream.getvalue()
        except (TypeError, AttributeError, pickle.PicklingError, ValueError) as e:
            _logger.warning("Failed to pickle cache key", exc_info=True)
            raise BypassFxGraphCache("Failed to pickle cache key") from e
        finally:
            self._stream.seek(0)
            self._stream.truncate(0)
Patching like this, by replacing the function definition wholesale, is brittle. IMO it would be better (less prone to future breakage) to wrap the original function, like this:
Suggested change:
@classmethod
def _patch_fxgraphcache_pickle(cls):
    from torch._inductor.codecache import BypassFxGraphCache, FxGraphCachePickler

    original_dumps = FxGraphCachePickler.dumps
    if hasattr(original_dumps, "_zen_patched"):
        return

    def patched_dumps(self, obj):
        try:
            return original_dumps(self, obj)
        except ValueError as e:
            raise BypassFxGraphCache("Failed to pickle cache key") from e

    patched_dumps._zen_patched = True
    FxGraphCachePickler.dumps = patched_dumps
Why do we need to patch inductor anyway?
When VLLM_ZENTORCH_WEIGHT_PREPACK=1 (default), zentorch prepacks weights into ZenDNN blocked-layout tensors at load time. These tensors raise ValueError when FxGraphCachePickler.dumps() tries to serialize the graph cache key, because PyTorch 2.10's dumps() doesn't catch ValueError.
PyTorch mainline already fixed this in pytorch/pytorch#176557 (merged 2026-03-04) by adding ValueError to the except clause. Our patch is a thin backport of that fix for 2.10 users, scoped to 2.10 <= version < 2.11, and will be removed once we drop 2.10 support.
Can we move this to the env_override.py file with other pytorch patches?
Seems like a good fix either way, no need to be zen-only
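The wrap-and-reraise pattern suggested above can be sketched with stand-in classes (Pickler and BypassCache here are illustrative substitutes, not the real torch._inductor types):

```python
class Pickler:
    """Stand-in for FxGraphCachePickler; the real class lives in torch._inductor.codecache."""
    def dumps(self, obj):
        if obj is None:
            raise ValueError("unsupported object")
        return repr(obj).encode()

class BypassCache(Exception):
    """Stand-in for BypassFxGraphCache."""

original_dumps = Pickler.dumps

def patched_dumps(self, obj):
    # Delegate to the original implementation and translate only the one
    # new failure mode; all other behavior is preserved unchanged.
    try:
        return original_dumps(self, obj)
    except ValueError as e:
        raise BypassCache("Failed to pickle cache key") from e

patched_dumps._zen_patched = True  # idempotence guard against double-patching
if not hasattr(Pickler.dumps, "_zen_patched"):
    Pickler.dumps = patched_dumps
```

Because the wrapper calls the captured original rather than re-implementing its body, future upstream changes to the stream handling or logging inside dumps() are picked up automatically; only the exception translation is pinned.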
Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com> Change-Id: I3e294003184afa678e3ece0e97ec0561fc619290
# (Zen CPU backend) eagerly prepack weights into ZenDNN blocked layout
# at model load time. Eliminates per-inference layout conversion overhead.
"VLLM_ZENTORCH_WEIGHT_PREPACK": lambda: bool(
    int(os.getenv("VLLM_ZENTORCH_WEIGHT_PREPACK", "1"))
),
We're trying to keep the number of environment variables minimal in vLLM.
Why wouldn't someone want to pre-pack the weights? If there's no compelling reason, could we remove the env?
Hi @tlrmchlsmth, since this environment variable is enabled by default, it should be transparent for most users. Currently this is mostly a debug feature for enabling other kernel variants from the ZenDNN library backend.
Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com> Change-Id: I943a52eae3dfaefe33130c5869af47d7f89deb54
Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com> Change-Id: Ic12ec89133419214aa618d27cc855db4cefb8a01
Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com> Change-Id: I13513045dd88b93a5f68ac72e0bf6507fa1bea85
Force-pushed from d7408e7 to 9b77dfc
docker/Dockerfile.cpu
Outdated
######################### AMD PYPI IMAGE #########################
FROM vllm-openai AS vllm-openai-amd
Do we like this layer name, or would it be better to be more specific and call it vllm-openai-zen so that it's clearly not related to AMD GPUs or to the amd64 isa?
tlrmchlsmth
left a comment
This looks good to me now, thank you!
- Use is_torch_equal_or_newer instead of TorchVersion for version check - Add comment to is_zen_cpu() that is_cpu is also True - Remove import torch (no longer needed) - Remove zen_cpu.py pickle exemption (no longer imports pickle) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
The name vllm-openai-amd was ambiguous — it could be mistaken for AMD GPU (ROCm) support or the amd64 ISA. vllm-openai-zen is more specific and consistent with the [zen] extra and ZenCpuPlatform. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Chinmay-Kulkarni-AMD <Chinmay.Kulkarni@amd.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Change-Id: I51b2aa53b66937c6c388b50db09072ba83ec44de
Purpose
- ZenCpuPlatform, a new platform subclass of CpuPlatform, activates when an AMD EPYC CPU with AVX-512 is detected and zentorch is installed. Detection reads /proc/cpuinfo for AuthenticAMD + avx512 flags.
- GEMMs dispatch to zentorch::zentorch_linear_unary instead of the default oneDNN/torch linear path.
- Weights are prepacked via zentorch_weight_prepack_for_linear at model load time (controlled by VLLM_ZENTORCH_WEIGHT_PREPACK, default on). Eliminates per-inference layout conversion overhead.
- New environment variables: VLLM_ZENTORCH_WEIGHT_PREPACK (toggle weight prepacking) and VLLM_ZENTORCH_INSTALL (toggle auto-install of zentorch during pip install vllm). Also adds a pip install vllm[zen] extras entry.
- New Docker build targets: vllm-openai-amd (installs zentorch from PyPI) and vllm-openai-amd-source (builds zentorch from source).
- Works around a PyTorch 2.10 FxGraphCachePickler.dumps bug (missing ValueError catch) that causes torch.compile cache misses. Scoped to ZenCpuPlatform only and will be removed once PT 2.10 is no longer supported.
- Adds is_zen_cpu() to the base Platform interface (returns False by default), overridden to True in ZenCpuPlatform.
- Tests: 2 for op dispatch (zentorch_linear_unary invocation + weight removal) and 4 for platform detection logic.

Test Plan
Test 1: Platform Detection (tests/test_zen_cpu_platform_detection.py)
Tests _is_amd_zen_cpu() from vllm.platforms with mocked /proc/cpuinfo. Four cases.
Run with: pytest tests/test_zen_cpu_platform_detection.py -v

Test 2: Op Dispatch (tests/model_executor/test_cpu_unquantized_gemm_dispatch.py)
Tests dispatch_cpu_unquantized_gemm() from vllm.model_executor.layers.utils. Registers a mock zentorch_linear_unary op (via torch.library) if real zentorch isn't installed, then monkeypatches current_platform.is_zen_cpu to return True. Two cases:
- layer.cpu_linear(x, weight, bias) produces the same output as F.linear(x, weight, bias)
- with remove_weight=True, layer.weight.numel() == 0 after dispatch
Run with: pytest tests/model_executor/test_cpu_unquantized_gemm_dispatch.py -v

Test Result
Tested on an EPYC Zen4 server
test_zen_cpu_platform_detection.py::test_is_amd_zen_cpu_detects_amd_with_avx512 PASSED
test_zen_cpu_platform_detection.py::test_is_amd_zen_cpu_returns_false_for_amd_without_avx512 PASSED
test_zen_cpu_platform_detection.py::test_is_amd_zen_cpu_returns_false_for_intel_with_avx512 PASSED
test_zen_cpu_platform_detection.py::test_is_amd_zen_cpu_returns_false_when_cpuinfo_missing PASSED
test_cpu_unquantized_gemm_dispatch.py::test_dispatch_cpu_unquantized_gemm_uses_zentorch_on_zen PASSED
test_cpu_unquantized_gemm_dispatch.py::test_dispatch_cpu_unquantized_gemm_zen_remove_weight PASSED