
In-Tree AMD Zen CPU Backend via zentorch [1/N] #35970

Merged: tlrmchlsmth merged 11 commits into vllm-project:main from amd-lalithnc:zentorch-upstreaming, Mar 15, 2026.

Conversation


@amd-lalithnc amd-lalithnc commented Mar 4, 2026

Change-Id: I51b2aa53b66937c6c388b50db09072ba83ec44de

Purpose

  • This PR implements the foundational platform detection, GEMM dispatch, and Dockerfile targets described in RFC [RFC]: In-Tree AMD Zen CPU Backend via zentorch #35089. Fusion passes and other optimizations will follow in subsequent PRs.
  • Introduces ZenCpuPlatform, a new platform subclass of CpuPlatform that activates when an AMD EPYC CPU with AVX-512 is detected and zentorch is installed. Detection reads /proc/cpuinfo for AuthenticAMD + avx512 flags.
  • Routes all unquantized GEMM ops through zentorch::zentorch_linear_unary instead of the default oneDNN/torch linear path.
  • Supports eager weight prepacking via zentorch_weight_prepack_for_linear at model load time (controlled by VLLM_ZENTORCH_WEIGHT_PREPACK, default on). Eliminates per-inference layout conversion overhead.
  • Adds two new environment variables: VLLM_ZENTORCH_WEIGHT_PREPACK (toggle weight prepacking) and VLLM_ZENTORCH_INSTALL (toggle auto-install of zentorch during pip install vllm). Also adds pip install vllm[zen] extras entry.
  • Adds two Dockerfile targets for AMD Zen: vllm-openai-amd (installs zentorch from PyPI) and vllm-openai-amd-source (builds zentorch from source).
  • Backports a PyTorch 2.10 FxGraphCachePickler.dumps bug fix (missing ValueError catch) that causes torch.compile cache misses. Scoped to ZenCpuPlatform only and will be removed once PT 2.10 is no longer supported.
  • Adds is_zen_cpu() to the base Platform interface (returns False by default), overridden to True in ZenCpuPlatform.
  • Includes 6 unit tests: 2 for GEMM dispatch correctness (zentorch_linear_unary invocation + weight removal) and 4 for platform detection logic.

Test Plan

Test 1: Platform Detection (tests/test_zen_cpu_platform_detection.py)

Tests _is_amd_zen_cpu() from vllm.platforms with mocked /proc/cpuinfo. Four cases:

| Test | Mocked cpuinfo | Expected |
| --- | --- | --- |
| test_is_amd_zen_cpu_detects_amd_with_avx512 | AuthenticAMD + avx512f, avx512bw | True |
| test_is_amd_zen_cpu_returns_false_for_amd_without_avx512 | AuthenticAMD + avx, avx2 only | False |
| test_is_amd_zen_cpu_returns_false_for_intel_with_avx512 | GenuineIntel + avx512f | False |
| test_is_amd_zen_cpu_returns_false_when_cpuinfo_missing | /proc/cpuinfo doesn't exist | False |

Run with: pytest tests/test_zen_cpu_platform_detection.py -v


Test 2: Op Dispatch (tests/model_executor/test_cpu_unquantized_gemm_dispatch.py)

Tests dispatch_cpu_unquantized_gemm() from vllm.model_executor.layers.utils. Registers a mock zentorch_linear_unary op (via torch.library) if real zentorch isn't installed, then monkeypatches current_platform.is_zen_cpu to return True.

Two cases:

| Test | What it checks |
| --- | --- |
| test_dispatch_cpu_unquantized_gemm_uses_zentorch_on_zen | After dispatch, layer.cpu_linear(x, weight, bias) produces the same output as F.linear(x, weight, bias) |
| test_dispatch_cpu_unquantized_gemm_zen_remove_weight | With remove_weight=True, layer.weight.numel() == 0 after dispatch |

Run with: pytest tests/model_executor/test_cpu_unquantized_gemm_dispatch.py -v
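For readers unfamiliar with torch.library, the mock-op registration the test relies on can be sketched roughly as follows (hypothetical: the op schema shown here is inferred from the call sites in this PR and may not match zentorch's exactly):

```python
import torch
import torch.nn.functional as F

# Register a stand-in zentorch::zentorch_linear_unary when real zentorch
# is absent. Keep the Library object alive for the process lifetime.
lib = torch.library.Library("zentorch", "DEF")
lib.define(
    "zentorch_linear_unary(Tensor x, Tensor weight, Tensor? bias, "
    "bool is_weight_prepacked=False) -> Tensor"
)

def _mock_linear_unary(x, weight, bias=None, is_weight_prepacked=False):
    # Plain F.linear stands in for the fused zentorch kernel
    return F.linear(x, weight, bias)

lib.impl("zentorch_linear_unary", _mock_linear_unary, "CPU")

x = torch.randn(2, 4)
w = torch.randn(3, 4)
out = torch.ops.zentorch.zentorch_linear_unary(x, w, None)
assert torch.allclose(out, F.linear(x, w))
```

With the mock in place, monkeypatching current_platform.is_zen_cpu to True is enough to exercise the zentorch dispatch path on any CPU.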

Test Result

Tested on an EPYC Zen4 server

test_zen_cpu_platform_detection.py::test_is_amd_zen_cpu_detects_amd_with_avx512 PASSED
test_zen_cpu_platform_detection.py::test_is_amd_zen_cpu_returns_false_for_amd_without_avx512 PASSED
test_zen_cpu_platform_detection.py::test_is_amd_zen_cpu_returns_false_for_intel_with_avx512 PASSED
test_zen_cpu_platform_detection.py::test_is_amd_zen_cpu_returns_false_when_cpuinfo_missing PASSED
test_cpu_unquantized_gemm_dispatch.py::test_dispatch_cpu_unquantized_gemm_uses_zentorch_on_zen PASSED
test_cpu_unquantized_gemm_dispatch.py::test_dispatch_cpu_unquantized_gemm_zen_remove_weight PASSED



@amd-lalithnc amd-lalithnc requested a review from bigPYJ1151 as a code owner March 4, 2026 06:46
@mergify mergify bot added ci/build rocm Related to AMD ROCm cpu Related to CPU backends labels Mar 4, 2026
@github-project-automation github-project-automation bot moved this to Todo in AMD Mar 4, 2026
@gemini-code-assist gemini-code-assist bot left a comment:

Code Review

This pull request introduces support for AMD Zen CPUs via zentorch, including new Docker build targets, platform detection logic, and GEMM dispatching through zentorch. The changes are well-structured and include comprehensive tests. My review focuses on a few key areas for improvement: reducing Docker image size by cleaning up build dependencies, eliminating code duplication between setup.py and the runtime platform detection, and correcting the configuration for torch.compile's caching mechanism to prevent potential cache corruption. These changes will improve maintainability and ensure correctness.


mergify bot commented Mar 4, 2026

Hi @amd-lalithnc, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com>
Co-authored-by: Chinmay-Kulkarni-AMD <Chinmay.Kulkarni@amd.com>
Change-Id: I51b2aa53b66937c6c388b50db09072ba83ec44de
@amd-lalithnc amd-lalithnc force-pushed the zentorch-upstreaming branch from 6b0a20a to a23c37c Compare March 4, 2026 06:55
Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com>
Change-Id: I223526ad52e11a94787f1e7770a63924ec2e8bb1
mergify bot commented Mar 4, 2026: pre-commit checks failed again (same instructions as above).

Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com>
Change-Id: I09b760c50c62dbafcc910e46664754b56a8dbd57
@amd-lalithnc amd-lalithnc requested a review from hmellor as a code owner March 4, 2026 07:20
@amd-lalithnc

cc: @tlrmchlsmth @ProExpertProg

@tlrmchlsmth tlrmchlsmth self-assigned this Mar 4, 2026
@tlrmchlsmth tlrmchlsmth left a comment:

I think we can simplify the selection of the zentorch dependency

My suggestion would be to drop the auto-detection from setup.py, removing the _is_amd_zen_cpu() import, the VLLM_ZENTORCH_INSTALL env var, and the conditional requirements.append("zentorch"), and keep only the "zen" extra. This cleans up logic across setup.py and envs.py, removes a compile factor that shouldn't exist, and makes the dependency management explicit and predictable.
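The suggested end state, an explicit extra only with no build-time auto-detection, would look roughly like this in the packaging metadata (an illustrative fragment; vLLM's real setup.py is structured differently):

```python
# Illustrative packaging fragment (hypothetical): keep only the explicit
# extra and drop any conditional requirements.append("zentorch") logic.
extras_require = {
    # Installed only on explicit request: `pip install vllm[zen]`
    "zen": ["zentorch"],
}
```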

Comment on lines +274 to +275
######################### AMD SOURCE BUILD IMAGE #########################
FROM vllm-openai AS vllm-openai-amd-source
Member:

What is this needed for? Let's remove it to keep the upstream build simple

Contributor Author:

Removed the PyPI-based target, as the build from source is a customer requirement.

Member:

How long does it take to build from source?

Let's install zentorch from pypi with this image in order to reduce resources used in vLLM's CI and to make it quicker and easier for folks to build the image. We don't need to build zentorch from source in upstream vLLM, and like I said, better to keep it simpler and easier.


hi @tlrmchlsmth, the source build takes a few minutes (though it depends on the machine configuration the build runs on). Overall the build time is much smaller than vLLM's own from-source build time. Also, going forward, when we support running the vLLM + zentorch CI on supported AMD CPU instances, we would like the Dockerfile to support this target. Should we also try this with the PyPI-based flow, or is a source-based build OK for this?


hi @tlrmchlsmth, we have removed the source build in the latest update; we are only doing a PyPI build in the Docker image as well. We can bring back the source build in future PRs if there are specific requests for it.

Comment on lines +247 to +251
layer.cpu_linear = (
lambda x, weight, bias: torch.ops.zentorch.zentorch_linear_unary(
x, zen_weight, bias, is_weight_prepacked=is_prepacked
)
)
Member:

This captures zen_weight and is_prepacked by reference, but these are reassigned in the enclosing scope. It looks like it does the right thing for now, but it is fragile. Let's capture by value:

Suggested change
layer.cpu_linear = (
lambda x, weight, bias: torch.ops.zentorch.zentorch_linear_unary(
x, zen_weight, bias, is_weight_prepacked=is_prepacked
)
)
layer.cpu_linear = (
lambda x, weight, bias, _w=zen_weight, _p=is_prepacked:
torch.ops.zentorch.zentorch_linear_unary(x, _w, bias, is_weight_prepacked=_p)
)
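The late-binding hazard the suggestion guards against can be demonstrated with a tiny standalone snippet (generic Python, independent of the zentorch op):

```python
# Lambdas capture variables, not values: all three closures see the final i.
funcs = [lambda: i for i in range(3)]
print([f() for f in funcs])  # -> [2, 2, 2]

# A default argument snapshots the value at definition time instead.
funcs = [lambda _i=i: _i for i in range(3)]
print([f() for f in funcs])  # -> [0, 1, 2]
```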

@amukho amukho commented Mar 12, 2026:

hi @tlrmchlsmth we had tried this before submitting the PR, but it fails in the aot_compile flow of PyTorch due to deviation from the op schema. The AOTAutogradCache expects the op schema (x, weight, bias, is_weight_prepacked) to be maintained, whereas with the additional arguments passing the modified weight by value it fails with the following error:

TypeError: DefaultsSource.__init__() takes from 3 to 4 positional arguments but 6 were given

A related change to pass only the weight (and, as a result, the bias and is_weight_prepacked) by value resulted in an FX graph propagation error in Dynamo (most likely due to the schema mismatch again). I am guessing this is also why the VLLM_CPU_SGL_KERNEL path currently passes the packed weight and bias by reference.

Member:

ah, OK then. Thanks for the explanation.

cc @ProExpertProg in case you have any ideas (but we should handle any improvements in a future PR)

Collaborator:

I think getting a more minimal torch repro would be helpful, not sure I understand exactly what's going on

Collaborator:

Can we add a vLLM issue link here, so the torch team can investigate?

Comment on lines +50 to +77
cls._patch_fxgraphcache_pickle()

@classmethod
def _patch_fxgraphcache_pickle(cls):
"""Backport mainline ValueError fix to FxGraphCachePickler.dumps()."""
import pickle

from torch._inductor.codecache import FxGraphCachePickler

original_dumps = FxGraphCachePickler.dumps
if hasattr(original_dumps, "_zen_patched"):
return

def patched_dumps_method(self, obj):
import logging as _logging

from torch._inductor.codecache import BypassFxGraphCache

_logger = _logging.getLogger("torch._inductor.codecache")
try:
self.dump(obj)
return self._stream.getvalue()
except (TypeError, AttributeError, pickle.PicklingError, ValueError) as e:
_logger.warning("Failed to pickle cache key", exc_info=True)
raise BypassFxGraphCache("Failed to pickle cache key") from e
finally:
self._stream.seek(0)
self._stream.truncate(0)
Member:

Patching like this to replace the entire function definition is brittle. IMO it would be better (less prone to future breakage) to wrap the original function like this:

Suggested change
cls._patch_fxgraphcache_pickle()
@classmethod
def _patch_fxgraphcache_pickle(cls):
"""Backport mainline ValueError fix to FxGraphCachePickler.dumps()."""
import pickle
from torch._inductor.codecache import FxGraphCachePickler
original_dumps = FxGraphCachePickler.dumps
if hasattr(original_dumps, "_zen_patched"):
return
def patched_dumps_method(self, obj):
import logging as _logging
from torch._inductor.codecache import BypassFxGraphCache
_logger = _logging.getLogger("torch._inductor.codecache")
try:
self.dump(obj)
return self._stream.getvalue()
except (TypeError, AttributeError, pickle.PicklingError, ValueError) as e:
_logger.warning("Failed to pickle cache key", exc_info=True)
raise BypassFxGraphCache("Failed to pickle cache key") from e
finally:
self._stream.seek(0)
self._stream.truncate(0)
@classmethod
def _patch_fxgraphcache_pickle(cls):
from torch._inductor.codecache import BypassFxGraphCache, FxGraphCachePickler
original_dumps = FxGraphCachePickler.dumps
if hasattr(original_dumps, "_zen_patched"):
return
def patched_dumps(self, obj):
try:
return original_dumps(self, obj)
except ValueError as e:
raise BypassFxGraphCache("Failed to pickle cache key") from e
patched_dumps._zen_patched = True
FxGraphCachePickler.dumps = patched_dumps

Member:

Why do we need to patch inductor anyway?

Contributor Author:

When VLLM_ZENTORCH_WEIGHT_PREPACK=1 (default), zentorch prepacks weights into ZenDNN blocked-layout tensors at load time. These tensors raise ValueError when FxGraphCachePickler.dumps() tries to serialize the graph cache key, because PyTorch 2.10's dumps() doesn't catch ValueError.

PyTorch mainline already fixed this in pytorch/pytorch#176557 (merged 2026-03-04) by adding ValueError to the except clause. Our patch is a thin backport of that fix for 2.10 users -- scoped to 2.10 <= version < 2.11 and will be removed once we drop 2.10 support.

Collaborator:

Can we move this to the env_override.py file with other pytorch patches?

Collaborator:

Seems like a good fix either way, no need to be zen-only

Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com>
Change-Id: I3e294003184afa678e3ece0e97ec0561fc619290
Comment on lines +713 to +717
# (Zen CPU backend) eagerly prepack weights into ZenDNN blocked layout
# at model load time. Eliminates per-inference layout conversion overhead.
"VLLM_ZENTORCH_WEIGHT_PREPACK": lambda: bool(
int(os.getenv("VLLM_ZENTORCH_WEIGHT_PREPACK", "1"))
),
Member:

We're trying to keep the number of environment variables minimal in vLLM.

Why wouldn't someone want to pre-pack the weights? If there's no compelling reason, could we remove the env?


hi @tlrmchlsmth, since this environment variable is enabled by default it should be transparent for most users. Currently it is mostly a debug feature for enabling other kernel variants from the ZenDNN library backend.

mergify bot commented Mar 12, 2026: pre-commit checks failed again (same instructions as above).

Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com>
Change-Id: I943a52eae3dfaefe33130c5869af47d7f89deb54
mergify bot commented Mar 12, 2026: pre-commit checks failed again (same instructions as above).

Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com>
Change-Id: Ic12ec89133419214aa618d27cc855db4cefb8a01
Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com>
Change-Id: I13513045dd88b93a5f68ac72e0bf6507fa1bea85
@amd-lalithnc amd-lalithnc force-pushed the zentorch-upstreaming branch from d7408e7 to 9b77dfc Compare March 13, 2026 14:33


######################### AMD PYPI IMAGE #########################
FROM vllm-openai AS vllm-openai-amd
Member:

Do we like this layer name, or would it be better to be more specific and call it vllm-openai-zen so that it's clearly not related to AMD GPUs or to the amd64 ISA?

@tlrmchlsmth tlrmchlsmth left a comment:

This looks good to me now, thank you!

@tlrmchlsmth tlrmchlsmth added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 14, 2026
@ProExpertProg ProExpertProg left a comment:

Just nits

tlrmchlsmth and others added 3 commits March 15, 2026 21:29
- Use is_torch_equal_or_newer instead of TorchVersion for version check
- Add comment to is_zen_cpu() that is_cpu is also True
- Remove import torch (no longer needed)
- Remove zen_cpu.py pickle exemption (no longer imports pickle)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
The name vllm-openai-amd was ambiguous — it could be mistaken for
AMD GPU (ROCm) support or the amd64 ISA. vllm-openai-zen is more
specific and consistent with the [zen] extra and ZenCpuPlatform.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
@tlrmchlsmth tlrmchlsmth enabled auto-merge (squash) March 15, 2026 21:33
@tlrmchlsmth tlrmchlsmth merged commit 7acaea6 into vllm-project:main Mar 15, 2026
128 of 129 checks passed
@github-project-automation github-project-automation bot moved this from Todo to Done in AMD Mar 15, 2026
Lucaskabela pushed a commit to Lucaskabela/vllm that referenced this pull request Mar 17, 2026
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026
fxdawnn pushed a commit to fxdawnn/vllm that referenced this pull request Mar 19, 2026