[CPU] Support head sizes 80 and 112 with vec16 fallback#251

Merged
moulalis merged 1 commit into main from cpu-atten-main on Jan 15, 2026

Conversation

@Meghagaur
Contributor

Purpose
Reintroduce support for head dimensions 80 and 112 in the CPU attention backend. These were removed in vllm-project/vllm#27954, but they are commonly used by Granite models deployed on Z architectures. Since these head sizes are not friendly to the Intel AMX instruction set, the implementation now falls back to the vec16 path for them.
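A minimal sketch of the dispatch idea described above, assuming the backend picks a vector width per head size. The names (`VecWidth`, `select_vec_width`) are illustrative, not the actual vLLM symbols:

```cpp
// Hypothetical dispatch sketch: head sizes that are multiples of 32 can take
// the wider AMX-friendly path, while sizes like 80 and 112 (multiples of 16
// but not of 32) fall back to a 16-lane vector path.
enum class VecWidth { kVec32, kVec16, kUnsupported };

VecWidth select_vec_width(int head_size) {
  if (head_size % 32 == 0) return VecWidth::kVec32;  // AMX-friendly sizes
  if (head_size % 16 == 0) return VecWidth::kVec16;  // e.g. 80, 112
  return VecWidth::kUnsupported;                     // reject everything else
}
```

Under this sketch, 80 and 112 map to the vec16 fallback while 64 and 128 keep the wide path; the actual kernel selection in the backend may differ.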

Test Plan
Build the Docker image and test using the ibm-granite/granite-3b-code-base-2k model, which has a head size of 80.
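For reference, the head size follows from the model config as hidden size divided by attention head count. The concrete values below are assumptions about the granite-3b-code-base-2k config, not taken from this PR:

```python
# Assumed config values for ibm-granite/granite-3b-code-base-2k (hypothetical,
# check the model's config.json): head_size = hidden_size // num_heads.
hidden_size = 2560
num_attention_heads = 32
head_size = hidden_size // num_attention_heads
print(head_size)  # 80, not a multiple of 32, hence the vec16 fallback
```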

upstream PR - vllm-project/vllm#31968

@Meghagaur Meghagaur changed the title cpu-atten-fix [CPU] Support head sizes 80 and 112 with vec16 fallback Jan 14, 2026
@Meghagaur
Contributor Author

/build-konflux

@Meghagaur
Contributor Author

Meghagaur commented Jan 15, 2026

Hi @wznoinsk
Could you please review this PR? The Konflux build has passed.
This is required for vLLM CPU to work correctly on RHOAI 3.2.
Thank you

@Meghagaur Meghagaur requested a review from wznoinsk January 15, 2026 04:43
@Meghagaur Meghagaur mentioned this pull request Jan 15, 2026
@moulalis moulalis merged commit cf334b6 into main Jan 15, 2026
2 checks passed