
vllm-cpu attention fix #250

Merged
rroshan-rh merged 1 commit into rhoai-3.2 from cpu-atten-s390x
Jan 15, 2026

Conversation

Meghagaur (Contributor) commented Jan 14, 2026

Reintroduce support for head dimensions 80 and 112 in the CPU attention backend. These were removed upstream in vllm-project/vllm#27954, but they are commonly used by Granite models deployed on Z architectures. Because these head sizes are not friendly to the Intel AMX instruction set, the implementation now falls back to vec16 for them.
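The dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the actual vLLM code: the function name and the exact gating criterion (AMX requiring head sizes divisible by 32) are assumptions made for the example.

```python
# Hypothetical sketch of per-head-size kernel selection in a CPU
# attention backend. Assumption: the AMX-tiled kernel needs head sizes
# divisible by 32, so sizes like 80 and 112 (used by Granite models)
# fall back to a generic 16-wide vectorized ("vec16") path.
def select_attention_kernel(head_size: int) -> str:
    if head_size % 32 == 0:
        return "amx"    # AMX-friendly head size
    if head_size % 16 == 0:
        return "vec16"  # fallback path reintroduced by this PR
    raise ValueError(f"unsupported head size: {head_size}")

for hs in (64, 80, 96, 112, 128):
    print(hs, "->", select_attention_kernel(hs))
```

Under this assumed rule, head sizes 64, 96, and 128 stay on the AMX path, while 80 and 112 take the vec16 fallback instead of being rejected outright.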

Test Plan
Build Docker image and test using ibm-granite/granite-3b-code-base-2k model which has head size of 80.

Upstream PR - vllm-project/vllm#31968
Main branch PR - #251

Meghagaur (Contributor, Author):

/build-konflux

Meghagaur (Contributor, Author) commented Jan 15, 2026

Hi @wznoinsk,
Could you please review this PR? The Konflux build has passed.
This is required for vLLM CPU to work correctly on RHOAI 3.2.
Thank you!

@Meghagaur Meghagaur requested a review from wznoinsk January 15, 2026 04:43
@rroshan-rh rroshan-rh merged commit 7552823 into rhoai-3.2 Jan 15, 2026
2 checks passed
Shafi-Hussain pushed a commit to odh-on-pz/vllm-cpu that referenced this pull request Feb 9, 2026
