Error: operator reshape_and_cache_cpu_impl not implemented for half when running examples/offline_inference.py on POWER10

### Your current environment

The output of `python collect_env.py`

```text
Collecting environment information...
PyTorch version: 2.6.0a0+git9126110
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Red Hat Enterprise Linux 9.5 (Plow) (ppc64le)
GCC version: (GCC) 13.3.1 20240611 (Red Hat 13.3.1-2)
Clang version: 18.1.8 (Red Hat, Inc. 18.1.8-3.el9)
CMake version: version 3.31.1
Libc version: glibc-2.34
Python version: 3.11.11 | packaged by conda-forge | (main, Dec 5 2024, 14:07:52) [GCC 13.3.0] (64-bit runtime)
Python platform: Linux-5.14.0-503.15.1.el9_5.ppc64le-ppc64le-with-glibc2.34
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: False
CPU:
Architecture:                         ppc64le
Byte Order:                           Little Endian
CPU(s):                               320
On-line CPU(s) list:                  0-319
Model name:                           POWER10 (architected), altivec supported
Model:                                2.0 (pvr 0080 0200)
Thread(s) per core:                   8
Core(s) per socket:                   10
Socket(s):                            4
Hypervisor vendor:                    pHyp
Virtualization type:                  para
L1d cache:                            2.5 MiB (80 instances)
L1i cache:                            3.8 MiB (80 instances)
L2 cache:                             80 MiB (80 instances)
L3 cache:                             320 MiB (80 instances)
NUMA node(s):                         4
NUMA node0 CPU(s):                    0-79
NUMA node1 CPU(s):                    80-159
NUMA node2 CPU(s):                    160-239
NUMA node3 CPU(s):                    240-319
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Not affected
Vulnerability Spec store bypass:      Not affected
Vulnerability Spectre v1:             Mitigation; __user pointer sanitization, ori31 speculation barrier enabled
Vulnerability Spectre v2:             Mitigation; Software count cache flush (hardware accelerated), Software link stack flush
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] optree==0.13.1
[pip3] pyzmq==26.2.0
[pip3] torch==2.6.0a0+git9126110
[pip3] transformers==4.47.0
[conda] nomkl           3.0              0  rocketce
[conda] numpy           1.26.4          pypi_0  pypi
[conda] optree          0.13.1          pypi_0  pypi
[conda] pyzmq           26.2.0     py311he15fa53_3  conda-forge
[conda] torch           2.6.0a0+git9126110     pypi_0  pypi
[conda] transformers       4.47.0          pypi_0  pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.6.4.post2.dev334+g85362f02.d20241216
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
Could not collect
LD_LIBRARY_PATH=/home/akashk/miniconda3/envs/vllm_int8/lib/python3.11/site-packages/cv2/../../lib64:/home/akashk/miniconda3/envs/vllm_int8/lib:
```

### Model Input Dumps

_No response_

### 🐛 Describe the bug

I am unable to run any model (Granite or otherwise) with the `auto` data type.

For example, with the facebook/opt-125 model (which is used in the test cases), running `examples/offline_inference.py` fails with:

`[rank0]: RuntimeError: "reshape_and_cache_cpu_impl" not implemented for 'Half'`
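A possible workaround is to avoid `auto` and pass an explicit dtype that the CPU kernels on this build handle. The sketch below shows that decision in plain Python. It is hypothetical and not vLLM's actual dtype-resolution logic: the helper name `resolve_cpu_dtype`, the alias table, and the assumption that this POWER10 build only compiles the CPU cache kernels for `float32` and `bfloat16` (so a checkpoint declaring `float16` ends up as `'Half'`) are all assumptions for illustration.

```python
"""Hypothetical sketch: pick an explicit dtype for a CPU-only run.

Assumption (not vLLM's real logic): on this POWER10 build the CPU
reshape_and_cache kernel is compiled only for float32 and bfloat16, so a
float16 ("Half") checkpoint loaded with dtype="auto" hits the error.
"""

# Assumed set of dtypes the CPU kernels on this build support.
CPU_SUPPORTED_DTYPES = ("float32", "bfloat16")

# Common dtype spellings mapped to one canonical name.
_ALIASES = {
    "half": "float16",
    "fp16": "float16",
    "float16": "float16",
    "bf16": "bfloat16",
    "bfloat16": "bfloat16",
    "float": "float32",
    "fp32": "float32",
    "float32": "float32",
}


def resolve_cpu_dtype(requested: str, checkpoint_dtype: str,
                      fallback: str = "float32") -> str:
    """Return a dtype name the (assumed) CPU kernels can handle.

    requested: the user's dtype argument, e.g. "auto" or "float16".
    checkpoint_dtype: the torch_dtype declared in the model's config.json.
    fallback: dtype used when the resolved dtype has no CPU kernel.
    """
    # "auto" defers to whatever the checkpoint declares.
    name = checkpoint_dtype if requested == "auto" else requested
    canonical = _ALIASES.get(name.lower())
    if canonical is None:
        raise ValueError(f"unknown dtype: {name!r}")
    if canonical in CPU_SUPPORTED_DTYPES:
        return canonical
    # The resolved dtype (e.g. float16) has no CPU kernel: use the fallback.
    if fallback not in CPU_SUPPORTED_DTYPES:
        raise ValueError(f"fallback {fallback!r} is not CPU-supported")
    return fallback


if __name__ == "__main__":
    # "auto" on a float16 checkpoint is the case that raises the 'Half' error.
    print(resolve_cpu_dtype("auto", "float16"))                        # float32
    print(resolve_cpu_dtype("auto", "bfloat16"))                       # bfloat16
    print(resolve_cpu_dtype("half", "float32", fallback="bfloat16"))  # bfloat16
```

The resulting name would then be passed explicitly, for example `LLM(model=..., dtype="float32")` in Python or `--dtype float32` on the command line. `float32` is used as the default fallback here because it is the most conservative choice; whether `bfloat16` actually works on this POWER10 build is itself an assumption.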

### Before submitting a new issue...

- [X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

Error: operator reshape_and_cache_cpu_impl not implemented for half when running examples/offline_inference.py on POWER10 #11327