Description
Your current environment
The output of python collect_env.py
Collecting environment information...
PyTorch version: 2.6.0a0+git9126110
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Red Hat Enterprise Linux 9.5 (Plow) (ppc64le)
GCC version: (GCC) 13.3.1 20240611 (Red Hat 13.3.1-2)
Clang version: 18.1.8 (Red Hat, Inc. 18.1.8-3.el9)
CMake version: version 3.31.1
Libc version: glibc-2.34
Python version: 3.11.11 | packaged by conda-forge | (main, Dec 5 2024, 14:07:52) [GCC 13.3.0] (64-bit runtime)
Python platform: Linux-5.14.0-503.15.1.el9_5.ppc64le-ppc64le-with-glibc2.34
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: False
CPU:
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 320
On-line CPU(s) list: 0-319
Model name: POWER10 (architected), altivec supported
Model: 2.0 (pvr 0080 0200)
Thread(s) per core: 8
Core(s) per socket: 10
Socket(s): 4
Hypervisor vendor: pHyp
Virtualization type: para
L1d cache: 2.5 MiB (80 instances)
L1i cache: 3.8 MiB (80 instances)
L2 cache: 80 MiB (80 instances)
L3 cache: 320 MiB (80 instances)
NUMA node(s): 4
NUMA node0 CPU(s): 0-79
NUMA node1 CPU(s): 80-159
NUMA node2 CPU(s): 160-239
NUMA node3 CPU(s): 240-319
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Not affected
Vulnerability Spectre v1: Mitigation; __user pointer sanitization, ori31 speculation barrier enabled
Vulnerability Spectre v2: Mitigation; Software count cache flush (hardware accelerated), Software link stack flush
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] optree==0.13.1
[pip3] pyzmq==26.2.0
[pip3] torch==2.6.0a0+git9126110
[pip3] transformers==4.47.0
[conda] nomkl 3.0 0 rocketce
[conda] numpy 1.26.4 pypi_0 pypi
[conda] optree 0.13.1 pypi_0 pypi
[conda] pyzmq 26.2.0 py311he15fa53_3 conda-forge
[conda] torch 2.6.0a0+git9126110 pypi_0 pypi
[conda] transformers 4.47.0 pypi_0 pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.6.4.post2.dev334+g85362f02.d20241216
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
Could not collect
LD_LIBRARY_PATH=/home/akashk/miniconda3/envs/vllm_int8/lib/python3.11/site-packages/cv2/../../lib64:/home/akashk/miniconda3/envs/vllm_int8/lib:
Model Input Dumps
No response
🐛 Describe the bug
I am unable to run any model (Granite or otherwise) with the auto data type (dtype=auto) on this CPU-only ppc64le build.
For example, with the facebook/opt-125m model (which is part of the test cases), running examples/offline_inference.py fails with:
[rank0]: RuntimeError: "reshape_and_cache_cpu_impl" not implemented for 'Half'
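The error message suggests that the CPU cache kernel in this build has no float16 ('Half') path, and that the auto setting resolved to float16 for this checkpoint. A possible workaround, sketched below, is to pass an explicit dtype instead of auto. The helper `pick_cpu_dtype` and the wrapper `build_llm` are hypothetical names used only for illustration; they are not part of vLLM. The `float32` fallback is an assumption that has not been verified on this ppc64le build, and `bfloat16` may be worth trying as well.

```python
# Hypothetical helper (not part of vLLM): choose an explicit dtype for CPU runs
# instead of relying on dtype="auto".

# Dtype names that the traceback above suggests are unsupported on this CPU build.
CPU_UNSUPPORTED = {"float16", "half"}


def pick_cpu_dtype(checkpoint_dtype: str, fallback: str = "float32") -> str:
    """Return the dtype string to request on a CPU-only build.

    checkpoint_dtype: dtype recorded in the model config (e.g. "float16").
    fallback: dtype to use when the checkpoint dtype is unsupported
              (assumption: float32 works on this platform).
    """
    if checkpoint_dtype.lower() in CPU_UNSUPPORTED:
        return fallback
    return checkpoint_dtype


def build_llm(model: str, checkpoint_dtype: str):
    """Construct a vLLM engine with an explicit dtype (requires vLLM installed)."""
    # Imported lazily so pick_cpu_dtype can be used without vLLM present.
    from vllm import LLM

    return LLM(model=model, dtype=pick_cpu_dtype(checkpoint_dtype))


if __name__ == "__main__":
    # Only exercises the helper; does not load a model.
    print(pick_cpu_dtype("float16"))
```

With vLLM installed, `build_llm("facebook/opt-125m", "float16")` would request the fallback dtype rather than float16. Whether that avoids the error on POWER10 still needs to be confirmed.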