🐛 Describe the bug
As discussed with @kirklandsign in Issue #8508, I am opening a separate issue here.
I was trying to build ExecuTorch locally on my Raspberry Pi 5. The build worked fine with Clang 14.0.6 on the release/0.4 branch. On the release/0.5 and main branches, however, it now fails with the errors below. The problem appears to be Clang-specific: switching to gcc/g++ builds ExecuTorch without issue.
[ 56%] Building C object backends/xnnpack/third-party/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-1x8c2s4-minmax-neonfp16arith-mlal.c.o
[ 56%] Building CXX object kernels/portable/CMakeFiles/portable_kernels.dir/cpu/op_addmm.cpp.o
/home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:80:29: error: unknown type name 'bfloat16x8_t'; did you mean 'float16x8_t'?
f32_dot_bf16(float32x4_t a, bfloat16x8_t b, bfloat16x8_t c) {
^~~~~~~~~~~~
float16x8_t
/usr/lib/llvm-14/lib/clang/14.0.6/include/arm_neon.h:75:56: note: 'float16x8_t' declared here
typedef __attribute__((neon_vector_type(8))) float16_t float16x8_t;
^
/home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:80:45: error: unknown type name 'bfloat16x8_t'; did you mean 'float16x8_t'?
f32_dot_bf16(float32x4_t a, bfloat16x8_t b, bfloat16x8_t c) {
^~~~~~~~~~~~
float16x8_t
/usr/lib/llvm-14/lib/clang/14.0.6/include/arm_neon.h:75:56: note: 'float16x8_t' declared here
typedef __attribute__((neon_vector_type(8))) float16_t float16x8_t;
^
/home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:79:1: warning: unknown architecture 'armv8.2-a+bf16' in the 'target' attribute string; 'target' attribute ignored [-Wignored-attributes]
ET_TARGET_ARM_BF16_ATTRIBUTE static ET_INLINE float32x4_t
^
/home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:78:25: note: expanded from macro 'ET_TARGET_ARM_BF16_ATTRIBUTE'
__attribute__((target("arch=armv8.2-a+bf16")))
^
/home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:81:10: error: use of undeclared identifier 'vbfdotq_f32'
return vbfdotq_f32(a, b, c);
^
/home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:84:1: warning: unknown architecture 'armv8.2-a+bf16' in the 'target' attribute string; 'target' attribute ignored [-Wignored-attributes]
ET_TARGET_ARM_BF16_ATTRIBUTE static ET_INLINE void
^
/home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:78:25: note: expanded from macro 'ET_TARGET_ARM_BF16_ATTRIBUTE'
__attribute__((target("arch=armv8.2-a+bf16")))
^
/home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:90:9: error: unknown type name 'bfloat16x8_t'; did you mean 'float16x8_t'?
const bfloat16x8_t temp_vec1 = vld1q_bf16(reinterpret_cast<const __bf16*>(
^~~~~~~~~~~~
float16x8_t
/usr/lib/llvm-14/lib/clang/14.0.6/include/arm_neon.h:75:56: note: 'float16x8_t' declared here
typedef __attribute__((neon_vector_type(8))) float16_t float16x8_t;
^
/home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:90:68: error: __bf16 is not supported on this target
const bfloat16x8_t temp_vec1 = vld1q_bf16(reinterpret_cast<const __bf16*>(
^
/home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:90:34: error: use of undeclared identifier 'vld1q_bf16'
const bfloat16x8_t temp_vec1 = vld1q_bf16(reinterpret_cast<const __bf16*>(
^
/home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:92:9: error: unknown type name 'bfloat16x8_t'; did you mean 'float16x8_t'?
const bfloat16x8_t temp_vec2 = vld1q_bf16(reinterpret_cast<const __bf16*>(
^~~~~~~~~~~~
float16x8_t
/usr/lib/llvm-14/lib/clang/14.0.6/include/arm_neon.h:75:56: note: 'float16x8_t' declared here
typedef __attribute__((neon_vector_type(8))) float16_t float16x8_t;
^
/home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:92:68: error: __bf16 is not supported on this target
const bfloat16x8_t temp_vec2 = vld1q_bf16(reinterpret_cast<const __bf16*>(
^
/home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:92:34: error: use of undeclared identifier 'vld1q_bf16'
const bfloat16x8_t temp_vec2 = vld1q_bf16(reinterpret_cast<const __bf16*>(
^
/home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:119:1: warning: unknown architecture 'armv8.2-a+bf16' in the 'target' attribute string; 'target' attribute ignored [-Wignored-attributes]
ET_TARGET_ARM_BF16_ATTRIBUTE static ET_INLINE void
^
/home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:78:25: note: expanded from macro 'ET_TARGET_ARM_BF16_ATTRIBUTE'
__attribute__((target("arch=armv8.2-a+bf16")))
^
/home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:150:3: warning: unknown architecture 'armv8.2-a+bf16' in the 'target' attribute string; 'target' attribute ignored [-Wignored-attributes]
ET_TARGET_ARM_BF16_ATTRIBUTE ET_INLINE void operator()(const Func& f) const {
^
/home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:78:25: note: expanded from macro 'ET_TARGET_ARM_BF16_ATTRIBUTE'
__attribute__((target("arch=armv8.2-a+bf16")))
^
/home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:159:3: warning: unknown architecture 'armv8.2-a+bf16' in the 'target' attribute string; 'target' attribute ignored [-Wignored-attributes]
ET_TARGET_ARM_BF16_ATTRIBUTE ET_INLINE void operator()(const Func& f) const {
^
/home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:78:25: note: expanded from macro 'ET_TARGET_ARM_BF16_ATTRIBUTE'
__attribute__((target("arch=armv8.2-a+bf16")))
^
/home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:167:1: warning: unknown architecture 'armv8.2-a+bf16' in the 'target' attribute string; 'target' attribute ignored [-Wignored-attributes]
ET_TARGET_ARM_BF16_ATTRIBUTE float
^
/home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:78:25: note: expanded from macro 'ET_TARGET_ARM_BF16_ATTRIBUTE'
__attribute__((target("arch=armv8.2-a+bf16")))
^
/home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:176:33: warning: unknown architecture 'armv8.2-a+bf16' in the 'target' attribute string; 'target' attribute ignored [-Wignored-attributes]
ET_INLINE_ATTRIBUTE ET_TARGET_ARM_BF16_ATTRIBUTE {
^
/home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:78:25: note: expanded from macro 'ET_TARGET_ARM_BF16_ATTRIBUTE'
__attribute__((target("arch=armv8.2-a+bf16")))
^
7 warnings and 9 errors generated.
gmake[3]: *** [kernels/optimized/CMakeFiles/cpublas.dir/build.make:121: kernels/optimized/CMakeFiles/cpublas.dir/blas/BlasKernel.cpp.o] Error 1
gmake[2]: *** [CMakeFiles/Makefile2:1238: kernels/optimized/CMakeFiles/cpublas.dir/all] Error 2
gmake[2]: *** Waiting for unfinished jobs....
[ 56%] Building CXX object kernels/portable/CMakeFiles/portable_kernels.dir/cpu/op_alias_copy.cpp.
Versions
Collecting environment information...
PyTorch version: 2.7.0.dev20250131+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Debian GNU/Linux 12 (bookworm) (aarch64)
GCC version: (Debian 12.2.0-14) 12.2.0
Clang version: 14.0.6
CMake version: version 3.31.6
Libc version: glibc-2.36
Python version: 3.10.0 (default, Mar 3 2022, 09:51:40) [GCC 10.2.0] (64-bit runtime)
Python platform: Linux-6.6.74+rpt-rpi-v8-aarch64-with-glibc2.36
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: ARM
Model name: Cortex-A76
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 4
Socket(s): -
Cluster(s): 1
Stepping: r4p1
CPU(s) scaling MHz: 100%
CPU max MHz: 2400,0000
CPU min MHz: 1500,0000
BogoMIPS: 108,00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
L1d cache: 256 KiB (4 instances)
L1i cache: 256 KiB (4 instances)
L2 cache: 2 MiB (4 instances)
L3 cache: 2 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-3
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] executorch==0.6.0a0+542480c
[pip3] numpy==2.2.3
[pip3] torch==2.7.0.dev20250131+cpu
[pip3] torchao==0.10.0+git7d879462
[pip3] torchaudio==2.6.0.dev20250131
[pip3] torchgen==0.0.1
[pip3] torchsr==1.0.4
[pip3] torchvision==0.22.0.dev20250131
[conda] executorch 0.6.0a0+542480c pypi_0 pypi
[conda] numpy 2.2.3 pypi_0 pypi
[conda] torch 2.7.0.dev20250131+cpu pypi_0 pypi
[conda] torchao 0.10.0+git7d879462 pypi_0 pypi
[conda] torchaudio 2.6.0.dev20250131 pypi_0 pypi
[conda] torchgen 0.0.1 pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchvision 0.22.0.dev20250131 pypi_0 pypi