Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ProcessRGB.cpp only supports AArch64 NEON intrinsics, fails on ARMv7 on Android #25

Closed
akien-mga opened this issue May 10, 2022 · 1 comment · Fixed by #26
Closed

ProcessRGB.cpp only supports AArch64 NEON intrinsics, fails on ARMv7 on Android #25

akien-mga opened this issue May 10, 2022 · 1 comment · Fixed by #26

Comments

@akien-mga
Copy link
Contributor

We use etcpak in the Godot editor, and the Godot editor can be compiled for ARMv7 with NEON for Linux, Windows and Android.

For the Android build at least, using NDK 21.4, we're running into a build issue in etcpak due to the use of AArch64 NEON intrinsics in ProcessRGB.cpp, which are not guarded with #if defined(__aarch64__):

/root/sdk/ndk/21.4.7075529/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++ -o thirdparty/etcpak/ProcessRGB.android.tools.armv7.o -c -std=gnu++17 -frtti -w -O0 -g -fno-limit-debug-info -fpic -ffunction-sections -funwind-tables -fstack-protector-strong -fvisibility=hidden -fno-strict-aliasing -march=armv7-a -mfloat-abi=softfp -mfpu=neon -target armv7-none-linux-androideabi -gcc-toolchain /root/sdk/ndk/21.4.7075529/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64 -Wall -Wno-ordered-compare-function-pointers -w -UNDEBUG -isystem /root/sdk/ndk/21.4.7075529/sources/cxx-stl/llvm-libc++/include -isystem /root/sdk/ndk/21.4.7075529/sources/cxx-stl/llvm-libc++abi/include --sysroot=/root/sdk/ndk/21.4.7075529/sysroot -isystem /root/sdk/ndk/21.4.7075529/sysroot/usr/include/arm-linux-androideabi -isystem /root/sdk/ndk/21.4.7075529/sources/android/support/include -DDEBUG_ENABLED -DDEV_ENABLED -DNO_EDITOR_SPLASH -D_DEBUG -D__ANDROID_API__=24 -DNO_STATVFS -DGLES_ENABLED -D_FILE_OFFSET_BITS=64 -D__ARM_ARCH_7__ -D__ARM_ARCH_7A__ -D__ARM_NEON__ -DANDROID_ENABLED -DUNIX_ENABLED -DNO_FCNTL -DVULKAN_ENABLED -DTOOLS_ENABLED -DMINIZIP_ENABLED -DZSTD_STATIC_LINKING_ONLY -DUSE_VOLK -DVK_USE_PLATFORM_ANDROID_KHR -Ithirdparty/etcpak -Ithirdparty/libpng -Ithirdparty/volk -Ithirdparty/vulkan -Ithirdparty/vulkan/include -Ithirdparty/zstd -Ithirdparty/zlib -Iplatform/android -I. thirdparty/etcpak/ProcessRGB.cpp
thirdparty/etcpak/ProcessRGB.cpp:2595:25: error: use of undeclared identifier 'vmaxvq_u8'
    const uint8_t max = vmaxvq_u8( buffer );
                        ^
thirdparty/etcpak/ProcessRGB.cpp:2615:15: error: use of undeclared identifier 'vpaddq_u16'
    pos_lsb = vpaddq_u16( pos_lsb, pos_lsb );
              ^
thirdparty/etcpak/ProcessRGB.cpp:2616:15: error: use of undeclared identifier 'vpaddq_u16'
    pos_lsb = vpaddq_u16( pos_lsb, pos_lsb );
              ^
thirdparty/etcpak/ProcessRGB.cpp:2617:15: error: use of undeclared identifier 'vpaddq_u16'
    pos_lsb = vpaddq_u16( pos_lsb, pos_lsb );
              ^
thirdparty/etcpak/ProcessRGB.cpp:2619:15: error: use of undeclared identifier 'vpaddq_u16'
    pos_msb = vpaddq_u16( pos_msb, pos_msb );
              ^
thirdparty/etcpak/ProcessRGB.cpp:2620:15: error: use of undeclared identifier 'vpaddq_u16'
    pos_msb = vpaddq_u16( pos_msb, pos_msb );
              ^
thirdparty/etcpak/ProcessRGB.cpp:2621:15: error: use of undeclared identifier 'vpaddq_u16'
    pos_msb = vpaddq_u16( pos_msb, pos_msb );
              ^
thirdparty/etcpak/ProcessRGB.cpp:2642:25: error: use of undeclared identifier 'vminvq_u8'; did you mean 'vmvnq_u8'?
    const uint8_t min = vminvq_u8( buffer );
                        ^~~~~~~~~
                        vmvnq_u8
/root/sdk/ndk/21.4.7075529/toolchains/llvm/prebuilt/linux-x86_64/lib64/clang/9.0.9/include/arm_neon.h:16368:17: note: 'vmvnq_u8' declared here
__ai uint8x16_t vmvnq_u8(uint8x16_t __p0) {
                ^
thirdparty/etcpak/ProcessRGB.cpp:2642:19: error: cannot initialize a variable of type 'const uint8_t' (aka 'const unsigned char') with an rvalue of type 'uint8x16_t' (vector of 16 'uint8_t' values)
    const uint8_t min = vminvq_u8( buffer );
                  ^     ~~~~~~~~~~~~~~~~~~~
thirdparty/etcpak/ProcessRGB.cpp:2662:15: error: use of undeclared identifier 'vpaddq_u16'
    pos_lsb = vpaddq_u16( pos_lsb, pos_lsb );
              ^
thirdparty/etcpak/ProcessRGB.cpp:2663:15: error: use of undeclared identifier 'vpaddq_u16'
    pos_lsb = vpaddq_u16( pos_lsb, pos_lsb );
              ^
thirdparty/etcpak/ProcessRGB.cpp:2664:15: error: use of undeclared identifier 'vpaddq_u16'
    pos_lsb = vpaddq_u16( pos_lsb, pos_lsb );
              ^
thirdparty/etcpak/ProcessRGB.cpp:2666:15: error: use of undeclared identifier 'vpaddq_u16'
    pos_msb = vpaddq_u16( pos_msb, pos_msb );
              ^
thirdparty/etcpak/ProcessRGB.cpp:2667:15: error: use of undeclared identifier 'vpaddq_u16'
    pos_msb = vpaddq_u16( pos_msb, pos_msb );
              ^
thirdparty/etcpak/ProcessRGB.cpp:2668:15: error: use of undeclared identifier 'vpaddq_u16'
    pos_msb = vpaddq_u16( pos_msb, pos_msb );
              ^
thirdparty/etcpak/ProcessRGB.cpp:2856:43: error: use of undeclared identifier 'vzip1q_u64'
    uint8x16_t rr = vreinterpretq_u8_u64( vzip1q_u64( vreinterpretq_u64_u8( rgb01.val[0] ), vreinterpretq_u64_u8( rgb23.val[0] ) ) );
                                          ^
thirdparty/etcpak/ProcessRGB.cpp:2857:43: error: use of undeclared identifier 'vzip2q_u64'
    uint8x16_t gg = vreinterpretq_u8_u64( vzip2q_u64( vreinterpretq_u64_u8( rgb01.val[0] ), vreinterpretq_u64_u8( rgb23.val[0] ) ) );
                                          ^
thirdparty/etcpak/ProcessRGB.cpp:2858:43: error: use of undeclared identifier 'vzip1q_u64'
    uint8x16_t bb = vreinterpretq_u8_u64( vzip1q_u64( vreinterpretq_u64_u8( rgb01.val[1] ), vreinterpretq_u64_u8( rgb23.val[1] ) ) );
                                          ^
18 errors generated.

You have such guards in ProcessDxtc.cpp with versions for both AArch64 and ARMv7, but not in ProcessRGB.cpp:

etcpak/ProcessDxtc.cpp

Lines 277 to 281 in 14dfc2b

#elif defined __ARM_NEON
# ifdef __aarch64__
uint8x16x4_t px = vld4q_u8( src );
uint8x16_t lr = px.val[0];

etcpak/ProcessRGB.cpp

Lines 2592 to 2596 in 14dfc2b

#elif defined __ARM_NEON
static inline int16_t hMax( uint8x16_t buffer, uint8_t& idx )
{
const uint8_t max = vmaxvq_u8( buffer );
const uint16x8_t vmax = vdupq_n_u16( max );

Would it be possible to add checks to ProcessRGB.cpp so that it only uses AArch64 specific intrinsics when __aarch64__ is defined?

AFAICT I'm fine if it falls back to a slow path on ARMv7+NEON, I don't expect it to be a very relevant use case for using etcpak in production, but it would be good if it could compile nevertheless (otherwise that means we can't ship the Godot editor on ARMv7 platforms, unless we disable VRAM compression which is pretty important for our use case).

BTW, the fallback path doesn't seem to be functional, as this hack:

diff --git a/thirdparty/etcpak/ProcessRGB.cpp b/thirdparty/etcpak/ProcessRGB.cpp
index d60164bcc8..b1c973c16f 100644
--- a/thirdparty/etcpak/ProcessRGB.cpp
+++ b/thirdparty/etcpak/ProcessRGB.cpp
@@ -1,3 +1,8 @@
+#if defined(__ARM_NEON) && !defined(__aarch64__)
+// All code below assumes __ARM_NEON == __aarch64__.
+#  undef __ARM_NEON
+#endif
+
 #include <array>
 #include <string.h>
 #include <limits>

Leads to these errors:

[ 68%] Compiling thirdparty/etcpak/ProcessRGB.cpp ...
thirdparty/etcpak/ProcessRGB.cpp:2287:14: error: cannot initialize a variable of type 'uint8_t *' (aka 'unsigned char *') with an rvalue of type 'uint8_t (*)[16]'
    uint8_t* luma = &l.val;
             ^      ~~~~~~
thirdparty/etcpak/ProcessRGB.cpp:2824:28: error: unknown type name 'Channels'
static etcpak_force_inline Channels GetChannels( const uint8_t* src )
                           ^
thirdparty/etcpak/ProcessRGB.cpp:2826:5: error: unknown type name 'Channels'
    Channels ch;
    ^
thirdparty/etcpak/ProcessRGB.cpp:3073:5: error: unknown type name 'Channels'
    Channels ch = GetChannels( src );
    ^
@wolfpld
Copy link
Owner

wolfpld commented May 10, 2022

The issue you see should be limited to changes made in commits f11aaa3 and/or ae0e7eb. Code paths should be properly guarded in the remaining cases (which are the majority both code- and performance-wise). You may check if this is indeed the case and just fix these few #ifdefs, falling back to scalar code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants