-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detect Neon capability at runtime #63
Conversation
On ARM, detecting processor capabilities is a privileged operation. So unlike x86's cpuid() intrinsics, we must query support via OS-specific API functions. Implemented for various operating systems: * Linux: via getauxval() * FreeBSD: via the similar elf_aux_info() * macOS: all Apple Silicon machines support Neon (so far) * Windows: via IsProcessorFeaturePresent()
Thanks. That's a long ramble in the debian bug reports there! There are some useful things in there though. Eg if we're building htslib with an external htscodecs library, then we should probably add a configure check to explicitly state the minimum compatible version. This ought to be documented in INSTALL too. Anyway, regarding this PR, many thanks! I can't test Arm-windows either, but frankly it probably only exists as a platform for tablets and defunct smart phones, so "best efforts" is more than enough for the target audience until we discover if there is one! I think configure checks are a good idea too. I see via https://lwn.net/Articles/519085/ (a good read) and https://www.openwall.com/lists/musl/2020/02/16/3 that this function didn't always exist, and it's a part of the C library which arrived at differing times in glibcand musl, so there's a chance for additional C libraries or older versions of those that would break. I assume similarly on freebsd. You've got to go back to 2014 though, so maybe it's not a big issue any more. |
Regarding symbol visibility, I was thinking on this earlier. Marking everything internal as hidden is obviously possible, but it also comes with maintenance burdens. Specifically it can make testing a royal PITA. Sometimes we want to test C code via external CI testing programs that directly manipulate internal functions and structures. Eg see https://github.com/samtools/htslib/blob/develop/test/test-bcf_set_variant_type.c and the use of Having said that, all of the above was written when I mistakenly believed However, perhaps (and now thinking more htslib as it's irrelevant to htscodecs) there is room for some middle ground that avoids ugly including of C code. Would it help Debian et al. if we had a strict naming scheme on internal functions that were exported. Eg I wish C just had an official keyword meaning library-level-static, so we didn't need all the compiler specific visibility hacks. We can't use the old hacky alternative of #including other C files to get them in a single file (hence "static" works) because we need differing compiler options per file. |
It does occur to me that if configure.ac checked for those functions, then
This would have the advantage of automatically picking up the other BSDs if they also provide It would have disadvantages for HTSlib's bundled build though: it would need to also probe for these functions (no biggie I guess) and have some answer for non-configure-based ARM builds (either “no SIMD for you!” or additions to hts_probe_cc.sh and/or htslib/Makefile's
Symbol visibility only has an effect when linking against a shared libhtscodecs. Could these test harnesses be arranged to always link against a static libhtscodecs? That certainly makes life easier when you have to look at them in a debugger, so might not be too much of an imposition… C doesn't even officially know what a library is… 😄 |
Oh, and I only mentioned |
I'll leave the decision to whether we autoconfify the checking to a later date, as I just haven't got time to do it at the moment and it's better to fix this now than leave it handing. Thanks John. |
As discussed on debian-med (thread starts in September), while all ARMv8-A CPUs provide Neon so it is (currently) safe to assume that 64-bit ARM aarch64 programs have Neon available, there exist 32-bit ARM (eg Linux) platforms that do not provide Neon. As well as the obscure platform in the linked Firefox bug report, some Raspberry Pi models do not provide Neon but do run Linux.
So at least when compiled for 32-bit ARM,
rans_enc_func()
andrans_dec_func()
need to check for Neon at runtime, as they do for SSE/AVX/etc on x86 usingcpuid
etc. And they might as well on aarch64 as well — I think I read somewhere that ARMv9-A might make Neon optional again in future.On ARM, detecting processor capabilities is a privileged operation. So unlike x86's
cpuid()
intrinsics, we must query support via OS-specific API functions provided by the operating system. This PR implements this check for various operating systems using the techniques that web searches and stack overflow suggest:getauxval()
elf_aux_info()
IsProcessorFeaturePresent()
I've tested this on 64-bit ARM Linux and FreeBSD. Unfortunately I don't have any 32-bit installations handy, or any ARM Windows machine.
It might be worth probing for
<sys/auxv.h>
,getauxval()
, andelf_aux_info()
in configure.ac and conditionalising these further.