Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect Neon capability at runtime #63

Merged
merged 1 commit into from
Oct 25, 2022

Conversation

jmarshall
Copy link
Member

As discussed on debian-med (thread starts in September), while all ARMv8-A CPUs provide Neon so it is (currently) safe to assume that 64-bit ARM aarch64 programs have Neon available, there exist 32-bit ARM (eg Linux) platforms that do not provide Neon. As well as the obscure platform in the linked Firefox bug report, some Raspberry Pi models do not provide Neon but do run Linux.

So at least when compiled for 32-bit ARM, rans_enc_func() and rans_dec_func() need to check for Neon at runtime, as they do for SSE/AVX/etc on x86 using cpuid etc. And they might as well on aarch64 as well — I think I read somewhere that ARMv9-A might make Neon optional again in future.

On ARM, detecting processor capabilities is a privileged operation. So unlike x86's cpuid() intrinsics, we must query support via OS-specific API functions provided by the operating system. This PR implements this check for various operating systems using the techniques that web searches and stack overflow suggest:

  • Linux: via getauxval()
  • FreeBSD: via the similar elf_aux_info()
  • macOS: all Apple Silicon machines support Neon (so far)
  • Windows: via IsProcessorFeaturePresent()

I've tested this on 64-bit ARM Linux and FreeBSD. Unfortunately I don't have any 32-bit installations handy, or any ARM Windows machine.

It might be worth probing for <sys/auxv.h>, getauxval(), and elf_aux_info() in configure.ac and conditionalising these further.

On ARM, detecting processor capabilities is a privileged operation.
So unlike x86's cpuid() intrinsics, we must query support via OS-specific
API functions. Implemented for various operating systems:

* Linux: via getauxval()
* FreeBSD: via the similar elf_aux_info()
* macOS: all Apple Silicon machines support Neon (so far)
* Windows: via IsProcessorFeaturePresent()
@jkbonfield
Copy link
Collaborator

Thanks. That's a long ramble in the debian bug reports there! There are some useful things in there though. Eg if we're building htslib with an external htscodecs library, then we should probably add a configure check to explicitly state the minimum compatible version. This ought to be documented in INSTALL too.

Anyway, regarding this PR, many thanks! I can't test Arm-windows either, but frankly it probably only exists as a platform for tablets and defunct smart phones, so "best efforts" is more than enough for the target audience until we discover if there is one!

I think configure checks are a good idea too. I see via https://lwn.net/Articles/519085/ (a good read) and https://www.openwall.com/lists/musl/2020/02/16/3 that this function didn't always exist, and it's a part of the C library which arrived at differing times in glibcand musl, so there's a chance for additional C libraries or older versions of those that would break. I assume similarly on freebsd. You've got to go back to 2014 though, so maybe it's not a big issue any more.

@jkbonfield
Copy link
Collaborator

Regarding symbol visibility, I was thinking on this earlier.

Marking everything internal as hidden is obviously possible, but it also comes with maintenance burdens. Specifically it can make testing a royal PITA. Sometimes we want to test C code via external CI testing programs that directly manipulate internal functions and structures. Eg see https://github.com/samtools/htslib/blob/develop/test/test-bcf_set_variant_type.c and the use of #include <../vcf.c>. That's a pretty foul hack that I'd rather not have to propagate further. We could double up the building with hidden and non-hidden, but that's another horrid hoop to jump just to satisfy testing and CI stuff.

Having said that, all of the above was written when I mistakenly believed rans_set_cpu was an internal function purely for the purposes of testing compatibility of different SIMD implementations on the one system. That is indeed why I wrote it, but apparently I chose to put it in the public header file anyway (can't really harm I guess). So it's a moot point here.

However, perhaps (and now thinking more htslib as it's irrelevant to htscodecs) there is room for some middle ground that avoids ugly including of C code. Would it help Debian et al. if we had a strict naming scheme on internal functions that were exported. Eg test_* or internal_*. They'd still be flagged by the abi-checker, but it's an obvious message to the maintainer and is suitable for automated filtering of results. I'm not sure if Debian utilise ctags' ignore ability to be able to maintain a list of symbols to filter, and if so whether that can handle wildcards (it didn't look it at a cursory glance). For now though we have something that works (in htslib, and apparently the hacky C-include strategy isn't even needed here), but it's food for thought and I'm happy to be advised on easier strategies of handling this.

I wish C just had an official keyword meaning library-level-static, so we didn't need all the compiler specific visibility hacks. We can't use the old hacky alternative of #including other C files to get them in a single file (hence "static" works) because we need differing compiler options per file.

@jmarshall
Copy link
Member Author

It does occur to me that if configure.ac checked for those functions, then have_neon() could be reworked to be conditionalised on function availability rather than operating system #defines:

#if defined HAVE_GETAUXVAL
…
#elif defined HAVE_ELF_AUX_INFO
…
#elif defined __APPLE__
…

This would have the advantage of automatically picking up the other BSDs if they also provide elf_aux_info().

It would have disadvantages for HTSlib's bundled build though: it would need to also probe for these functions (no biggie I guess) and have some answer for non-configure-based ARM builds (either “no SIMD for you!” or additions to hts_probe_cc.sh and/or htslib/Makefile's config.h rule perhaps).

Regarding symbol visibility, I was thinking on this earlier. // Marking everything internal as hidden is obviously possible, but it also comes with maintenance burdens. Specifically it can make testing a royal PITA.

Symbol visibility only has an effect when linking against a shared libhtscodecs. Could these test harnesses be arranged to always link against a static libhtscodecs? That certainly makes life easier when you have to look at them in a debugger, so might not be too much of an imposition…

C doesn't even officially know what a library is… 😄

@jmarshall
Copy link
Member Author

Oh, and I only mentioned rans_set_cpu() because I saw it was consulted in the code. I didn't realise it might be in the public header, and don't have an opinion either way about whether it ought to be. And, me, I wouldn't care if it became internal-only.

@jkbonfield jkbonfield merged commit 0aa64aa into samtools:master Oct 25, 2022
@jkbonfield
Copy link
Collaborator

I'll leave the decision to whether we autoconfify the checking to a later date, as I just haven't got time to do it at the moment and it's better to fix this now than leave it handing. Thanks John.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants