superenv: handle formulae with runtime CPU detection #11608
carlocab merged 1 commit into Homebrew:master from carlocab:runtime-arch
|
Review period will end on 2021-06-29 at 00:00:00 UTC. |
MikeMcQuaid
left a comment
I like this approach but think it's too broad. If you have runtime CPU detection: we still shouldn't allow passing -march but -mtune/-mcpu/ should all be fine. I'd also rather see the existing O methods used over making this also allow that.
More specifically on the bug:
it seems Open MPI also used to pass -march=skylake-avx512 to the compiler as a way of detecting CPU features, and it looks like Arrow is doing the same thing here
Is this something that we could then scope just to e.g. configure and avoid using during e.g. make?
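The narrower policy suggested in this comment (never pass `-march` through, but leave `-mtune` and `-mcpu` alone) can be sketched roughly as follows. This is a minimal illustration, not Homebrew's superenv code: `filter_arch_flags` is a hypothetical helper name, and the argument list is made up.

```ruby
# Hypothetical sketch; not Homebrew's actual compiler shim.
# Drops every -march=... argument and passes everything else
# (including -mtune=... and -mcpu=...) through unchanged.
def filter_arch_flags(args)
  args.reject { |arg| arg.start_with?("-march=") }
end

p filter_arch_flags(%w[-O3 -march=skylake-avx512 -mtune=native -c foo.c])
```

A real shim would have more cases to handle than this sketch does; it only shows the shape of the rule being discussed.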
Allowing It should be safe to allow a build to pass
The difficulty with I just think that if upstream have invested a lot of work into optimising their build, it seems wiser to rely on the work they've done rather than try to duplicate it (poorly) ourselves. However, if you'd rather not defer to upstream on this issue, I'm fine with leaving this out of the PR.
Scoping flag refurbishment to When we filter these flags, we start violating the build system's assumptions about the code it generates and that leads to broken software. This is especially bad when we do one thing at |
|
Review period ended. |
It should be safe but I'm not completely convinced it's worth the risk (in the abstract). These sorts of bugs are a bit of a nightmare to reproduce.
Yes, I'd rather not defer to upstream on build flags when it doesn't actively break things (which it doesn't seem to with
Yeah, I've always been a bit dubious about this approach.
I agree. Sorry for you needing to talk me through on this so much: can you explain what we're filtering right now, how that breaks things (and on what formula) and what filtering this would avoid/fix (and on what formula)? Thanks @carlocab ❤️ |
There are two kinds of bugs here: one is when the upstream get their runtime detection code wrong, and another which we introduce via flag filtering. I'd argue the former is easier to diagnose and fix because it's something upstream will be able to reproduce in their own builds. If we really are keen on filtering flags, we probably have to be much more aggressive about it, but this might take expertise we don't have and can be a bit brittle (see below). I also think that if we want to restrict the code that the compiler generates, we should probably be doing so through an API they expose (e.g.
Sure, will drop this.
The primary thing seems to be the filtering of So the miscompilation seems to occur when we switch out the
This actively breaks In particular, the In light of this, I think our options here are:
I find the latter a bit fragile. We'd need to know about all the compiler flags we want to avoid--these are woefully documented and change over time. This might also be software-dependent because of things like custom macros (e.g. |
Well, there's the third potential bug that we're opening with this: upstream set This is something we used to set a lot of and this filtering pretty much solved the problem.
Feels like this is what we should be filtering out in this specific case. We don't want to use these instructions. We shouldn't be setting |
That's not a bug -- that is intentional. Build systems that do runtime detection will set We could try to disable AVX512 for specific formulae, but, even if we do, I think we would still need the changes here because everything I've said above about builds that do runtime detection is still true. (e.g. they could try to generate code that runs on processors older than Nehalem, but then our flag filtering interferes with that and could break things) I'm also not convinced that doing this is simpler than what I propose here. I understand the concern about breaking things needlessly, but I am confident that this will not break things and any risk that we do face is worth it: it fixes a formula with currently broken bottles, allows us to ship bottles for other formulae that work better for users with newer machines than the oldest we support, and removes the risk of miscompiled bottles from inconsistent optimisation flags. Just to emphasise how low the risk of this is: we've been shipping If it helps, I'm happy to slowly roll this out to formulae and monitor them closely to make sure nothing's broken. It seems risky, but the truth is that anything that we do here is risky, and in my view the least risky thing to do is to minimise interference with a build system that is designed to produce something that runs on a variety of hardware targets. |
|
@carlocab Thanks again for your explanations. Personally, I'm still not sure that this is worth it. While the If this is scoped to only be what we need it for today (i.e. Do you have a sense of how many formulae you want to use this on? |
|
No thanks are necessary!
I think the name is useful because it's suggestive enough for any maintainer reviewing a PR that adds this to ask the right questions about it.
This sounds good to me. Now that I think about it, I don't think we run the risk of miscompilation from filtering out
This is good to me too, but I think I can do you one better: we can audit the use of I hope this can help allay your concern about upstream trying to abuse this.
All the formulae I named in my initial post definitely do runtime detection and would be improved by the change here. Any formula used for computation-intensive tasks [1] is a good candidate too, but I'll need to check them carefully to make sure. [1] e.g. |
Yes, that would completely allay my concerns!
Ok. Don't feel strongly on the name (particularly if audits for it appear at some stage too). |
|
Great; thanks for hashing this out with me. I'm pleased with where we've landed here. |
Some formulae are able to detect the features of the runtime CPU, and execute code accordingly. This typically entails 1) the detection of features of the build-time CPU in order to determine the targets that the compiler can generate code for, and 2) generating code for the targets that the compiler can support.

Our filtering of optimization flags can cause misdetection of compiler features, leading to failed builds [1], and miscompilation even when the build does not fail [2].

Let's try to fix this by allowing formulae to declare `ENV.runtime_cpu_detection` which skips the filtering of `-march` and related flags.

I've also skipped the filtering of the optimisation level, since it seems to me that if upstream maintainers have gone to the lengths of writing code that detects runtime hardware, they probably also know better about appropriate `-O` flags to use.

This is a partial list of formulae that should make use of this feature:
1. apache-arrow
2. fftw
3. gromacs
4. open-mpi
5. openblas

Partially resolves Homebrew/homebrew-core#76537.

[1] open-mpi/ompi#8306 and linked issues/PRs
[2] Homebrew/homebrew-core#76537
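The opt-in behaviour described in this commit message could look something like the sketch below. It is an illustration under stated assumptions, not the code merged in this PR: `SketchEnv` and `refurbish` are hypothetical names, only `runtime_cpu_detection` comes from the description, and the sketch simply drops filtered flags rather than modelling whatever the real shim substitutes for them.

```ruby
# Hypothetical stand-in for the build environment; only the method name
# `runtime_cpu_detection` is taken from the commit message.
class SketchEnv
  OPT_LEVEL = /\A-O(?:[0-9sgz]|fast)?\z/

  def initialize
    @runtime_cpu_detection = false
  end

  # A formula would call this to opt out of -march/-O filtering.
  def runtime_cpu_detection
    @runtime_cpu_detection = true
  end

  def runtime_cpu_detection?
    @runtime_cpu_detection
  end

  # Returns the compiler arguments that would be passed through.
  def refurbish(args)
    return args.dup if runtime_cpu_detection?

    args.reject { |arg| arg.start_with?("-march=") || arg.match?(OPT_LEVEL) }
  end
end

env = SketchEnv.new
args = %w[-O3 -march=skylake-avx512 -c kernel.c]
p env.refurbish(args) # filtering on: -O3 and -march are removed
env.runtime_cpu_detection
p env.refurbish(args) # filtering off: arguments pass through unchanged
```

Making the switch per-formula and off by default keeps the existing filtering in place for every formula that has not declared it.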
|
I initially planned to add the audit here, but I'll do that in a separate PR. |