-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add feature detection for ARM/MacOS #41924
Conversation
LGTM. @Keno as our developer-with-an-M1, can you confirm this PR looks good? |
It looks like the new M1 processors have the same cores but more of them so it should be correct for them too. When LLVM13 merges they can also get the correct name since that's where the m1 name gets added. edit: Also who do we have to nag so they add a new scheduler for the newer chips. LLVM uses the one from the a7-cyclone which is almost 10 years old |
Bump 😄 |
src/processor_arm.cpp
Outdated
CPUID info = { | ||
uint8_t(0x61), | ||
uint8_t(0), | ||
uint16_t(0x21) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you should try to reuse this code and should instead replace the whole function (_get_host_cpu
).
But if you do, you should use the M1 info, i.e. 0x23
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And to add,
It could be possible to do it programmatically using developer.apple.com/documentation/kernel/1387446-sysctlbyname but that would necessitate a refactor of the code since I think it just expects linux code for now.
Not really. While the current code is linux-only,
- For the feature mask part, there's really no other standard format, (and AFAICT apple doesn't have that) and if there is one at some point it's a simple bit shuffling to translate during detection time.
- For the detection, it's intentionally made to be completely independent of the other logic, hence why I said you should replace that whole logic with your own. The linux one cover orders of magnitude more chips than what mac on arm would need and there's no point reusing it. You just need to produce the right name, feature pair.
For future expansion on arm mac, you most likely only need to switch between a few different processors with known feature set. If you really need to actually do feature detection, all you need to do then is to adjust the feature mask accordingly based on sysctlbyname
results.
src/processor_arm.cpp
Outdated
#if defined _CPU_AARCH64_ && defined _OS_DARWIN_ | ||
static NOINLINE std::pair<uint32_t,FeatureList<feature_sz>> _get_host_cpu() | ||
{ | ||
FeatureList<feature_sz> features = {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return std::make_pair(CPU::apple_m1, Feature::apple_m1);
There's no need to go through the lookup; I doubt the ELF arch lookup applies to apple (and if anything it won't be elf arch...) and you'll never hit the empty case anyway; no need to lookup features from the CPU since you know what it is; there's also no need to mask any features since the extra features comes from auxv and you aren't getting that it from there.
You should be able to move arm_arch
(along with the namespace), get_elf_arch
, feature_arch_version
, generic_for_arch
, check_cpu_arch_ver
, shrink_big_little
, into the ifdef
below as well.
Edit: also you can ifdef out old line 672 to old line 1020 but please keep the change you made in that range since it's for linux.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrote some simpler smaller logic that also shouldn't make it too hard to add new apple cores when they launch someday.
src/processor_arm.cpp
Outdated
if(strcmp(buffer,"Apple M1") == 0) | ||
return CPU::apple_m1; | ||
else | ||
return CPU::generic;// Firestorm core data based on https://opensource.apple.com/source/xnu/xnu-7195.141.2/osfmk/arm/cpuid.h.auto.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment seems to be entirely out of place. It is where I get the information about the CPUID though so if you don't mind you can update the URL on new line 1024.
{ | ||
char buffer[128]; | ||
size_t bufferlen = 128; | ||
sysctlbyname("machdep.cpu.brand_string",&buffer,&bufferlen,NULL,0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good function to keep for future reference, but the fallback should be CPU::apple_m1
. As the code is currently written, the next generation of chip will be detected as generic and I don't think that's desired. The linux version gets around this as much as possible by doing a full feature detection (so the only thing missing would be scheduling model that we can't do that much about...) but there's nothing like that here. I highly doubt apple will release a new processor for mac that has fewer userspace CPU features than M1 so it should be safe to assume so. And it seems to be what other projects assumes as well.
src/processor_arm.cpp
Outdated
@@ -20,7 +20,10 @@ | |||
# include <sys/auxv.h> | |||
# endif | |||
#endif | |||
|
|||
#if defined _CPU_AARCH64_ && defined _OS_DARWIN_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that important but I think you can merge this with the getauxval block above.
src/processor_arm.cpp
Outdated
size_t bufferlen = 128; | ||
sysctlbyname("machdep.cpu.brand_string",&buffer,&bufferlen,NULL,0); | ||
if(strcmp(buffer,"Apple M1") == 0) | ||
return CPU::apple_m1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And what I was writing previously would just corresponds to return {CPU::apple_m1, Feature::apple_m1};
in this branch and returning a different pair if a different cpu is detected. Going through find_cpu
is also fine, just a bit unnecessary.....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I think it's cleaner now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now.
Now I do believe that the developer transition kit does have an older CPU so if we want to support that we could add it explicitly and I assume @Keno would know what the brand string from that is. It should affect almost no real users and for any models in the future, the M1 should still be the sane fallback so this should still be good to go as is.
Bump :) |
+1; I've been running with a patch similar to this (but without any runtime detection) on my M1. |
if(strcmp(buffer,"Apple M1") == 0) | ||
return std::make_pair((uint32_t)CPU::apple_m1, Feature::apple_m1); | ||
else if(strcmp(buffer,"Apple M1 Max") == 0) | ||
return std::make_pair((uint32_t)CPU::apple_m1, Feature::apple_m1); | ||
else if(strcmp(buffer,"Apple M1 Pro") == 0) | ||
return std::make_pair((uint32_t)CPU::apple_m1, Feature::apple_m1); | ||
else | ||
return std::make_pair((uint32_t)CPU::apple_m1, Feature::apple_m1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, while I appreciate the intent to keep everything explicit, I wonder whether the repetition doesn't just add more clutter than necessary for now. Also, note that Apple themselves recommend parsing hw.optional
to detect CPU features, although I suppose the number of desktop(-ish) Apple CPUs will remain managable for a number of years to come.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parsing it is probably more complicated than just hardcoding things IMO, specially because the effort to add new features to an eventual parser is probably the same as just adding a new CPU to the list.
@gbaraldi can you rebase this on master once more? And then we should merge this. |
Attempt at fixing #40876 . The fix is a bit of bodge but it's what other languages are doing. Feature detection on macos + arm is quite hard apparently, there are some options but they are very incomplete. The cpu is targeting the a-14 since the m1 target isn't in our LLVM yet.