add x86/avx512_fp16 detection #279

Merged: 1 commit, merged Oct 20, 2022

@damageboy (Contributor) commented Oct 18, 2022

For reference, here is the output on an Alder Lake CPU with AVX512 (and therefore AVX512_FP16 support) enabled:

$ ./list_cpu_features
arch            : x86
brand           : 12th Gen Intel(R) Core(TM) i9-12900K
family          :   6 (0x06)
model           : 151 (0x97)
stepping        :   2 (0x02)
uarch           : INTEL_ADL
flags           : adx,aes,avx,avx2,avx512_bf16,avx512_fp16,avx512_second_fma,avx512_vp2intersect,avx512bitalg,avx512bw,avx512cd,avx512dq,avx512f,avx512ifma,avx512vbmi,avx512vbmi2,avx512vl,avx512vnni,avx512vpopcntdq,avx_vnni,bmi1,bmi2,clflushopt,clfsh,clwb,cx16,cx8,erms,f16c,fma3,fpu,lzcnt,mmx,movbe,pclmulqdq,popcnt,rdrnd,rdseed,sha,smx,ss,sse,sse2,sse3,sse4_1,sse4_2,ssse3,tsc,vaes,vpclmulqdq
cache_info      : {"level":1,"cache_type":"data","cache_size":49152,"ways":12,"line_size":64,"tlb_entries":64,"partitioning":1},{"level":1,"cache_type":"instruction","cache_size":32768,"ways":8,"line_size":64,"tlb_entries":64,"partitioning":1},{"level":2,"cache_type":"unified","cache_size":1310720,"ways":10,"line_size":64,"tlb_entries":2048,"partitioning":1},{"level":3,"cache_type":"unified","cache_size":31457280,"ways":12,"line_size":64,"tlb_entries":40960,"partitioning":1}

On a Tiger Lake laptop, which predates AVX512_FP16 support:

arch            : x86
brand           : 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
family          :   6 (0x06)
model           : 140 (0x8C)
stepping        :   1 (0x01)
uarch           : INTEL_TGL
flags           : adx,aes,avx,avx2,avx512_second_fma,avx512_vp2intersect,avx512bitalg,avx512bw,avx512cd,avx512dq,avx512f,avx512ifma,avx512vbmi,avx512vbmi2,avx512vl,avx512vnni,avx512vpopcntdq,bmi1,bmi2,clflushopt,clfsh,clwb,cx16,cx8,erms,f16c,fma3,fpu,lzcnt,mmx,movbe,pclmulqdq,popcnt,rdrnd,rdseed,sha,ss,sse,sse2,sse3,sse4_1,sse4_2,ssse3,tsc,vaes,vpclmulqdq
cache_info      : {"level":1,"cache_type":"data","cache_size":49152,"ways":12,"line_size":64,"tlb_entries":64,"partitioning":1},{"level":1,"cache_type":"instruction","cache_size":32768,"ways":8,"line_size":64,"tlb_entries":64,"partitioning":1},{"level":2,"cache_type":"unified","cache_size":1310720,"ways":20,"line_size":64,"tlb_entries":1024,"partitioning":1},{"level":3,"cache_type":"unified","cache_size":12582912,"ways":12,"line_size":64,"tlb_entries":16384,"partitioning":1}
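To make the difference between the two `flags` lines easy to see, here is a small C++ sketch that parses each comma-separated line and computes the set difference. `ParseFlags` and `FlagsOnlyIn` are hypothetical helpers written for this comparison, not part of cpu_features.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Splits a comma-separated "flags" value (as printed by list_cpu_features)
// into a sorted set of flag names.
std::set<std::string> ParseFlags(const std::string& csv) {
  std::set<std::string> flags;
  std::istringstream in(csv);
  std::string flag;
  while (std::getline(in, flag, ',')) {
    if (!flag.empty()) flags.insert(flag);
  }
  return flags;
}

// Returns the flags present in `a` but absent from `b`, in sorted order.
std::vector<std::string> FlagsOnlyIn(const std::set<std::string>& a,
                                     const std::set<std::string>& b) {
  std::vector<std::string> only;
  std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                      std::back_inserter(only));
  return only;
}
```

Applied to the two `flags` lines above, `FlagsOnlyIn(ParseFlags(adl), ParseFlags(tgl))` yields `avx512_bf16`, `avx512_fp16`, `avx_vnni` and `smx`, and the reverse difference is empty.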

@toor1245 (Contributor)

PR looks good. @gchatelet, @Mizux, please review.

@gchatelet (Collaborator)

Thx @damageboy for the PR. Can you add some tests as well?
E.g., for Tiger Lake this should be enabled, and for Haswell it should be disabled.

@toor1245 (Contributor)

@gchatelet, could you also look at #266 and #277, please?

@damageboy (Contributor Author)

> Thx @damageboy for the PR. Can you add some tests as well? E.g., for Tiger Lake this should be enabled, and for Haswell it should be disabled.

Sure, but just to be clear: this is currently only available on Alder Lake, not on Tiger Lake.

The manual runs I quoted in the PR description above show this.
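The Alder Lake/Tiger Lake difference comes down to a single CPUID bit. Per the Intel SDM, AVX512_FP16 is enumerated in CPUID.(EAX=07H, ECX=0):EDX[bit 23], and, like other AVX-512 features, it is only usable once the OS has enabled the opmask and ZMM state in XCR0. The following is a minimal sketch under those assumptions; `CpuidRegs` and `HasUsableAvx512Fp16` are hypothetical names, not cpu_features' API, and real register values would come from the `cpuid` and `xgetbv` instructions.

```cpp
#include <cstdint>

// CPUID output registers for one (leaf, subleaf) query.
struct CpuidRegs {
  uint32_t eax, ebx, ecx, edx;
};

// CPUID.(EAX=07H, ECX=0):EDX[23] enumerates AVX512_FP16 (Intel SDM).
constexpr int kAvx512Fp16EdxBit = 23;

// XCR0 components the OS must enable before AVX-512 registers are usable:
// bit 1 SSE, bit 2 AVX (YMM), bit 5 opmask, bit 6 ZMM_Hi256, bit 7 Hi16_ZMM.
constexpr uint64_t kXcr0Avx512Mask =
    (1ull << 1) | (1ull << 2) | (1ull << 5) | (1ull << 6) | (1ull << 7);

// Hypothetical helper: true only if the CPU advertises AVX512_FP16 *and* the
// OS saves/restores the full AVX-512 register state.
bool HasUsableAvx512Fp16(const CpuidRegs& leaf7_subleaf0, uint64_t xcr0) {
  const bool cpu_has_fp16 = (leaf7_subleaf0.edx >> kAvx512Fp16EdxBit) & 1u;
  const bool os_saves_zmm = (xcr0 & kXcr0Avx512Mask) == kXcr0Avx512Mask;
  return cpu_has_fp16 && os_saves_zmm;
}
```

A CPU that sets the bit but runs under an OS that has not enabled ZMM state still reports no usable AVX512_FP16 in this model.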

@damageboy (Contributor Author)

@gchatelet, maybe I misunderstood, but there are currently no Tiger Lake-specific tests.

I can add both Tiger Lake and Alder Lake tests, if that helps,
and show that _fp16 is present only on Alder Lake, not on Tiger Lake...

Was that your intention?

@damageboy (Contributor Author)

Anyway, I added new tests for Tiger Lake and Alder Lake based on their respective CPUID dumps from the @InstLatx64 repository.
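A test built from a CPUID dump essentially replays recorded (leaf, subleaf) register values through the detection code. The sketch below assumes a hypothetical `FakeCpuid` table (cpu_features' own test fixture is structured differently) and uses synthetic values that set only the relevant bits, not the actual InstLatx64 data. In this fake, a leaf missing from the table reads back as zeros, so every feature reported in that leaf silently disappears.

```cpp
#include <cstdint>
#include <map>
#include <utility>

// CPUID output registers for one (leaf, subleaf) query.
struct CpuidRegs {
  uint32_t eax = 0, ebx = 0, ecx = 0, edx = 0;
};

// Hypothetical fake CPU: CPUID results keyed by (leaf, subleaf), e.g.
// transcribed from a CPUID dump.
class FakeCpuid {
 public:
  void SetLeaf(uint32_t leaf, uint32_t subleaf, CpuidRegs regs) {
    leaves_[{leaf, subleaf}] = regs;
  }

  // Leaves that were never set read back as all zeros.
  CpuidRegs Query(uint32_t leaf, uint32_t subleaf) const {
    auto it = leaves_.find({leaf, subleaf});
    return it == leaves_.end() ? CpuidRegs{} : it->second;
  }

 private:
  std::map<std::pair<uint32_t, uint32_t>, CpuidRegs> leaves_;
};

// Leaf 0 EAX holds the highest supported basic leaf; AVX512_FP16 lives in
// CPUID.(EAX=07H, ECX=0):EDX[23].
bool ReportsAvx512Fp16(const FakeCpuid& cpu) {
  const uint32_t max_leaf = cpu.Query(0, 0).eax;
  if (max_leaf < 7) return false;  // leaf 7 is not implemented
  return (cpu.Query(7, 0).edx >> 23) & 1u;
}
```

Checking the maximum basic leaf first mirrors what real detection code has to do, and it also means a test that forgets to record leaf 0 loses leaf 7 features too.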

@damageboy (Contributor Author)

I'm seeing macOS test failures for what should be a synthetic test. Is there an explanation of what needs to happen for these tests to pass?

I'm assuming the tests aren't failing simply because macOS doesn't officially support Tiger Lake or Alder Lake (as sold by Apple)... right?

@Mizux (Collaborator) commented Oct 19, 2022

I haven't investigated deeply, but your new tests may be missing a few CPUID leaves:
https://github.com/google/cpu_features/actions/runs/3281695545/jobs/5404084402

EDIT: it seems Linux and Windows amd64 pass, so it may be a macos-latest-only issue...

@Mizux (Collaborator) commented Oct 19, 2022

static void OverrideOsPreserves(OsPreserves* os_preserves) {
  // On Darwin AVX512 support is On-demand.
  // We have to query the OS instead of querying the Zmm save/restore state.
  // https://github.com/apple/darwin-xnu/blob/8f02f2a044b9bb1ad951987ef5bab20ec9486310/osfmk/i386/fpu.c#L173-L199
  os_preserves->avx512_registers = GetDarwinSysCtlByName("hw.optional.avx512f");
}
?
and
void SetDarwinSysCtlByName(std::string name) {
  darwin_sysctlbyname_.insert(name);
}

So the test may need to insert "hw.optional.avx512f" to correctly mock the Darwin OS?
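The quoted `OverrideOsPreserves` explains why only macOS fails: on Darwin, AVX-512 state is enabled on demand, so the OS is queried via the `hw.optional.avx512f` sysctl instead of trusting XCR0, and AVX-512 features are presumably only reported when `os_preserves->avx512_registers` is true. The simplified model below uses hypothetical `FakeDarwin`, `OsPreservesModel` and `ReportAvx512Fp16` names (not the library's code) to show why a mocked macOS run without that sysctl reports no AVX512_FP16 even though the CPUID bit is set.

```cpp
#include <cstdint>
#include <set>
#include <string>

// Hypothetical fake of the Darwin sysctl table; the quoted
// SetDarwinSysCtlByName() inserts names into a set in the same way.
struct FakeDarwin {
  std::set<std::string> sysctls;
  bool SysCtlByName(const std::string& name) const {
    return sysctls.count(name) > 0;
  }
};

// Simplified model of the OsPreserves result (only the AVX-512 field).
struct OsPreservesModel {
  bool avx512_registers = false;
};

OsPreservesModel DetectOsPreserves(uint64_t xcr0, bool is_darwin,
                                   const FakeDarwin& darwin) {
  // XCR0 bits 5-7: opmask, ZMM_Hi256 and Hi16_ZMM state.
  constexpr uint64_t kZmmStateMask = 0xE0;
  OsPreservesModel preserves;
  preserves.avx512_registers = (xcr0 & kZmmStateMask) == kZmmStateMask;
  // On Darwin, XCR0 is not trusted and the OS is queried instead
  // (mirrors the quoted OverrideOsPreserves).
  if (is_darwin) {
    preserves.avx512_registers = darwin.SysCtlByName("hw.optional.avx512f");
  }
  return preserves;
}

// AVX-512 features, avx512_fp16 included, are only reported when the OS
// preserves the AVX-512 registers.
bool ReportAvx512Fp16(bool cpuid_fp16_bit, const OsPreservesModel& preserves) {
  return cpuid_fp16_bit && preserves.avx512_registers;
}
```

In this model, inserting "hw.optional.avx512f" into the fake Darwin table is exactly what flips the macOS result, while Linux and Windows only depend on the XCR0 bits.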

@gchatelet (Collaborator)

> > Thx @damageboy for the PR. Can you add some tests as well? E.g., for Tiger Lake this should be enabled, and for Haswell it should be disabled.
>
> Sure, but just to be clear: this is currently only available on Alder Lake, not on Tiger Lake.
>
> The manual runs I quoted in the PR description above show this.

Ha, I misread your message then. I just wanted some test coverage for this addition.
Adding new tests is absolutely fine as well. Thx!

@toor1245 (Contributor)

@damageboy, you need to enable avx512f in the mock, as in this example: https://github.com/google/cpu_features/blob/main/test/cpuinfo_x86_test.cc#L1066-L1072

Could you try this configuration for macOS?

cpu().SetDarwinSysCtlByName("hw.optional.avx512f");

@damageboy (Contributor Author) commented Oct 19, 2022

I'll ask a question that I know I'll regret:
Do you want me to rewrite this horrid mess of repeated #ifdefs across functions into something more reasonable for mocking purposes?

Something like:

// https://github.com/InstLatx64/InstLatx64/blob/master/GenuineIntel/GenuineIntel00106A1_Nehalem_CPUID.txt
TEST_F(CpuidX86Test, Nehalem) {
  // Pre AVX cpus don't have xsave
  cpu().SetOsBackupsExtendedRegisters(false);
  cpu().EnableFakeFeatures(FakeX86Features::AVX2 | FakeX86Features::AVX512);
  // Rest of the test goes as usual
  ...
}

The current state of the mocking infra here leaves a lot of ugliness throughout the code...

Would a unification be a welcome addition to this PR?
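The proposed `cpu().EnableFakeFeatures(FakeX86Features::AVX2 | FakeX86Features::AVX512)` call implies a type-safe bitmask. One way to sketch it, assuming a scoped enum plus a single fake-OS struct, is below; `FakeX86Features` and `EnableFakeFeatures` come from the proposal, while `Contains` and `FakeOsState` are invented for illustration, and none of these exist in cpu_features.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical flag set for the proposed EnableFakeFeatures().
enum class FakeX86Features : uint32_t {
  None = 0,
  AVX2 = 1u << 0,
  AVX512 = 1u << 1,
};

// enum class has no built-in bitwise operators, so define the ones needed.
constexpr FakeX86Features operator|(FakeX86Features a, FakeX86Features b) {
  using U = std::underlying_type_t<FakeX86Features>;
  return static_cast<FakeX86Features>(static_cast<U>(a) | static_cast<U>(b));
}

constexpr bool Contains(FakeX86Features set, FakeX86Features f) {
  using U = std::underlying_type_t<FakeX86Features>;
  return (static_cast<U>(set) & static_cast<U>(f)) != 0;
}

// Hypothetical fake OS state configured by one call, instead of repeating
// #if defined(CPU_FEATURES_OS_MACOS) blocks in every test.
struct FakeOsState {
  bool saves_ymm = false;       // XCR0 AVX (YMM) state enabled
  bool saves_zmm = false;       // XCR0 opmask/ZMM state enabled
  bool darwin_avx512f = false;  // mirrors the hw.optional.avx512f sysctl

  void EnableFakeFeatures(FakeX86Features features) {
    if (Contains(features, FakeX86Features::AVX2)) saves_ymm = true;
    if (Contains(features, FakeX86Features::AVX512)) {
      saves_ymm = true;  // AVX-512 state builds on YMM state
      saves_zmm = true;
      darwin_avx512f = true;  // only consulted when mocking Darwin
    }
  }
};
```

The design choice here is to let one call set every OS-level signal (the XCR0 bits and the Darwin sysctl) together, so individual tests would not need platform-specific `#if` blocks.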

@toor1245 (Contributor)

@damageboy, there's no need to rewrite the #ifdef mess in this PR. I just meant something like:

TEST_F(CpuidX86Test, INTEL_ALDER_LAKE_AVX512) {
  cpu().SetOsBackupsExtendedRegisters(true);
#if defined(CPU_FEATURES_OS_MACOS)
  cpu().SetDarwinSysCtlByName("hw.optional.avx512f");
#endif
  // Rest of the code
  ...
}

Test code layout changes should be done in a separate patch.

@damageboy (Contributor Author)

I see everything is green now; LMK if anything else is required.

Thanks!

@Mizux (Collaborator) left a comment

LGTM

@gchatelet merged commit 8ca7c65 into google:main on Oct 20, 2022
@gchatelet added the enhancement (New feature or request) label on Apr 27, 2023
@gchatelet added this to the v0.8.0 milestone on Apr 27, 2023

4 participants