add x86/avx512_fp16 detection #279

damageboy · 2022-10-18T17:08:10Z

For reference, on an Alder Lake CPU with AVX512 (and therefore AVX512_FP16 support) enabled:

$ ./list_cpu_features
arch            : x86
brand           : 12th Gen Intel(R) Core(TM) i9-12900K
family          :   6 (0x06)
model           : 151 (0x97)
stepping        :   2 (0x02)
uarch           : INTEL_ADL
flags           : adx,aes,avx,avx2,avx512_bf16,avx512_fp16,avx512_second_fma,avx512_vp2intersect,avx512bitalg,avx512bw,avx512cd,avx512dq,avx512f,avx512ifma,avx512vbmi,avx512vbmi2,avx512vl,avx512vnni,avx512vpopcntdq,avx_vnni,bmi1,bmi2,clflushopt,clfsh,clwb,cx16,cx8,erms,f16c,fma3,fpu,lzcnt,mmx,movbe,pclmulqdq,popcnt,rdrnd,rdseed,sha,smx,ss,sse,sse2,sse3,sse4_1,sse4_2,ssse3,tsc,vaes,vpclmulqdq
cache_info      : {"level":1,"cache_type":"data","cache_size":49152,"ways":12,"line_size":64,"tlb_entries":64,"partitioning":1},{"level":1,"cache_type":"instruction","cache_size":32768,"ways":8,"line_size":64,"tlb_entries":64,"partitioning":1},{"level":2,"cache_type":"unified","cache_size":1310720,"ways":10,"line_size":64,"tlb_entries":2048,"partitioning":1},{"level":3,"cache_type":"unified","cache_size":31457280,"ways":12,"line_size":64,"tlb_entries":40960,"partitioning":1}

On a Tiger Lake laptop, which predates AVX512_FP16 support:

$ ./list_cpu_features
arch            : x86
brand           : 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
family          :   6 (0x06)
model           : 140 (0x8C)
stepping        :   1 (0x01)
uarch           : INTEL_TGL
flags           : adx,aes,avx,avx2,avx512_second_fma,avx512_vp2intersect,avx512bitalg,avx512bw,avx512cd,avx512dq,avx512f,avx512ifma,avx512vbmi,avx512vbmi2,avx512vl,avx512vnni,avx512vpopcntdq,bmi1,bmi2,clflushopt,clfsh,clwb,cx16,cx8,erms,f16c,fma3,fpu,lzcnt,mmx,movbe,pclmulqdq,popcnt,rdrnd,rdseed,sha,ss,sse,sse2,sse3,sse4_1,sse4_2,ssse3,tsc,vaes,vpclmulqdq
cache_info      : {"level":1,"cache_type":"data","cache_size":49152,"ways":12,"line_size":64,"tlb_entries":64,"partitioning":1},{"level":1,"cache_type":"instruction","cache_size":32768,"ways":8,"line_size":64,"tlb_entries":64,"partitioning":1},{"level":2,"cache_type":"unified","cache_size":1310720,"ways":20,"line_size":64,"tlb_entries":1024,"partitioning":1},{"level":3,"cache_type":"unified","cache_size":12582912,"ways":12,"line_size":64,"tlb_entries":16384,"partitioning":1}
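The two `flags` lines differ in `avx512_fp16` (among others). Per Intel's SDM, AVX512_FP16 is enumerated by CPUID.(EAX=07H, ECX=0):EDX bit 23, and, like other AVX-512 features, it is only usable if the OS saves the opmask and ZMM state (XCR0 bits 5 to 7, on top of SSE and AVX state). A minimal sketch of that decision on raw register values follows; the function names are hypothetical and this is not cpu_features' actual implementation.

```cpp
#include <cstdint>

// Hypothetical helpers (not the cpu_features API) that decode raw register
// values, so they can be exercised with synthetic inputs.

// AVX512_FP16 is enumerated in CPUID.(EAX=07H, ECX=0):EDX, bit 23.
constexpr bool CpuidHasAvx512Fp16(std::uint32_t leaf7_edx) {
  return ((leaf7_edx >> 23) & 1u) != 0;
}

// XCR0 (read with XGETBV, ECX=0) must show that the OS saves SSE (bit 1),
// AVX (bit 2), opmask (bit 5), ZMM_Hi256 (bit 6) and Hi16_ZMM (bit 7) state.
constexpr std::uint64_t kAvx512StateMask =
    (1u << 1) | (1u << 2) | (1u << 5) | (1u << 6) | (1u << 7);

constexpr bool OsPreservesAvx512(std::uint64_t xcr0) {
  return (xcr0 & kAvx512StateMask) == kAvx512StateMask;
}

// Usable only if the CPU advertises the feature AND the OS preserves the
// AVX-512 register state.
constexpr bool Avx512Fp16Usable(std::uint32_t leaf7_edx, std::uint64_t xcr0) {
  return CpuidHasAvx512Fp16(leaf7_edx) && OsPreservesAvx512(xcr0);
}
```

Consistent with the outputs above, bit 23 of leaf 7 EDX should be set on the Alder Lake machine and clear on the Tiger Lake one, while both report AVX-512 register state as preserved.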

toor1245 · 2022-10-18T20:19:14Z

PR looks good. @gchatelet, @Mizux, please review.

gchatelet · 2022-10-19T08:14:03Z

Thx @damageboy for the PR. Can you add some tests as well?
e.g., for tigerlake this should be enabled, for haswell it should be disabled.

toor1245 · 2022-10-19T08:56:03Z

@gchatelet, could you also look at #266 and #277, please?

damageboy · 2022-10-19T12:14:03Z

> Thx @damageboy for the PR. Can you add some tests as well? e.g., for tigerlake this should be enabled, for haswell it should be disabled.

Sure, but just to be clear, this is currently only available on Alder Lake, not on Tiger Lake.

The manual runs I quoted in the PR description above show this.

damageboy · 2022-10-19T12:37:35Z

@gchatelet, maybe I misunderstood, but there are currently no specific Tiger Lake tests.

I can add both Tiger Lake and Alder Lake tests, if that helps,
and show that AVX512_FP16 is present only on Alder Lake, not on Tiger Lake.

Was that your intention?

damageboy · 2022-10-19T12:49:58Z

Anyway, I added new tests for Tiger Lake and Alder Lake based on their respective CPUID dumps from the @InstLatx64 repository.

damageboy · 2022-10-19T12:55:05Z

I'm seeing macOS test failures for what should be a purely synthetic test. Is there an explanation of what needs to happen for these tests to pass?

I'm assuming the tests aren't failing just because macOS officially has no Tiger Lake or Alder Lake support (as sold by Apple)... right?

Mizux · 2022-10-19T12:55:42Z

I haven't investigated deeply, but your new tests may be missing a few CPUID leaves:
https://github.com/google/cpu_features/actions/runs/3281695545/jobs/5404084402

EDIT: Linux and Windows amd64 seem to pass, so it may be a macos-latest-only issue...
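A fake CPU in a test only knows the CPUID leaves the test registers, so a forgotten leaf turns into missing features. A minimal sketch of that effect, assuming a table keyed by (leaf, subleaf) that returns zeros for anything unregistered (an assumption of this sketch; a real fixture may report a missing leaf differently). `Leaf`, `FakeCpuidTable`, and `ReportsAvx512Fp16` are hypothetical names, not the cpu_features test API.

```cpp
#include <cassert>  // used by the example checks
#include <cstdint>
#include <map>
#include <utility>

// Raw CPUID output registers for one (leaf, subleaf) query.
struct Leaf {
  std::uint32_t eax = 0, ebx = 0, ecx = 0, edx = 0;
};

// Hypothetical fake CPU for tests (not the cpu_features fixture).
class FakeCpuidTable {
 public:
  void SetLeaf(std::uint32_t leaf, std::uint32_t subleaf, Leaf value) {
    leaves_[std::make_pair(leaf, subleaf)] = value;
  }

  // Assumption of this sketch: an unregistered leaf reads as all zeros,
  // so every feature bit it carries silently reports "absent".
  Leaf GetLeaf(std::uint32_t leaf, std::uint32_t subleaf) const {
    const auto it = leaves_.find(std::make_pair(leaf, subleaf));
    return it == leaves_.end() ? Leaf{} : it->second;
  }

 private:
  std::map<std::pair<std::uint32_t, std::uint32_t>, Leaf> leaves_;
};

// AVX512_FP16 lives in CPUID.(EAX=07H, ECX=0):EDX, bit 23.
bool ReportsAvx512Fp16(const FakeCpuidTable& cpu) {
  return ((cpu.GetLeaf(0x7, 0).edx >> 23) & 1u) != 0;
}
```

For example, a fake CPU that never receives `SetLeaf(0x7, 0, ...)` reports no AVX512_FP16 at all, which is the kind of mismatch a missing leaf would cause; registering leaf 7 subleaf 1 does not help, because the bit lives in subleaf 0.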

Mizux · 2022-10-19T13:08:13Z

cpu_features/src/impl_x86_macos.c, lines 38 to 43 in 627959f:

static void OverrideOsPreserves(OsPreserves* os_preserves) {
  // On Darwin AVX512 support is On-demand.
  // We have to query the OS instead of querying the Zmm save/restore state.
  // https://github.com/apple/darwin-xnu/blob/8f02f2a044b9bb1ad951987ef5bab20ec9486310/osfmk/i386/fpu.c#L173-L199
  os_preserves->avx512_registers = GetDarwinSysCtlByName("hw.optional.avx512f");
}

and

cpu_features/test/cpuinfo_x86_test.cc, lines 56 to 58 in 627959f:

void SetDarwinSysCtlByName(std::string name) {
  darwin_sysctlbyname_.insert(name);
}

So the test may need to insert "hw.optional.avx512f" to correctly mock the Darwin OS?
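This explains why only macOS failed: the quoted `OverrideOsPreserves` is compiled only into macOS builds, and it replaces whatever the fake XCR0 says with a sysctl lookup. A minimal sketch of that interaction, using simplified, hypothetical stand-ins: `FakeDarwinSysctl` and a one-field `OsPreserves`. Unlike this sketch, the real function takes no sysctl parameter and calls `GetDarwinSysCtlByName` directly.

```cpp
#include <cassert>  // used by the example checks
#include <set>
#include <string>

// Hypothetical mock of the Darwin sysctl lookup: a name "exists" only if the
// test registered it, mirroring the idea behind SetDarwinSysCtlByName().
class FakeDarwinSysctl {
 public:
  void Set(const std::string& name) { names_.insert(name); }
  bool Get(const std::string& name) const { return names_.count(name) != 0; }

 private:
  std::set<std::string> names_;
};

// Simplified stand-in for the library's OsPreserves struct.
struct OsPreserves {
  bool avx512_registers = false;
};

// Sketch of the macOS-only override: AVX-512 state is enabled on demand on
// Darwin, so the XCR0-derived value is overwritten by the sysctl answer.
void OverrideOsPreserves(OsPreserves* os_preserves,
                         const FakeDarwinSysctl& sysctl) {
  os_preserves->avx512_registers = sysctl.Get("hw.optional.avx512f");
}
```

Without `Set("hw.optional.avx512f")`, AVX-512 state reads as not preserved even though the fake XCR0 claims it is, so every AVX-512 feature, including avx512_fp16, drops out on macOS only.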

gchatelet · 2022-10-19T13:10:57Z

> > Thx @damageboy for the PR. Can you add some tests as well? e.g., for tigerlake this should be enabled, for haswell it should be disabled.
>
> Sure, but just to be clear, this is currently only available on Alder Lake, not on Tiger Lake.
>
> The manual runs I quoted in the PR description above show this.

Ha, I misread your message then. I just wanted some test coverage for this addition.
Adding new tests is absolutely fine as well. Thx!

toor1245 · 2022-10-19T14:07:25Z

@damageboy, you need to enable avx512f; see this example: https://github.com/google/cpu_features/blob/main/test/cpuinfo_x86_test.cc#L1066-L1072

Could you check with this configuration for macOS?

cpu().SetDarwinSysCtlByName("hw.optional.avx512f");

damageboy · 2022-10-19T16:37:55Z

I'll ask a question that I know I'll regret:
Do you want me to rewrite this horrid, repeated #ifdef mess spread across functions into something more reasonable for mocking purposes?

Something like:

// https://github.com/InstLatx64/InstLatx64/blob/master/GenuineIntel/GenuineIntel00106A1_Nehalem_CPUID.txt
TEST_F(CpuidX86Test, Nehalem) {
  // Pre AVX cpus don't have xsave
  cpu().SetOsBackupsExtendedRegisters(false);
  cpu().EnableFakeFeatures(FakeX86Features::AVX2 | FakeX86Features::AVX512);
  // Rest of the test goes as usual
  ...
}
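The proposed `EnableFakeFeatures(FakeX86Features::AVX2 | FakeX86Features::AVX512)` call needs a bitmask type that supports `|`. Since this proposal was deferred to a separate patch, the following is only a hypothetical sketch of one way to declare such a type with a scoped enum; `FakeX86Features` and `HasFeature` are not existing cpu_features names.

```cpp
#include <cstdint>

// Hypothetical bitmask of fake features a test could enable in one call.
enum class FakeX86Features : std::uint32_t {
  NONE = 0,
  AVX2 = 1u << 0,
  AVX512 = 1u << 1,
};

// Scoped enums have no built-in operator|, so combine the underlying bits.
constexpr FakeX86Features operator|(FakeX86Features a, FakeX86Features b) {
  return static_cast<FakeX86Features>(static_cast<std::uint32_t>(a) |
                                      static_cast<std::uint32_t>(b));
}

// True if every bit of `feature` is present in `set`.
constexpr bool HasFeature(FakeX86Features set, FakeX86Features feature) {
  return (static_cast<std::uint32_t>(set) &
          static_cast<std::uint32_t>(feature)) ==
         static_cast<std::uint32_t>(feature);
}
```

A scoped enum keeps the flag names out of the enclosing scope and prevents accidental mixing with plain integers, at the cost of writing the operators explicitly.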

The current mocking infrastructure leaves a lot of ugliness throughout the code...

Would unifying it be a welcome addition to this PR?

toor1245 · 2022-10-19T18:41:10Z

@damageboy, there is no need to rewrite the #ifdef mess in this PR. I just meant writing something like:

TEST_F(CpuidX86Test, INTEL_ALDER_LAKE_AVX512) {
  cpu().SetOsBackupsExtendedRegisters(true);
#if defined(CPU_FEATURES_OS_MACOS)
  cpu().SetDarwinSysCtlByName("hw.optional.avx512f");
#endif
  // Rest of the code
  ...
}

Test code layout changes should go in a separate patch.

fixes #278

damageboy · 2022-10-20T09:11:28Z

I see everything is green now. LMK if anything else is required.

Thanks!

Mizux

LGTM

toor1245 approved these changes Oct 18, 2022


add x86/avx512_fp16 detection

bb0e753

fixes #278

Mizux approved these changes Oct 20, 2022


gchatelet merged commit 8ca7c65 into google:main Oct 20, 2022

gchatelet added the enhancement New feature or request label Apr 27, 2023

gchatelet added this to the v0.8.0 milestone Apr 27, 2023
