Popcnt vectorization #198

diegohavenstein · 2014-08-29T14:09:43Z

Hi Simon,

Same as in the other pull request; I added the missing masks.

Best regards,
Diego

simongog · 2014-09-01T09:13:38Z

CMakeModules/CheckAVX2.cmake

+#	get_filename_component(_cpu_id "[HKEY_LOCAL_MACHINE\\Hardware\\Description\\System\\CentralProcessor\\0;Identifier]" NAME CACHE)	
+elseif(CMAKE_SYSTEM_NAME STREQUAL "Darwin")
+#  handle MacOs
+execute_process(COMMAND sysctl -n machdep.cpu.features


Hi Diego, thanks for your contribution. I'm currently testing the code on a Mac with a CPU (i7-4850HQ) that supports AVX2. Surprisingly, the command sysctl -n machdep.cpu.features does not list AVX2 as a feature, only:

FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C

However, AVX2 is listed in the output of sysctl -n machdep.cpu:

13 2147483656 GenuineIntel Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz 6 70 4 0 1 3219913727 2147154943 12219 739248384 33 263777 0 FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C SMEP ENFSTRG RDWRFSGS TSC_THREAD_OFFSET BMI1 HLE AVX2 BMI2 INVPCID RTM SYSCALL XD 1GBPAGE EM64T LAHF RDTSCP TSCI 16 8 15 5 64 64 3 270624 1 1 1 2 1 1 1 1 0 1 7 832 832 0 3 4 48 7 0 3 48 64 8 256 8 64 64 1024 39 48 4 8

So maybe just match against the latter output instead?

diegohavenstein · 2014-09-01

Sounds reasonable :) I have only run my code on Linux machines, so I did not run into this problem.
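The matching idea discussed above can be sketched as follows. This is a C++ illustration rather than the CMake module itself, and the function name `has_cpu_feature` is hypothetical. It assumes the sysctl output has already been captured as a string and checks for a feature as a whole whitespace-separated token, so that a query for AVX does not accidentally match AVX1.0 or AVX2.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of the pull request): returns true if
// `feature` occurs as a whole whitespace-separated token in `sysctl_output`.
inline bool has_cpu_feature(const std::string& sysctl_output,
                            const std::string& feature)
{
    std::istringstream tokens(sysctl_output);
    std::string token;
    while (tokens >> token) {
        if (token == feature) {
            return true;  // exact token match, e.g. "AVX2"
        }
    }
    return false;
}
```

Applied to the two outputs quoted above, a lookup for AVX2 would fail on the machdep.cpu.features list and succeed on the full machdep.cpu output.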

mpetri · 2014-09-01T23:34:18Z

It would be good to specialize bitvector_interleaved to use these operations if blocksize % 256 == 0. That should result in a nice speed improvement.
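A rough sketch of what such a specialization could look like, under stated assumptions: `cnt64`, `cnt256`, and `count_block` are hypothetical names, and `cnt256` is written here as a portable stand-in for whatever 256-bit popcount the pull request provides. The only point illustrated is the compile-time choice of the wide path when the block size is a multiple of 256 bits.

```cpp
#include <cstddef>
#include <cstdint>

// Portable 64-bit popcount (SWAR); stand-in for the library's own routine.
inline uint32_t cnt64(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return static_cast<uint32_t>((x * 0x0101010101010101ULL) >> 56);
}

// Hypothetical 256-bit popcount over four consecutive words.
// An AVX2 build would replace this body with a vectorized version.
inline uint32_t cnt256(const uint64_t* w)
{
    return cnt64(w[0]) + cnt64(w[1]) + cnt64(w[2]) + cnt64(w[3]);
}

// Counts the set bits in one block of t_block_bits bits.
// The condition is a compile-time constant, so only one loop survives.
template <std::size_t t_block_bits>
uint64_t count_block(const uint64_t* words)
{
    static_assert(t_block_bits % 64 == 0,
                  "block size must be a multiple of 64 bits");
    const std::size_t n_words = t_block_bits / 64;
    uint64_t total = 0;
    if (t_block_bits % 256 == 0) {
        for (std::size_t i = 0; i < n_words; i += 4) {
            total += cnt256(words + i);  // wide path: 256 bits per step
        }
    } else {
        for (std::size_t i = 0; i < n_words; ++i) {
            total += cnt64(words[i]);    // fallback: 64 bits per step
        }
    }
    return total;
}
```

Both paths return the same count; any speed difference would come entirely from how `cnt256` is implemented on the target CPU.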

sdsllitebot · 2014-09-05T09:01:01Z

Can one of the admins verify this patch?

mpetri · 2014-10-02T00:26:35Z

rrr_helper::binomial_coefficients_trait<7>::popcount() can now use cnt128 instead of 2 cnt() operations.
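To make the suggestion concrete, here is a hedged sketch of a single 128-bit popcount replacing two 64-bit calls. The pull request's actual cnt128 is not shown in this thread, so the names `popcount128` and `popcount128_two_calls` and the nibble-lookup approach are assumptions. When SSSE3 is enabled at compile time, the sketch uses a per-nibble table lookup followed by a horizontal sum; otherwise it falls back to two scalar counts.

```cpp
#include <cstdint>
#if defined(__SSSE3__)
#include <tmmintrin.h>  // _mm_shuffle_epi8 (SSSE3) plus the SSE2 intrinsics
#endif

// Portable 64-bit popcount (SWAR); stand-in for a scalar cnt().
inline uint32_t cnt64(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return static_cast<uint32_t>((x * 0x0101010101010101ULL) >> 56);
}

// Baseline: a 128-bit value stored as two words, counted with two calls.
inline uint32_t popcount128_two_calls(const uint64_t* w)
{
    return cnt64(w[0]) + cnt64(w[1]);
}

// Hypothetical single-call variant (an assumption, not the PR's code).
inline uint32_t popcount128(const uint64_t* w)
{
#if defined(__SSSE3__)
    const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(w));
    // Popcount of every 4-bit value, used as a 16-entry lookup table.
    const __m128i lut = _mm_setr_epi8(0, 1, 1, 2, 1, 2, 2, 3,
                                      1, 2, 2, 3, 2, 3, 3, 4);
    const __m128i low_mask = _mm_set1_epi8(0x0F);
    const __m128i lo = _mm_and_si128(v, low_mask);
    const __m128i hi = _mm_and_si128(_mm_srli_epi16(v, 4), low_mask);
    const __m128i per_byte = _mm_add_epi8(_mm_shuffle_epi8(lut, lo),
                                          _mm_shuffle_epi8(lut, hi));
    // Sum the 16 byte counts into two 64-bit partial sums, then add them.
    const __m128i sums = _mm_sad_epu8(per_byte, _mm_setzero_si128());
    return static_cast<uint32_t>(_mm_cvtsi128_si32(sums)) +
           static_cast<uint32_t>(_mm_cvtsi128_si32(_mm_srli_si128(sums, 8)));
#else
    return popcount128_two_calls(w);  // scalar fallback
#endif
}
```

Whether this is faster than two hardware popcnt instructions depends on the CPU and on where the data already lives, so it would need benchmarking before being adopted in rrr_helper.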

Diego Havenstein and others added 5 commits August 29, 2014 14:13

AVX2 support for popcount function. To be tested.

88a8a0c

Added support for SSE based popcount

a65f3f4

Bugfixes, now compiles and "make test" is running

8983625

CheckAVX2.cmake

40409bc

Bug fixes

9748a34

simongog reviewed Sep 1, 2014

Diego Havenstein added 3 commits September 1, 2014 15:59

Some of the fixes suggested on GitHub by Simon

7f72dc4

__AVX2__ -> __SSE4_2__

bd7ad14

Little fix

91362e6

Diego Havenstein added 3 commits September 5, 2014 16:05

bug in uint256_t fixed (wrong accessing pattern to array elements)

24ad2ce

values instead of ymm to access data in ymm_union

db534bb

count variable being used where it needs to be used now

7fe66dc
