Popcnt vectorization #198

Open
wants to merge 11 commits into base: master


diegohavenstein

Hi Simon,

As in the other pull request, I added the missing masks.

Best regards,
Diego

# get_filename_component(_cpu_id "[HKEY_LOCAL_MACHINE\\Hardware\\Description\\System\\CentralProcessor\\0;Identifier]" NAME CACHE)
elseif(CMAKE_SYSTEM_NAME STREQUAL "Darwin")
# handle MacOs
execute_process(COMMAND sysctl -n machdep.cpu.features
Owner


Hi Diego, thanks for your contribution. I'm testing the code on a Mac with a CPU (i7-4850HQ) that supports AVX2. Surprisingly, the command sysctl -n machdep.cpu.features does not list AVX2 as a feature; it prints only:

FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C

However, AVX2 is listed in the output of sysctl -n machdep.cpu:

13 2147483656 GenuineIntel Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz 6 70 4 0 1 3219913727 2147154943 12219 739248384 33 263777 0 FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C SMEP ENFSTRG RDWRFSGS TSC_THREAD_OFFSET BMI1 HLE AVX2 BMI2 INVPCID RTM SYSCALL XD 1GBPAGE EM64T LAHF RDTSCP TSCI 16 8 15 5 64 64 3 270624 1 1 1 2 1 1 1 1 0 1 7 832 832 0 3 4 48 7 0 3 48 64 8 256 8 64 64 1024 39 48 4 8

So maybe we should just match on the latter output?
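A minimal C++ sketch of the matching suggested above, assuming the detection logic has the full sysctl -n machdep.cpu output available as a string. The function name has_cpu_feature is hypothetical and is not part of the PR's CMake code. It compares whole whitespace-separated tokens, so a search for "AVX" does not accidentally hit "AVX1.0", and "AVX2" is reported only when it appears as its own entry.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if `feature` appears as a whole
// whitespace-separated token in the captured sysctl output.
// Whole-token comparison avoids substring false positives such as
// "AVX" matching "AVX1.0".
inline bool has_cpu_feature(const std::string& sysctl_output,
                            const std::string& feature) {
    assert(!feature.empty());
    std::istringstream tokens(sysctl_output);
    std::string token;
    while (tokens >> token) {
        if (token == feature) {
            return true;
        }
    }
    return false;
}
```

Applied to the machdep.cpu.features list above, has_cpu_feature(output, "AVX2") returns false; applied to the machdep.cpu output, it returns true.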

Author


Sounds reasonable :) I have only run my code on Linux machines, so I did not run into this problem.

@mpetri
Collaborator

mpetri commented Sep 1, 2014

It would be good to specialize bitvector_interleaved to use these operations if blocksize % 256 == 0. That should result in a nice speed improvement.
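A scalar C++ sketch of what such a specialization could look like, assuming a data block of blocksize bits stored as 64-bit words. The names popcount256 and rank_in_block are hypothetical and are not sdsl-lite's API; the scalar body of popcount256 only defines the result that a vectorized 256-bit kernel from this PR would have to produce. __builtin_popcountll is a GCC/Clang builtin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Counts set bits in one 256-bit chunk stored as four 64-bit words.
// A vectorized (e.g. AVX2) kernel could replace this body; this scalar
// version is only a reference for the expected result.
inline std::uint64_t popcount256(const std::uint64_t* w) {
    return __builtin_popcountll(w[0]) + __builtin_popcountll(w[1]) +
           __builtin_popcountll(w[2]) + __builtin_popcountll(w[3]);
}

// Hypothetical in-block rank: number of set bits in positions [0, i)
// of a data block of `blocksize` bits, where blocksize % 256 == 0.
inline std::uint64_t rank_in_block(const std::uint64_t* block,
                                   std::size_t blocksize, std::size_t i) {
    assert(blocksize % 256 == 0);
    assert(i <= blocksize);
    std::uint64_t count = 0;
    std::size_t word = 0;
    // Whole 256-bit chunks that lie entirely before position i.
    for (; (word + 4) * 64 <= i; word += 4) {
        count += popcount256(block + word);
    }
    // Remaining whole 64-bit words before position i.
    for (; (word + 1) * 64 <= i; ++word) {
        count += __builtin_popcountll(block[word]);
    }
    // Low bits of the final, partially covered word (if any).
    const std::size_t rem = i % 64;
    if (rem != 0) {
        count += __builtin_popcountll(block[word] & ((1ULL << rem) - 1));
    }
    return count;
}
```

With blocksize % 256 == 0, every block consists of whole 256-bit chunks, so the chunk loop never straddles a block boundary; only the bits before position i inside the last chunk need the word-level and masked-word steps.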

@sdsllitebot
Collaborator

Can one of the admins verify this patch?

@mpetri
Collaborator

mpetri commented Oct 2, 2014

rrr_helper::binomial_coefficients_trait<7>::popcount() can now use cnt128 instead of two cnt() operations.
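For the rrr_helper case, a 128-bit popcount can share its mask steps between both 64-bit halves and perform a single horizontal sum, rather than running two independent cnt() calls. The sketch below shows that idea with portable SWAR arithmetic, a swapped-in technique rather than the PR's SIMD code. The name cnt128_sketch is hypothetical, and the 128-bit value is assumed to be passed as its low and high 64-bit halves. The masks are the standard 0x5555..., 0x3333..., 0x0F0F... popcount masks.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit popcount: applies each SWAR mask step to both
// 64-bit halves (like two lanes of a 128-bit register), merges the
// per-byte counts, and performs one horizontal sum at the end.
inline unsigned cnt128_sketch(std::uint64_t lo, std::uint64_t hi) {
    const std::uint64_t m1 = 0x5555555555555555ULL;
    const std::uint64_t m2 = 0x3333333333333333ULL;
    const std::uint64_t m4 = 0x0F0F0F0F0F0F0F0FULL;
    const std::uint64_t h01 = 0x0101010101010101ULL;
    // Step 1: each 2-bit field holds the count of its two bits.
    lo -= (lo >> 1) & m1;
    hi -= (hi >> 1) & m1;
    // Step 2: each 4-bit field holds a count in 0..4.
    lo = (lo & m2) + ((lo >> 2) & m2);
    hi = (hi & m2) + ((hi >> 2) & m2);
    // Step 3: each byte holds a count in 0..8.
    lo = (lo + (lo >> 4)) & m4;
    hi = (hi + (hi >> 4)) & m4;
    // Merge halves: each byte is now at most 16, so no carries occur.
    const std::uint64_t s = lo + hi;
    // Sum all eight bytes into the top byte (total is at most 128).
    return static_cast<unsigned>((s * h01) >> 56);
}
```

After step 3 each byte holds at most 8, so adding the two halves byte-wise gives at most 16 per byte without carries, and the final multiply accumulates eight bytes whose running sums never exceed 128, so the result fits in the top byte.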


4 participants