The speed up of LCAO GPU is bad #5932
Comments
If we only consider the runtime of |
By the way, I would like to ask about the settings for OpenMP and MPI when running on the CPU.
Performance degrades as the number of OMP threads increases partly because the physical core count on Bohrium machines is half of what is labeled, so using only half of the labeled cores typically gives the highest efficiency. For example, on a c12 machine, setting OMP=6 will be faster than setting OMP=12.
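As a rough illustration of the half-the-labeled-cores rule described above, here is a minimal Python sketch that derives an `OMP_NUM_THREADS` value from a labeled core count. The helper names `omp_threads_for` and `launch_env` are hypothetical and not part of ABACUS or Bohrium; the only real interface used is the standard OpenMP environment variable `OMP_NUM_THREADS`.

```python
import os


def omp_threads_for(labeled_cores: int) -> int:
    """Return half of the labeled core count, but never less than 1.

    Assumption taken from the comment above: the machine label counts
    twice the number of physical cores.
    """
    return max(1, labeled_cores // 2)


def launch_env(labeled_cores: int) -> dict:
    """Build an environment for a child process with OMP_NUM_THREADS set.

    Hypothetical helper: it copies the current environment, so nothing in
    the calling process is modified.
    """
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = str(omp_threads_for(labeled_cores))
    return env


# Example for a c12 machine (12 labeled cores).
env = launch_env(12)
print(env["OMP_NUM_THREADS"])  # prints 6
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching the calculation, so the thread count applies only to that run.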
I've found that the reason why the GPU version is slower than the CPU version mainly lies in the significant time consumption difference of the |
The CPU runs used 1 OpenMP thread and 16 MPI processes. I used the image "registry.dp.tech/deepmodeling/abacus-cuda:latest" on Bohrium, which is built from "Dockerfile.cuda" (https://github.com/deepmodeling/abacus-develop/blob/develop/Dockerfile.cuda).
@pxlxingliang The issue of abnormal time consumption by the |
I have tested the run time on GPU and CPU with 16/32/64 Fe on Bohrium, using c12_m92_1 * NVIDIA V100 (price is 4.5 RMB/hour) and c32_m64_cpu (price is 2.56 RMB/hour). The results are:
The run time on GPU is much longer than on CPU, and it gets longer as more OpenMP threads are used.
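Since the two machines are priced differently, one way to read such timings is in cost terms. The sketch below uses only the two hourly prices quoted above; the function names are hypothetical, and no measured run times are assumed. It computes how much faster the GPU machine would need to be than the CPU machine just to match it in cost per job.

```python
# Hourly prices quoted in the issue (RMB/hour).
GPU_PRICE = 4.5   # c12_m92_1 * NVIDIA V100
CPU_PRICE = 2.56  # c32_m64_cpu


def run_cost(price_per_hour: float, wall_seconds: float) -> float:
    """Cost in RMB of a run that takes wall_seconds on a machine."""
    return price_per_hour * wall_seconds / 3600.0


def break_even_speedup(gpu_price: float, cpu_price: float) -> float:
    """Speedup (CPU time / GPU time) at which both runs cost the same."""
    return gpu_price / cpu_price


print(round(break_even_speedup(GPU_PRICE, CPU_PRICE), 2))  # prints 1.76
```

With these prices, the GPU run would have to be roughly 1.76 times faster than the CPU run before it becomes the cheaper option.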
gpu.zip