About VASP run problem #6

Open
p00380563 opened this issue Jul 2, 2021 · 2 comments

p00380563 commented Jul 2, 2021

I get an error when I run VASP with -np 22; with -np 21 it works.
I am sure I allocated enough CPU cores to McKernel. Can someone tell me the reason?

Hardware: Arm server, 128 cores
Software: CentOS 7.6 + Open MPI 4.0.5 + McKernel 1.7

The error is as follows:
[root@localhost VASP_bench_pt]# mpirun -np 22 --allow-run-as-root -x OMP_NUM_THREADS=1 /root/sysroot/bin/mcexec -n 22 ../../bin/vasp_std

There are not enough slots available in the system to satisfy the 22
slots that were requested by the application:

/root/sysroot/bin/mcexec

Either request fewer slots for your application, or make more slots
available for use.

McKernel information:
[root@localhost sysroot]# ./sbin/ihkosctl 0 query cpu
4-110
[root@localhost sysroot]#
[root@localhost sysroot]#
[root@localhost sysroot]# ./sbin/ihkosctl 0 query mem
52428800000@0,52428800000@1,52428800000@2,52428800000@3
[root@localhost sysroot]#
[root@localhost sysroot]# ./sbin/ihkosctl 0 kmsg
[ 0]: boot_param_size: 65536
[ 0]: %: GICv3
[ 0]: setup_arm64 done.
IHK/McKernel started.
[ 0]: ns_per_tsc: 10000
[ 0]: KCommand Line: hidos dump_level=24 time_sharing
[ 0]: Physical memory: 0x2080310000 - 0x2cb5000000, 52425588736 bytes, 799951 pages available @ NUMA: 0
[ 0]: Physical memory: 0x4000000000 - 0x4c35000000, 52428800000 bytes, 800000 pages available @ NUMA: 1
[ 0]: Physical memory: 0x202000000000 - 0x202c35000000, 52428800000 bytes, 800000 pages available @ NUMA: 2
[ 0]: Physical memory: 0x204000000000 - 0x204c35000000, 52428800000 bytes, 800000 pages available @ NUMA: 3
[ 0]: NUMA: 0, Linux NUMA: 0, type: 1, available bytes: 52425588736, pages: 799951
[ 0]: NUMA: 1, Linux NUMA: 1, type: 1, available bytes: 52428800000, pages: 800000
[ 0]: NUMA: 2, Linux NUMA: 2, type: 1, available bytes: 52428800000, pages: 800000
[ 0]: NUMA: 3, Linux NUMA: 3, type: 1, available bytes: 52428800000, pages: 800000
[ 0]: NUMA 0 distances: 0 (10), 1 (16), 2 (32), 3 (33),
[ 0]: NUMA 1 distances: 1 (10), 0 (16), 2 (25), 3 (32),
[ 0]: NUMA 2 distances: 2 (10), 3 (16), 1 (25), 0 (32),
[ 0]: NUMA 3 distances: 3 (10), 2 (16), 1 (32), 0 (33),
[ 0]: Trampoline area: 0x0
[ 0]: # of cpus : 107
[ 0]: locals = ffff802080380000
[ 0]: BSP: 0 (HW ID: 4 @ NUMA 0)
[ 0]: BSP: booted 106 AP CPUs
[ 0]: Master channel init acked.
[ 0]: Using Linux work IRQ for IKC IPI.
[ 0]: Enable Host mapping vDSO.
IHK/McKernel booted.
[ 32]: schedule: WARNING can't schedule() while no preemption, cnt: 1
[ 32]: schedule: WARNING can't schedule() while no preemption, cnt: 1

Contributor

bgerofi commented Jul 2, 2021

Hi, why are you booting on 107 CPUs? If you insist on running 22 ranks, it would be better to boot McKernel on a multiple of 22 cores, e.g., 88. For example, you could try: mcreboot -c 40-127

In general, we prefer to run on a round number of CPU cores (preferably a power of 2). It is also better to leave a few cores on each NUMA node for Linux and to make sure that the McKernel cores are balanced across NUMA domains.
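The two rules of thumb above (a McKernel core count that is a multiple of the rank count, and cores balanced across NUMA domains with a few left for Linux on each) can be checked with a few lines of Python. This is a minimal sketch, not an IHK/McKernel tool: parse_cpu_list and check_layout are hypothetical helpers, the CPU-list syntax is assumed to match the form used in this thread (e.g. 4-110 from ihkosctl 0 query cpu), and the NUMA mapping assumes 128 cores split into four contiguous blocks of 32, which this thread does not confirm.

```python
# Hypothetical helpers for sanity-checking a McKernel CPU allocation.
# ASSUMPTION: 128 cores split into 4 contiguous, equal-sized NUMA blocks.


def parse_cpu_list(spec):
    """Expand a CPU list such as '4-110' or '0-3,8,10-11' into a sorted list."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def check_layout(spec, ranks, total_cores=128, numa_nodes=4):
    """Summarize how a McKernel CPU list lines up with a rank count and NUMA nodes."""
    cpus = parse_cpu_list(spec)
    per_node = total_cores // numa_nodes  # assumed cores per NUMA node
    counts = [0] * numa_nodes
    for cpu in cpus:
        if cpu >= total_cores:
            raise ValueError(f"CPU {cpu} is outside 0-{total_cores - 1}")
        # Clamp so any leftover cores from uneven division land on the last node.
        counts[min(cpu // per_node, numa_nodes - 1)] += 1
    return {
        "mckernel_cpus": len(cpus),
        "linux_cpus": total_cores - len(cpus),
        "divisible_by_ranks": len(cpus) % ranks == 0,
        "mckernel_per_numa": counts,
        "linux_per_numa": [per_node - n for n in counts],
    }


if __name__ == "__main__":
    for spec in ("4-110", "40-127"):
        print(spec, check_layout(spec, ranks=22))
```

Under that assumed mapping, 4-110 gives 107 McKernel CPUs (not a multiple of 22) and leaves no Linux cores on NUMA nodes 1 and 2, while 40-127 gives 88 CPUs, which divides evenly by 22.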

Author

p00380563 commented Jul 6, 2021

Hi bgerofi, following your advice, I tried booting McKernel on 4 cores of NUMA node 0.
McKernel information:
[root@localhost sysroot]# ./sbin/mcreboot.sh -c 12-15 -m 50000m@0
[root@localhost sysroot]# ./sbin/ihkosctl 0 kmsg
[ 0]: boot_param_size: 65536
[ 0]: %: GICv3
[ 0]: setup_arm64 done.
IHK/McKernel started.
[ 0]: ns_per_tsc: 10000
[ 0]: KCommand Line: hidos dump_level=24 time_sharing
[ 0]: Physical memory: 0x2080300000 - 0x2cb5000000, 52425654272 bytes, 799952 pages available @ NUMA: 0
[ 0]: NUMA: 0, Linux NUMA: 0, type: 1, available bytes: 52425654272, pages: 799952
[ 0]: NUMA 0 distances: 0 (10),
[ 0]: Trampoline area: 0x0
[ 0]: # of cpus : 4
[ 0]: locals = ffff802080340000
[ 0]: BSP: 0 (HW ID: 12 @ NUMA 0)
[ 0]: BSP: booted 3 AP CPUs
[ 0]: Master channel init acked.
[ 0]: Using Linux work IRQ for IKC IPI.
[ 0]: Enable Host mapping vDSO.
IHK/McKernel booted.

Then I tested HPL, but there is no output at all; I think the CPUs are hung.

[root@localhost Linux_Arm]# mpirun -np 4 --allow-run-as-root /root/sysroot/bin/mcexec -n 4 ./xhpl

****************************************************************************
* hwloc 2.0.2rc1-git has encountered what looks like an error from the operating system.
* Group0 (cpuset 0xffff0fff) intersects with Package (P#36 cpuset 0xffffffff,0xffff0fff nodeset 0x00000003) without inclusion!
* Error occurred in topology.c line 1384
* The following FAQ entry in the hwloc documentation may help:
*   What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's mailing list,
* along with the files generated by the hwloc-gather-topology script.
****************************************************************************

I ran the mcstat command three times, but the output did not change:
[root@localhost sysroot]# ./bin/mcstat
------- memory (GB) ------- ------- tsc ------ --- thread ---
total current max system user current max
48.825 0.147 0.147 39 3 12 12
cpuacct_usage_percpu[0] = 5935640
cpuacct_usage_percpu[1] = 5942580
cpuacct_usage_percpu[2] = 5823800
cpuacct_usage_percpu[3] = 5974470
cpuacct_usage_percpu[4] = 0
cpuacct_usage_percpu[5] = 0
cpuacct_usage_percpu[6] = 0
cpuacct_usage_percpu[7] = 0
cpuacct_usage_percpu[8] = 0
cpuacct_usage_percpu[9] = 0
cpuacct_usage_percpu[10] = 0
cpuacct_usage_percpu[11] = 0
[root@localhost sysroot]# ./bin/mcstat
------- memory (GB) ------- ------- tsc ------ --- thread ---
total current max system user current max
48.825 0.147 0.147 39 3 12 12
cpuacct_usage_percpu[0] = 5935640
cpuacct_usage_percpu[1] = 5942580
cpuacct_usage_percpu[2] = 5823800
cpuacct_usage_percpu[3] = 5974470
cpuacct_usage_percpu[4] = 0
cpuacct_usage_percpu[5] = 0
cpuacct_usage_percpu[6] = 0
cpuacct_usage_percpu[7] = 0
cpuacct_usage_percpu[8] = 0
cpuacct_usage_percpu[9] = 0
cpuacct_usage_percpu[10] = 0
cpuacct_usage_percpu[11] = 0
[root@localhost sysroot]# ./bin/mcstat
------- memory (GB) ------- ------- tsc ------ --- thread ---
total current max system user current max
48.825 0.147 0.147 39 3 12 12
cpuacct_usage_percpu[0] = 5935640
cpuacct_usage_percpu[1] = 5942580
cpuacct_usage_percpu[2] = 5823800
cpuacct_usage_percpu[3] = 5974470
cpuacct_usage_percpu[4] = 0
cpuacct_usage_percpu[5] = 0
cpuacct_usage_percpu[6] = 0
cpuacct_usage_percpu[7] = 0
cpuacct_usage_percpu[8] = 0
cpuacct_usage_percpu[9] = 0
cpuacct_usage_percpu[10] = 0
cpuacct_usage_percpu[11] = 0
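The observation that the three mcstat samples are identical can be checked mechanically by comparing the per-CPU usage counters between two saved samples. This is a minimal sketch, assuming each mcstat run has been captured as text (for example by redirecting ./bin/mcstat to a file); parse_percpu and advanced_cpus are hypothetical helper names, not part of McKernel. An empty result means no counter advanced between the samples, which is consistent with, but does not by itself prove, the processes being stuck.

```python
import re

# Matches mcstat lines such as "cpuacct_usage_percpu[3] = 5974470".
PERCPU_RE = re.compile(r"cpuacct_usage_percpu\[(\d+)\]\s*=\s*(\d+)")


def parse_percpu(text):
    """Return {cpu_index: usage_counter} parsed from one mcstat dump."""
    return {int(cpu): int(value) for cpu, value in PERCPU_RE.findall(text)}


def advanced_cpus(before, after):
    """Return the CPU indices whose usage counter grew between two dumps."""
    old, new = parse_percpu(before), parse_percpu(after)
    return sorted(cpu for cpu, value in new.items() if value > old.get(cpu, 0))


if __name__ == "__main__":
    sample = "cpuacct_usage_percpu[0] = 5935640\ncpuacct_usage_percpu[1] = 5942580\n"
    print(advanced_cpus(sample, sample))  # identical samples -> []
```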

I don't know what happened. Is something in my configuration wrong?

Then I tried to stop McKernel:
[root@localhost sysroot]# ./sbin/mcstop+release.sh
error: destroying OS instance 0
error: destroying OS instance 0
error: destroying OS instance 0
error: destroying OS instance 0
error: destroying OS instance 0
error: destroying LWK instance 0 failed
[root@localhost sysroot]#
