"unsupported van type: 1" error when launching RDMA #371
Comments
Did you check out the latest version of ps-lite? This seems to be a bug introduced by https://github.com/bytedance/ps-lite/blob/6ecbd23c67e2c6a401df4de7c11a72572f3e8a3a/src/postoffice.cc#L19. I think the fastest way to solve your problem should be |
I built BytePS from source. The BytePS version is 0.2.5 and the ps-lite version is 28330e.
Can you make sure that RDMA-related libs are installed properly? A fast way to verify is |
This issue is caused by the value exported for DMLC_PS_ROOT_URI: it should be the ib0 IP, not the node IP.
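For anyone hitting the same thing, here is a rough pre-launch check for the fix described above. It is only a sketch, not part of BytePS: the helper names `interface_ipv4` and `check_root_uri` are made up for illustration, and it assumes a Linux host where the `ip` tool from iproute2 is available. On the scheduler node it compares DMLC_PS_ROOT_URI with the IPv4 address bound to the interface named in DMLC_INTERFACE.

```python
import ipaddress
import subprocess


def interface_ipv4(interface, ip_output=None):
    """Return the first IPv4 address bound to `interface`, or None.

    `ip_output` may carry pre-captured `ip -o -4 addr show` text; when it is
    None the command is run locally (assumes iproute2 is installed).
    """
    if ip_output is None:
        ip_output = subprocess.run(
            ["ip", "-o", "-4", "addr", "show", "dev", interface],
            capture_output=True, text=True, check=False,
        ).stdout
    for line in ip_output.splitlines():
        fields = line.split()
        # One-line format: "<index>: <ifname> inet <addr>/<prefix> ..."
        if len(fields) >= 4 and fields[1] == interface and fields[2] == "inet":
            return str(ipaddress.ip_interface(fields[3]).ip)
    return None


def check_root_uri(env, ip_output=None):
    """Return a list of problems found in the DMLC_* settings (hypothetical helper)."""
    problems = []
    iface = env.get("DMLC_INTERFACE")
    root_uri = env.get("DMLC_PS_ROOT_URI")
    if not iface:
        problems.append("DMLC_INTERFACE is not set")
    if not root_uri:
        problems.append("DMLC_PS_ROOT_URI is not set")
    # Only the scheduler's own interface address can be compared locally;
    # workers and servers point DMLC_PS_ROOT_URI at the scheduler instead.
    if iface and root_uri and env.get("DMLC_ROLE") == "scheduler":
        addr = interface_ipv4(iface, ip_output)
        if addr is None:
            problems.append(f"no IPv4 address found on {iface}")
        elif addr != root_uri:
            problems.append(
                f"DMLC_PS_ROOT_URI={root_uri} does not match {iface} address {addr}"
            )
    return problems
```

Running `check_root_uri(dict(os.environ))` on the scheduler node before `bpslaunch` (after `import os`) should return an empty list when the settings agree. It cannot tell whether the binary itself was built with RDMA support.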
I tried to launch RDMA with PyTorch.
This is my command:
`export DMLC_ENABLE_RDMA=1
export DMLC_NUM_WORKER=2
export DMLC_ROLE=scheduler
export DMLC_NUM_SERVER=1
export DMLC_INTERFACE=ib0
export DMLC_PS_ROOT_URI=10.0.0.100
export DMLC_PS_ROOT_PORT=9000
bpslaunch`
This is the error output:
`BytePS launching scheduler
Command: python3 -c 'import byteps.server'
[08:35:53] byteps/server/server.cc:430: BytePS server engine uses 4 threads, consider increasing BYTEPS_SERVER_ENGINE_THREAD for higher performance
[08:35:53] src/postoffice.cc:25: Creating Van: 1
[08:35:53] 3rdparty/ps-lite/include/dmlc/logging.h:276: [08:35:53] src/van.cc:97: unsupported van type: 1
Stack trace returned 10 entries:
[bt] (0) /opt/conda/lib/python3.8/site-packages/byteps-0.2.5-py3.8-linux-x86_64.egg/byteps/server/c_lib.cpython-38-x86_64-linux-gnu.so(+0x28a2b) [0x7f1e9798aa2b]
[bt] (1) /opt/conda/lib/python3.8/site-packages/byteps-0.2.5-py3.8-linux-x86_64.egg/byteps/server/c_lib.cpython-38-x86_64-linux-gnu.so(+0x28d31) [0x7f1e9798ad31]
[bt] (2) /opt/conda/lib/python3.8/site-packages/byteps-0.2.5-py3.8-linux-x86_64.egg/byteps/server/c_lib.cpython-38-x86_64-linux-gnu.so(+0x534d8) [0x7f1e979b54d8]
[bt] (3) /opt/conda/lib/python3.8/site-packages/byteps-0.2.5-py3.8-linux-x86_64.egg/byteps/server/c_lib.cpython-38-x86_64-linux-gnu.so(+0x49c23) [0x7f1e979abc23]
[bt] (4) /opt/conda/lib/python3.8/site-packages/byteps-0.2.5-py3.8-linux-x86_64.egg/byteps/server/c_lib.cpython-38-x86_64-linux-gnu.so(+0x4e584) [0x7f1e979b0584]
[bt] (5) /opt/conda/lib/python3.8/site-packages/byteps-0.2.5-py3.8-linux-x86_64.egg/byteps/server/c_lib.cpython-38-x86_64-linux-gnu.so(byteps_server+0xdaa) [0x7f1e979872ba]
[bt] (6) /opt/conda/lib/python3.8/lib-dynload/../../libffi.so.7(+0x69dd) [0x7f1e97ab59dd]
[bt] (7) /opt/conda/lib/python3.8/lib-dynload/../../libffi.so.7(+0x6067) [0x7f1e97ab5067]
[bt] (8) /opt/conda/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so(+0x10da8) [0x7f1e97acbda8]
[bt] (9) /opt/conda/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so(+0x1108c) [0x7f1e97acc08c] `
This is the ibv_devinfo output:
hca_id: mlx5_0
transport: InfiniBand (0)
fw_ver: 16.27.2008
node_guid: b859:9f03:001b:a952
sys_image_guid: b859:9f03:001b:a952
vendor_id: 0x02c9
vendor_part_id: 4119
hw_ver: 0x0
board_id: MT_0000000010
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 18
port_lid: 15
port_lmc: 0x00
link_layer: InfiniBand
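As a quick sanity check on output like the above, the following sketch parses `ibv_devinfo` text and lists ports that report an active InfiniBand link. The function names are hypothetical, and the parser assumes the simple `key: value` layout shown here. It only reflects what the device reports and says nothing about how BytePS or ps-lite was built.

```python
def parse_ibv_devinfo(text):
    """Parse `ibv_devinfo` text into {hca_id: {"attrs": {...}, "ports": {n: {...}}}}."""
    devices = {}
    device = None
    target = None  # dict currently receiving key/value pairs
    for raw in text.splitlines():
        # Split on the first colon only, so GUIDs keep their own colons.
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "hca_id":
            device = {"attrs": {}, "ports": {}}
            devices[value] = device
            target = device["attrs"]
        elif device is None:
            continue  # ignore anything before the first device header
        elif key == "port":
            # Lines after "port: N" describe that port.
            target = device["ports"].setdefault(int(value), {})
        else:
            target[key] = value
    return devices


def active_infiniband_ports(devices):
    """Return (hca_id, port) pairs whose state is PORT_ACTIVE on an InfiniBand link."""
    result = []
    for hca, dev in devices.items():
        for num, port in dev["ports"].items():
            if (port.get("state", "").startswith("PORT_ACTIVE")
                    and port.get("link_layer") == "InfiniBand"):
                result.append((hca, num))
    return result
```

The text can come from a saved file or from `subprocess.run(["ibv_devinfo"], capture_output=True, text=True).stdout`.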