unsupported van type: 1 Error when launch RDMA #371

Closed
Ruinhuang opened this issue Mar 12, 2021 · 4 comments

Ruinhuang commented Mar 12, 2021

I tried to launch RDMA with PyTorch.
These are my commands:
`export DMLC_ENABLE_RDMA=1
export DMLC_NUM_WORKER=2
export DMLC_ROLE=scheduler
export DMLC_NUM_SERVER=1

export DMLC_INTERFACE=ib0

export DMLC_PS_ROOT_URI=10.0.0.100
export DMLC_PS_ROOT_PORT=9000
bpslaunch`

This is the error output:
`BytePS launching scheduler
Command: python3 -c 'import byteps.server'

[08:35:53] byteps/server/server.cc:430: BytePS server engine uses 4 threads, consider increasing BYTEPS_SERVER_ENGINE_THREAD for higher performance
[08:35:53] src/postoffice.cc:25: Creating Van: 1
[08:35:53] 3rdparty/ps-lite/include/dmlc/logging.h:276: [08:35:53] src/van.cc:97: unsupported van type: 1

Stack trace returned 10 entries:
[bt] (0) /opt/conda/lib/python3.8/site-packages/byteps-0.2.5-py3.8-linux-x86_64.egg/byteps/server/c_lib.cpython-38-x86_64-linux-gnu.so(+0x28a2b) [0x7f1e9798aa2b]
[bt] (1) /opt/conda/lib/python3.8/site-packages/byteps-0.2.5-py3.8-linux-x86_64.egg/byteps/server/c_lib.cpython-38-x86_64-linux-gnu.so(+0x28d31) [0x7f1e9798ad31]
[bt] (2) /opt/conda/lib/python3.8/site-packages/byteps-0.2.5-py3.8-linux-x86_64.egg/byteps/server/c_lib.cpython-38-x86_64-linux-gnu.so(+0x534d8) [0x7f1e979b54d8]
[bt] (3) /opt/conda/lib/python3.8/site-packages/byteps-0.2.5-py3.8-linux-x86_64.egg/byteps/server/c_lib.cpython-38-x86_64-linux-gnu.so(+0x49c23) [0x7f1e979abc23]
[bt] (4) /opt/conda/lib/python3.8/site-packages/byteps-0.2.5-py3.8-linux-x86_64.egg/byteps/server/c_lib.cpython-38-x86_64-linux-gnu.so(+0x4e584) [0x7f1e979b0584]
[bt] (5) /opt/conda/lib/python3.8/site-packages/byteps-0.2.5-py3.8-linux-x86_64.egg/byteps/server/c_lib.cpython-38-x86_64-linux-gnu.so(byteps_server+0xdaa) [0x7f1e979872ba]
[bt] (6) /opt/conda/lib/python3.8/lib-dynload/../../libffi.so.7(+0x69dd) [0x7f1e97ab59dd]
[bt] (7) /opt/conda/lib/python3.8/lib-dynload/../../libffi.so.7(+0x6067) [0x7f1e97ab5067]
[bt] (8) /opt/conda/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so(+0x10da8) [0x7f1e97acbda8]
[bt] (9) /opt/conda/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so(+0x1108c) [0x7f1e97acc08c] `

This is the output of ibv_devinfo:
hca_id: mlx5_0
transport: InfiniBand (0)
fw_ver: 16.27.2008
node_guid: b859:9f03:001b:a952
sys_image_guid: b859:9f03:001b:a952
vendor_id: 0x02c9
vendor_part_id: 4119
hw_ver: 0x0
board_id: MT_0000000010
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 18
port_lid: 15
port_lmc: 0x00
link_layer: InfiniBand

ymjiang (Member) commented Mar 12, 2021

Did you check out the latest version of ps-lite? This looks like a bug introduced by https://github.com/bytedance/ps-lite/blob/6ecbd23c67e2c6a401df4de7c11a72572f3e8a3a/src/postoffice.cc#L19.

I think the fastest way to solve your problem is to set `export DMLC_ENABLE_RDMA=rdma`.

Ruinhuang (Author) commented Mar 13, 2021

I built BytePS from source. The BytePS version is 0.2.5 and the ps-lite version is 28330e.
I set `export DMLC_ENABLE_RDMA=rdma`, but it still shows the error:
`src/van.cc:97: unsupported van type: rdma`

ymjiang (Member) commented Mar 13, 2021

Can you make sure that the RDMA-related libs are installed properly? A fast way to verify is `cd byteps/3rdparty/ps-lite; make -j USE_RDMA=1`.
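
As a quicker pre-check before rebuilding, something like the sketch below can report whether the RDMA userspace libraries are visible to the dynamic loader. It assumes the RDMA build needs `libibverbs` and `librdmacm`, which this thread does not confirm, and the helper name `find_rdma_libs` is made up for illustration. It only looks for shared libraries, not development headers, so the `make` command above remains the real test.

```python
import ctypes.util

# Assumption (not confirmed in this thread): the RDMA build of ps-lite
# depends on libibverbs and librdmacm.
RDMA_LIBS = ("ibverbs", "rdmacm")


def find_rdma_libs(names=RDMA_LIBS):
    """Map each library name to what the loader resolves it to, or None."""
    return {name: ctypes.util.find_library(name) for name in names}


def report(found):
    """Format one line per library, flagging the ones that were not found."""
    return [f"lib{name}: {path or 'NOT FOUND'}" for name, path in found.items()]
```

Calling `print("\n".join(report(find_rdma_libs())))` on each machine lists any library that is missing.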

Ruinhuang (Author) commented

This issue was caused by the value of `DMLC_PS_ROOT_URI`: it has to be the IP address of ib0, not the node's IP address.
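
For anyone hitting the same thing, here is a rough sketch of how the mismatch could be caught before launching. It parses the text printed by `ip -o -4 addr show` and compares the address of the chosen interface with the value intended for `DMLC_PS_ROOT_URI`. The function names and the sample addresses are made up for illustration, and the parser assumes the usual one-line-per-address layout of `ip -o`.

```python
def ipv4_of_interface(ip_output, ifname):
    """Return the first IPv4 address listed for ifname, or None.

    ip_output is the text printed by `ip -o -4 addr show`, where each line
    looks like: "<index>: <ifname>    inet <addr>/<prefix> ...".
    """
    for line in ip_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == ifname and fields[2] == "inet":
            return fields[3].split("/")[0]
    return None


def root_uri_matches_interface(root_uri, ip_output, ifname):
    """True if root_uri equals the IPv4 address assigned to ifname."""
    return ipv4_of_interface(ip_output, ifname) == root_uri


# Hypothetical sample output (documentation-range addresses, not from this issue).
SAMPLE = (
    "2: eth0    inet 198.51.100.7/24 brd 198.51.100.255 scope global eth0\n"
    "3: ib0    inet 192.0.2.10/24 brd 192.0.2.255 scope global ib0\n"
)
```

On the scheduler machine, the real text can be captured with `subprocess.run(["ip", "-o", "-4", "addr", "show"], capture_output=True, text=True).stdout` and passed in together with the values of `DMLC_INTERFACE` and `DMLC_PS_ROOT_URI`.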
