Core dump when running the TensorFlow benchmark #18
Looks like your gcc version (5.4.0) is newer than gcc-4.9, which is the version we have tested. Can you try pinning gcc to 4.9 before you build BytePS? Here is an example: https://github.com/bytedance/byteps/blob/master/docker/Dockerfile.worker.tensorflow#L115-L123
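Pinning the compiler for a source build can be sketched as a small shell function. This is a minimal sketch, not the linked Dockerfile's method: it assumes versioned binaries such as `gcc-4.9` and `g++-4.9` are already installed on `PATH`, and that BytePS's `setup.py` honors the standard `CC`/`CXX` environment variables (plain distutils builds generally do, but this is not confirmed for BytePS). The function name `pin_gcc` is hypothetical.

```shell
# Hypothetical helper: pin the C/C++ compilers used by a source build.
# Assumes versioned binaries named gcc-<ver> and g++-<ver> are on PATH.
pin_gcc() {
  local ver="$1"
  if ! command -v "gcc-${ver}" >/dev/null 2>&1 || ! command -v "g++-${ver}" >/dev/null 2>&1; then
    echo "gcc-${ver} and/or g++-${ver} not found on PATH" >&2
    return 1
  fi
  # Assumption: the build reads CC/CXX (standard for distutils-based setup.py).
  export CC="gcc-${ver}"
  export CXX="g++-${ver}"
}

# Usage, from the BytePS source tree:
#   pin_gcc 4.9 && python setup.py install
```

Failing early when the pinned compiler is missing avoids silently falling back to the system gcc, which is the situation described in this issue.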
I think this is very similar to tensorflow/tensorflow#13308, which is still open. If your TF was compiled with gcc 4 (and it seems so), you have to use gcc-4.9 to build BytePS. You can try the suggestion given by @ymjiang, or use our pre-built Docker image.
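One way to check whether the local compiler matches the one TensorFlow was built with is to compare their major versions. A minimal sketch, assuming the installed TensorFlow exposes `tf.version.COMPILER_VERSION` (older 1.x releases may expose it only as `tf.COMPILER_VERSION`); the helper names `gcc_major` and `same_gcc_major` are hypothetical. Comparing only the major version mirrors the "gcc 4 build, gcc-4.9 toolchain" rule of thumb in this comment and is not a full ABI compatibility check.

```shell
# Hypothetical helper: print the major component of a GCC version string,
# e.g. "5.4.0" -> 5, "4.9.4" -> 4, "7.3.1 20180303" -> 7.
gcc_major() {
  local v="${1%% *}"   # drop anything after the first space (e.g. a build date)
  echo "${v%%.*}"      # keep everything before the first dot
}

# Succeeds when two GCC version strings share the same major version.
same_gcc_major() {
  [ "$(gcc_major "$1")" = "$(gcc_major "$2")" ]
}

# Usage (assumes python can import tensorflow):
#   tf_cc="$(python -c 'import tensorflow as tf; print(tf.version.COMPILER_VERSION)')"
#   same_gcc_major "$tf_cc" "$(gcc -dumpversion)" || echo "local gcc major differs from TF's ($tf_cc)"
```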
@un-knight Can you share more information about your OS and environment?
@un-knight Could you reply in the issue you already opened (#20)? From your log, I don't see how your question is related to this issue.
Please just provide a binary release, so that BytePS does not need to be built in each user's environment.
We plan to release binary installation packages built for different frameworks and CUDA versions to simplify installation.
@myotheone We have just uploaded PyPI packages for easier installation. See https://github.com/bytedance/byteps/blob/master/docs/pip-list.md
Closing this since we now provide PyPI packages. Feel free to reopen.
I have successfully installed BytePS using "python setup.py install".
However, when I run the benchmark, BytePS dumps core.
Core backtrace:
Environment:
1. TensorFlow version: 1.14
2. CUDA version: 9.0
3. NCCL version: 2.4.7 for CUDA 9.0
4. OS: Ubuntu 16.04
5. g++: 5.4.0
Script (copied from step_by_step_tutorial.md):
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64/:$LD_LIBRARY_PATH
export NVIDIA_VISIBLE_DEVICES=0,1,2,3
export DMLC_WORKER_ID=0
export DMLC_NUM_WORKER=1
export DMLC_ROLE=worker
export DMLC_NUM_SERVER=1
export DMLC_PS_ROOT_URI=x.x.x.x
export DMLC_PS_ROOT_PORT=9999
export EVAL_TYPE=benchmark
python /home/mark/mark/code/byteps/launcher/launch.py \
/home/mark/mark/code/byteps/example/tensorflow/run_tensorflow_byteps.sh \
--model ResNet50 --num-iters 1000
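The launcher is configured through the `DMLC_*` variables exported in the script above, and a missing or empty export is easy to overlook when copying the tutorial. A small pre-flight check can catch that before `launch.py` is invoked. This is a minimal sketch: `check_dmlc_env` is a hypothetical function, not part of BytePS, and the required-variable list is an assumption taken from the script above rather than from BytePS documentation.

```shell
# Hypothetical pre-flight check: report DMLC_* variables from the script
# above that are unset or empty. Returns non-zero if any are missing.
check_dmlc_env() {
  local missing required var val
  missing=""
  required="DMLC_ROLE DMLC_NUM_WORKER DMLC_NUM_SERVER DMLC_PS_ROOT_URI DMLC_PS_ROOT_PORT"
  # Workers additionally need their own ID (DMLC_WORKER_ID in the script above).
  if [ "${DMLC_ROLE:-}" = "worker" ]; then
    required="$required DMLC_WORKER_ID"
  fi
  for var in $required; do
    eval "val=\${$var:-}"          # indirect lookup that also works in POSIX sh
    [ -n "$val" ] || missing="$missing $var"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing" >&2
    return 1
  fi
}

# Usage: check_dmlc_env && python /path/to/byteps/launcher/launch.py ...
```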