core dump in running tensorflow benchmark #18

Closed
myotheone opened this issue Jun 28, 2019 · 9 comments
Labels
enhancement New feature or request

Comments

@myotheone

I installed BytePS successfully using "python setup.py install".
When I run the benchmark, BytePS crashes with a core dump.

Core backtrace:
(attached as an image)

Environment:
1. TensorFlow version: 1.14
2. CUDA version: 9.0
3. NCCL version: 2.4.7 (for CUDA 9.0)
4. OS: Ubuntu 16.04
5. g++: 5.4.0

Script (copied from step_by_step_tutorial.md):
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64/:$LD_LIBRARY_PATH
export NVIDIA_VISIBLE_DEVICES=0,1,2,3
export DMLC_WORKER_ID=0
export DMLC_NUM_WORKER=1
export DMLC_ROLE=worker
export DMLC_NUM_SERVER=1
export DMLC_PS_ROOT_URI=x.x.x.x
export DMLC_PS_ROOT_PORT=9999
export EVAL_TYPE=benchmark

python /home/mark/mark/code/byteps/launcher/launch.py \
/home/mark/mark/code/byteps/example/tensorflow/run_tensorflow_byteps.sh \
--model ResNet50 --num-iters 1000
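
Before launching, it can help to confirm that the variables the script exports are actually visible to the process. The following is a minimal sketch, not part of BytePS: the `EXPECTED_VARS` list is copied from the script in this issue (it is not an authoritative list of what BytePS requires), and `missing_vars` is a hypothetical helper name.

```python
import os

# Variables exported in the script above. This list is taken from this issue
# only; it is an assumption, not an official BytePS requirement list.
EXPECTED_VARS = [
    "DMLC_WORKER_ID",
    "DMLC_NUM_WORKER",
    "DMLC_ROLE",
    "DMLC_NUM_SERVER",
    "DMLC_PS_ROOT_URI",
    "DMLC_PS_ROOT_PORT",
]


def missing_vars(env):
    """Return the expected variable names that are unset or empty in env."""
    return [name for name in EXPECTED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        print("Not set: " + ", ".join(missing))
    else:
        print("All expected variables are set.")
```

Running this in the same shell as the export lines should report nothing missing. It only checks that the variables are present, not that their values are valid.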

@ymjiang
Member

ymjiang commented Jun 28, 2019

Looks like your gcc version is higher than our tested gcc-4.9.

Can you try pinning gcc to 4.9 before you build BytePS? Here is an example: https://github.com/bytedance/byteps/blob/master/docker/Dockerfile.worker.tensorflow#L115-L123
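
A quick way to see which gcc the build would pick up is to parse the output of `gcc -dumpversion`. This is a rough sketch under assumptions: `parse_version`, `matches_tested_gcc`, and `default_gcc_version` are hypothetical helper names, and the tested version (4, 9) is taken from the comment above, not from any BytePS API.

```python
import re
import subprocess


def parse_version(text):
    """Extract (major, minor) from a version string such as '5.4.0' or '7'."""
    match = re.search(r"(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("no version number in %r" % text)
    return int(match.group(1)), int(match.group(2) or 0)


def matches_tested_gcc(version, tested=(4, 9)):
    """True when (major, minor) equals the version the maintainers tested."""
    return version == tested


def default_gcc_version(compiler="gcc"):
    """Ask the compiler on PATH for its version via -dumpversion."""
    out = subprocess.check_output([compiler, "-dumpversion"])
    return parse_version(out.decode())


if __name__ == "__main__":
    try:
        version = default_gcc_version()
    except (OSError, subprocess.CalledProcessError):
        print("Could not run gcc.")
    else:
        status = "matches" if matches_tested_gcc(version) else "differs from"
        print("gcc %d.%d %s the tested gcc-4.9" % (version[0], version[1], status))
```

With the g++ 5.4.0 reported in this issue, the check would report a mismatch. Note that newer gcc releases may print only the major version for `-dumpversion`, in which case the minor component is treated as 0 here.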

@bobzhuyb
Member

bobzhuyb commented Jun 28, 2019

I think this is very similar to tensorflow/tensorflow#13308 (comment).
It happens when BytePS is compiled with gcc 5 while TF is compiled with gcc 4.

It's still an open issue.

If your TF is compiled with gcc 4 (and it seems so), you have to use gcc-4.9 to build BytePS. You can try the suggestion given by @ymjiang, or use our pre-built docker image.
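
To check how the installed TF was compiled, one option is to inspect its reported compiler version and compile flags. The sketch below assumes that the gcc 4 / gcc 5 incompatibility involves the libstdc++ dual ABI (the `_GLIBCXX_USE_CXX11_ABI` macro introduced with gcc 5), which this thread does not confirm. `cxx11_abi_from_flags` is a hypothetical helper; `tf.sysconfig.get_compile_flags()` and `tf.__compiler_version__` are existing TensorFlow attributes.

```python
def cxx11_abi_from_flags(flags):
    """Return the value of -D_GLIBCXX_USE_CXX11_ABI=<n> if present, else None."""
    prefix = "-D_GLIBCXX_USE_CXX11_ABI="
    for flag in flags:
        if flag.startswith(prefix):
            return flag[len(prefix):]
    return None


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment.")
    else:
        # Compiler that built the installed TF binary.
        print("TF compiler version:", tf.__compiler_version__)
        # Flags that custom ops are expected to be compiled with.
        abi = cxx11_abi_from_flags(tf.sysconfig.get_compile_flags())
        print("_GLIBCXX_USE_CXX11_ABI:", abi)
```

If the reported compiler is gcc 4.x, that is consistent with the advice above to build BytePS with gcc-4.9.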

@un-knight

This comment has been minimized.

@ymjiang
Member

ymjiang commented Jun 28, 2019

@un-knight Can you share more information about your OS and env?

@bobzhuyb
Member

bobzhuyb commented Jun 28, 2019

@un-knight Would you reply in the issue thread you already opened, #20? From your log, I don't see how your question is related to this issue.

@byronyi
Member

byronyi commented Jun 29, 2019

Just provide a binary release. There is no need to build in the user's environment; we do not need mpicc or mpicxx.

@bobzhuyb bobzhuyb added the enhancement New feature or request label Jun 30, 2019
@bobzhuyb
Member

Our plan is to release binary installation packages built for different frameworks and CUDA versions, to make installation easier for users.

@ymjiang
Member

ymjiang commented Jul 2, 2019

@myotheone We just uploaded some PyPI package lists for easier installation. See https://github.com/bytedance/byteps/blob/master/docs/pip-list.md

@bobzhuyb
Member

bobzhuyb commented Jul 3, 2019

Closing this since we have started providing PyPI packages. Feel free to reopen.

@bobzhuyb bobzhuyb closed this as completed Jul 3, 2019
pleasantrabbit pushed a commit that referenced this issue Jul 13, 2020
* compression: update cifar100 training script (#15)

pleasantrabbit pushed a commit that referenced this issue Nov 3, 2020