CUDA runtime error when running with pytorch benchmark_byteps.py #20

Closed
un-knight opened this issue Jun 28, 2019 · 22 comments

@un-knight
Contributor

un-knight commented Jun 28, 2019

Describe the bug
Got a CUDA runtime error when running the PyTorch benchmark_byteps.py script.

Error info:

BytePS launching worker
running benchmark...
Model: resnet50
Batch size: 32
Number of GPUs: 1
Running warmup...
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=405 error=11 : invalid argument
Traceback (most recent call last):
  File "/usr/local/byteps/example/pytorch/benchmark_byteps.py", line 109, in <module>
    timeit.timeit(benchmark_step, number=args.num_warmup_batches)
  File "/usr/lib/python2.7/timeit.py", line 237, in timeit
    return Timer(stmt, setup, timer).timeit(number)
  File "/usr/lib/python2.7/timeit.py", line 202, in timeit
    timing = self.inner(it, self.timer)
  File "/usr/lib/python2.7/timeit.py", line 100, in inner
    _func()
  File "/usr/local/byteps/example/pytorch/benchmark_byteps.py", line 90, in benchmark_step
    output = model(data)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torchvision/models/resnet.py", line 150, in forward
    x = self.conv1(x)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/conv.py", line 320, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/THCGeneral.cpp:405

To Reproduce
Steps to reproduce the behavior:
Follow the step-by-step tutorial; I used the official bytepsimage/worker_pytorch image.

Environment (please complete the following information):
Same as the official BytePS PyTorch worker image.


@bobzhuyb
Member

With only 1 GPU, BytePS is not involved in the training at all.

That said, we'll double-check. Could you provide more information about your OS and CUDA version outside Docker?
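
For reference, the kind of environment report requested here can be generated with PyTorch's built-in collector; a minimal sketch, assuming PyTorch is importable in the environment being inspected:

from torch.utils import collect_env  # standard PyTorch helper, not BytePS-specific

# Prints OS, CUDA/cuDNN versions, driver version and GPU models in one report.
print(collect_env.get_pretty_env_info())

The same report can be produced from the shell with python -m torch.utils.collect_env.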

@un-knight
Contributor Author

un-knight commented Jun 28, 2019

@bobzhuyb The single GPU is just due to the NVIDIA_VISIBLE_DEVICES setting; with 4 GPUs the same error occurs.

Here is some information about my host machine:

OS: CentOS Linux release 7.6.1810 (Core)
CUDA: 10.0.130
nvidia driver: 418.43

And here is information about the Docker image:

OS: Ubuntu 16.04
CUDA: 9.0.176

Maybe the problem is that the CUDA version inside the image is too old relative to the host driver? I will update CUDA to 10.x and try again.
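
Before rebuilding, a quick sanity check from inside the container is to compare the CUDA version PyTorch was built with against what the driver actually exposes; a minimal sketch, assuming PyTorch is importable:

import torch

print("built with CUDA:", torch.version.cuda)      # e.g. 9.0.176 in the stock image
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device 0:", torch.cuda.get_device_name(0))

If is_available() returns False, or the first CUDA call fails, the runtime/driver/GPU combination is the likely culprit rather than BytePS itself.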

@changlan
Contributor

Yes, it seems to be a PyTorch/CUDA issue. I'd try installing the CUDA 10 build of PyTorch as well, since you are using a cutting-edge NVIDIA driver.

@bobzhuyb
Member

Which GPU model are you using? I searched for the error output a bit and found some similar cases. It's possible that you are using GPUs that can only run with CUDA 10, while we provide CUDA 9 in the Docker image.
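
One way to confirm this is to check the card's compute capability; a minimal sketch, assuming the device can be queried at all:

import torch

major, minor = torch.cuda.get_device_capability(0)
print("compute capability: %d.%d" % (major, minor))
# Turing cards (RTX 20xx) report 7.5; CUDA 9 only generates code up to sm_70 (Volta),
# so sm_75 devices need the CUDA 10 toolchain.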

@un-knight
Contributor Author

I built a new Docker image with CUDA 10.0 and pinned gcc to 4.9, but when I run the PyTorch benchmark script I get a core dump:

[2019-06-28 10:33:49.774628: F byteps/common/shared_memory.cc:39] Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: invalid argument
(the same check failure is printed by each of the four worker processes, with their output interleaved)

/usr/local/byteps/example/pytorch/start_pytorch_byteps.sh: line 20:    35 Aborted                 (core dumped) python $path/benchmark_byteps.py $@
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/opt/anaconda/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/opt/anaconda/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/byteps/launcher/launch.py", line 19, in worker
    subprocess.check_call(command, env=my_env, stdout=sys.stdout, stderr=sys.stderr, shell=True)
  File "/opt/anaconda/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '/usr/local/byteps/example/pytorch/start_pytorch_byteps.sh --model resnet50 --num-iters 1000' returned non-zero exit status 134.

Image environment:

PyTorch version: 1.0.1.post2
Is debug build: No
CUDA used to build PyTorch: 10.0.130

OS: Ubuntu 16.04.6 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
CMake version: version 3.5.1

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration: 
GPU 0: GeForce RTX 2080 Ti
GPU 1: GeForce RTX 2080 Ti
GPU 2: GeForce RTX 2080 Ti
GPU 3: GeForce RTX 2080 Ti
GPU 4: GeForce RTX 2080 Ti
GPU 5: GeForce RTX 2080 Ti
GPU 6: GeForce RTX 2080 Ti
GPU 7: GeForce RTX 2080 Ti

Nvidia driver version: 418.43
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] numpy==1.16.2
[pip] numpydoc==0.8.0
[pip] torch==1.0.1.post2
[pip] torchvision==0.2.2
[conda] blas                      1.0                         mkl  
[conda] mkl                       2019.3                      199  
[conda] mkl-service               1.1.2            py37he904b0f_5  
[conda] mkl_fft                   1.0.10           py37ha843d7b_0  
[conda] mkl_random                1.0.2            py37hd81dba3_0  
[conda] pytorch                   1.0.1           py3.7_cuda10.0.130_cudnn7.4.2_2    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
[conda] torchvision               0.2.2                      py_3    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch

@bobzhuyb
Member

bobzhuyb commented Jun 28, 2019

Thanks for the detailed info. Hmmm... looks like the 2080 Ti is causing some trouble. First, it requires CUDA 10. Second, it seems to be having problems with cudaHostRegister.

The problem is we don't have this card on hand. Would you do us a favor and comment out this line https://github.com/bytedance/byteps/blob/master/byteps/common/shared_memory.cc#L39, then try again?
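
As a rough, indirect test of the pinned-host-memory path (not the BytePS code itself), one can try pinning a buffer from PyTorch; a sketch under that assumption:

import torch

# Rough proxy only: pin_memory() allocates page-locked host memory via the CUDA
# runtime (cudaHostAlloc), whereas BytePS registers existing shared memory with
# cudaHostRegister, but a failure here points at the same pinned-memory machinery.
x = torch.empty(64 * 1024 * 1024, dtype=torch.uint8).pin_memory()
print("pinned OK:", x.is_pinned())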

@bobzhuyb
Member

Can you show us the output of ipcs -lm ?

@un-knight
Contributor Author

@bobzhuyb Of course, here is the output of ipcs -lm:

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 18014398509465599
max total shared memory (kbytes) = 18014398442373116
min seg size (bytes) = 1

@bobzhuyb
Member

@un-knight Okay, you have enough shared memory. Did you add --shm-size=32768m to your docker run command, as shown in https://github.com/bytedance/byteps/blob/master/docs/step-by-step-tutorial.md? If so, shared memory is not the problem.

Then the only remaining suspect is the cudaHostRegister() call with the 2080 Ti.
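
As a side note, ipcs -lm reports the System V limits, while Docker's --shm-size controls the tmpfs mounted at /dev/shm, where POSIX shared memory objects live; a quick sketch to check that mount from inside the container:

import os

st = os.statvfs("/dev/shm")
total_mb = st.f_blocks * st.f_frsize // (1024 * 1024)
free_mb = st.f_bavail * st.f_frsize // (1024 * 1024)
# Docker's default /dev/shm is only 64 MB; with --shm-size=32768m this should
# report roughly 32768 MB total.
print("/dev/shm: %d MB total, %d MB free" % (total_mb, free_mb))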

@un-knight
Contributor Author

un-knight commented Jun 30, 2019

Great! Now I can run the benchmark on a single node after expanding the Docker shared memory. To conclude: with a 2080 Ti, the user needs to install CUDA >= 10.0 and then set enough shared memory for the processes to communicate. Thanks for your help @bobzhuyb

@bobzhuyb
Member

Good to know. We'll build cuda10 image/package soon, so that future users don't have this problem.

@bobzhuyb
Member

Closing this issue. Feel free to reopen it if anything comes up.

@un-knight
Contributor Author

Another problem: the benchmark processes don't stop automatically after finishing a task, and the GPU memory isn't released either, so I have to kill the processes manually.
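
Until the exit handling is fixed, a small cleanup sketch like the one below can save some manual killing; the process pattern is hypothetical and must match whatever script was actually launched:

import subprocess
import time

# Hypothetical pattern; adjust it to the actual launch command.
pattern = "benchmark_byteps.py"

subprocess.run(["pkill", "-f", pattern])        # ask nicely first (SIGTERM)
time.sleep(5)
subprocess.run(["pkill", "-9", "-f", pattern])  # force-kill anything left over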

@ymjiang
Member

ymjiang commented Jun 30, 2019

@un-knight Thank you for the feedback. We will take a look at the exit problem.

@un-knight
Contributor Author

@bobzhuyb I got an illegal memory access error:

[2019-07-08 11:29:21.757838: F byteps/common/nccl_manager.cc:35] Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: an illegal memory access was encountered
Aborted (core dumped)
Exception in thread Thread-8:
Traceback (most recent call last):
  File "/opt/anaconda/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/opt/anaconda/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/byteps/launcher/launch.py", line 19, in worker
    subprocess.check_call(command, env=my_env, stdout=sys.stdout, stderr=sys.stderr, shell=True)
  File "/opt/anaconda/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'python synthetic_benchmark.py byteps' returned non-zero exit status 134.

@bobzhuyb
Member

bobzhuyb commented Jul 8, 2019

@un-knight In what scenario? Single machine, or distributed mode? Does it happen immediately after starting, or does it run for some iterations first?

@un-knight
Contributor Author

un-knight commented Jul 8, 2019

@un-knight In what scenario? Single machine, or distributed mode? Does it happen immediately after starting, or does it run for some iterations first?

It's strange: the error happens after some iterations on a single machine with multiple GPUs. I haven't tested multi-node yet.

The synthetic PyTorch benchmark runs normally, while the MNIST PyTorch example hits the illegal memory access error above after some iterations.

@bobzhuyb
Member

bobzhuyb commented Jul 8, 2019

@un-knight In what scenario? Single machine, or distributed mode? Does it happen immediately after starting, or does it run for some iterations first?

It's strange: the error happens after some iterations on a single machine with multiple GPUs. I haven't tested multi-node yet.

The synthetic PyTorch benchmark runs normally, while the MNIST PyTorch example hits the illegal memory access error above after some iterations.

"Some iterations": does it always fail at the same iteration? If so, I tend to think the problem is in the example script. Otherwise, it may be something in BytePS's core logic.

How many iterations can it run before it fails?
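
One way to narrow this down is to make CUDA kernel launches synchronous, so the failure surfaces at the offending kernel rather than at a later check; a common debugging sketch, not specific to BytePS:

import os

# Must be set before the first CUDA call in the process (e.g. exported in the
# launch script), so that launches run synchronously and the traceback points
# at the kernel that actually performed the illegal access.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"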

@ymjiang
Member

ymjiang commented Jul 9, 2019

@un-knight The PyTorch MNIST example runs 10 epochs by default. Did your problem happen after 10 epochs? If so, it could be because BytePS does not handle the exit properly.

@un-knight
Contributor Author

un-knight commented Jul 9, 2019

@un-knight The PyTorch MNIST example runs 10 epochs by default. Did your problem happen after 10 epochs? If so, it could be because BytePS does not handle the exit properly.

@ymjiang @bobzhuyb In fact it happens after 1 epoch every time when I run the MNIST example.

Train Epoch: 1 [14720/15000 (98%)]      Loss: 0.416857
Train Epoch: 1 [14720/15000 (98%)]      Loss: 0.351872
Train Epoch: 1 [14720/15000 (98%)]      Loss: 0.524274
/opt/anaconda/lib/python3.7/site-packages/torch/nn/_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
/opt/anaconda/lib/python3.7/site-packages/torch/nn/_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
/opt/anaconda/lib/python3.7/site-packages/torch/nn/_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
/opt/anaconda/lib/python3.7/site-packages/torch/nn/_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
[2019-07-09 02:17:22. 81728: F byteps/common/nccl_manager.cc:35] Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: an illegal memory access was encountered
/usr/local/byteps/example/pytorch/start_pytorch_byteps.sh: line 20:   440 Aborted                 (core dumped) python $path/train_mnist_byteps.py $@
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/opt/anaconda/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/opt/anaconda/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/byteps/launcher/launch.py", line 19, in worker
    subprocess.check_call(command, env=my_env, stdout=sys.stdout, stderr=sys.stderr, shell=True)
  File "/opt/anaconda/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '/usr/local/byteps/example/pytorch/start_pytorch_byteps.sh' returned non-zero exit status 134.

@un-knight
Contributor Author

So could this problem be an error in the BytePS core?

pleasantrabbit pushed a commit that referenced this issue Jul 13, 2020
* hotfix: update script

* hotfix: comment out....
pleasantrabbit pushed a commit that referenced this issue Nov 3, 2020
* add testcase for mixed mode

* add server load

* fix log