Namescope is None when hybridize in multi-threading environment. AttributeError: 'NoneType' object has no attribute '__exit__' #13199
Comments
The latest version of mxnet also has this bug.
@mxnet-label-bot add [Gluon, Thread Safety, Bug]
@kohillyang I'm facing the same problem. Has this issue been fixed?
mxnet in general is not thread safe. You can accomplish the above using multiprocessing:

```python
import multiprocessing as mp
import gluoncv
import mxnet as mx

net = gluoncv.model_zoo.resnet18_v1b(pretrained=True)
net.hybridize()

def worker(module, input, outputs):
    outnd = module(input)  # type: mx.nd.NDArray
    outnd.wait_to_read()
    outputs.put(outnd)

ps = []
outputs = mp.Queue(5)
for i in range(3):
    input1 = mx.random.randn(1, 3, 368, 368)
    p = mp.Process(target=worker, args=(net, input1, outputs))
    ps.append(p)

for p in ps:
    p.start()
for p in ps:
    p.join()

while not outputs.empty():
    print(outputs.get().shape)
```
But unlike PyTorch, it is not possible to optimize (train) the network when using Process instead. An inconvenient workaround I found is to run inference once before pushing the network into the sub-threads.
In my opinion, supporting multi-threading in Python would hurt performance, because we would need to add locks to keep things thread-safe. I think it is better to use multi-processing in Python, since the GIL makes Python threads effectively fake parallelism. We could pass NDArray objects through a Pipe, e.g. as the Gluon DataLoader does. Could you please point to some projects which use multi-threading to optimize a network? We may support multi-threading in Python if it is necessary. Thank you! I submitted a PR just now which may add multi-threading support for Gluon.
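To make the "pass results between processes" idea concrete, here is a minimal multi-processing sketch; the worker name `gpu_worker`, the shapes, and the use of a `Queue` with host-side numpy copies are placeholders chosen for illustration, not how the Gluon DataLoader is actually implemented:

```python
import multiprocessing as mp
import mxnet as mx

def gpu_worker(rank, result_queue):
    # Each process owns its own context and computation; the result is copied
    # to host memory (numpy) before being sent back, which pickles cleanly.
    x = mx.nd.ones((2, 3)) * rank
    result_queue.put((rank, (x * 2).asnumpy()))

if __name__ == "__main__":
    queue = mp.Queue()
    procs = [mp.Process(target=gpu_worker, args=(r, queue)) for r in range(4)]
    for p in procs:
        p.start()
    # Drain the queue before joining so the feeder threads are not blocked.
    results = [queue.get() for _ in procs]
    for p in procs:
        p.join()
    for rank, arr in results:
        print(rank, arr.shape)
```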
@wkcn Glad to see this issue is going to be resolved. Anyway, at least according to my tests, mxnet already supports multi-threaded training; for example, https://github.com/dmlc/gluon-cv/blob/master/gluoncv/utils/parallel.py, and https://github.com/dmlc/gluon-cv/blob/master/scripts/segmentation/train.py uses it.
@kohillyang Thank you!
Yes. Another case is code written with Block rather than HybridBlock, when it can hardly be packaged into a single operator and asnumpy is called (sometimes because dynamic shape inference is almost impossible). In this case, if more than one GPU is used and multi-threading is not available, the network cannot easily be parallelized. Since dynamic networks are becoming more and more popular, I think supporting multi-threading is needed.
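As a rough illustration of that kind of Block, here is a minimal sketch; `DynamicBlock`, the shapes, and the assumption of two available GPUs are made up for the example, and whether running the forwards from several threads is actually safe is exactly what this issue is about:

```python
import threading
import mxnet as mx
from mxnet import gluon

class DynamicBlock(gluon.Block):
    """Toy Block that cannot be hybridized: it calls asnumpy() and its
    output shape depends on the data (dynamic shape)."""
    def __init__(self, **kwargs):
        super(DynamicBlock, self).__init__(**kwargs)
        with self.name_scope():
            self.fc = gluon.nn.Dense(8)

    def forward(self, x):
        y = self.fc(x)
        kept = int((y.asnumpy() > 0).sum())   # host round-trip breaks hybridization
        return y.reshape(-1)[:max(kept, 1)]   # data-dependent output length

ctx_list = [mx.gpu(0), mx.gpu(1)]             # assumes two GPUs are available
net = DynamicBlock()
net.initialize(ctx=ctx_list)

def run(ctx):
    out = net(mx.nd.ones((4, 16), ctx=ctx))
    out.wait_to_read()
    print(ctx, out.shape)

# One Python thread per GPU; each forwards the copy of the network on its device.
threads = [threading.Thread(target=run, args=(c,)) for c in ctx_list]
for t in threads:
    t.start()
for t in threads:
    t.join()
```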
@kohillyang Hi! Could you please provide example code showing how to run an operator written with numpy in parallel? Thanks!
I see. There is only one thread to execute the custom operator.
I didn't modify src/operator/custom/custom-inl.h, but there can be more than one thread executing the custom operator. I mean, even though there is only one network, it has an individual copy on each GPU, so the copies can be treated as several independent networks when forwarding. If we have n GPUs, we run n threads, one per GPU, to do inference and back-propagation on these networks; then there should be n threads executing the custom operator. Since the GIL is released while C++ code is executing, and to the best of my knowledge there is no lock in mxnet in this case forcing the custom operator to run in a single thread, multi-threading can speed these operators up. But I'm not sure whether mxnet forces the custom operator to be executed in only one thread.
@kohillyang
@wkcn Here are my test codes:

```python
import os
# os.environ["MXNET_ENGINE_TYPE"] = "NaiveEngine"
import mxnet as mx
import time
import threading
import numpy as np
import cv2

cv2.setNumThreads(1)  # Sometimes we need this to avoid deadlock, especially in multi-processing environments.

class TestOP(mx.operator.CustomOp):
    def __init__(self, *args, **kwargs):
        super(TestOP, self).__init__(*args, **kwargs)
        print("init")

    def forward(self, is_train, req, in_data, out_data, aux):
        try:
            x = in_data[0].asnumpy()
            print("ss")
            x = np.ones(shape=(1024, 1024, 300))
            x_resized = cv2.resize(x, (0, 0), fx=0.5, fy=0.5)
            x_resized_sum = x_resized.sum()
            print('ee', x_resized_sum)
        except Exception as e:
            print(e)

@mx.operator.register("test_op")
class TestOPProp(mx.operator.CustomOpProp):
    def __init__(self):
        super(TestOPProp, self).__init__()

    def list_arguments(self):
        return ['x']

    def list_outputs(self):
        return ['y']

    def infer_shape(self, in_shape):
        return in_shape, in_shape

    def create_operator(self, ctx, shapes, dtypes):
        return TestOP()

ctx_list = [mx.gpu(x) for x in [0, 1, 2, 3]]
x_list = [mx.nd.ones(shape=(1, 2), ctx=c) for c in ctx_list]
data = mx.sym.var(name="data")
y = mx.sym.Custom(data, op_type="test_op")
y = mx.sym.identity(y, name="identity")
sym_block = mx.gluon.SymbolBlock(outputs=y, inputs=data)
sym_block.collect_params().reset_ctx(ctx_list)

def forward(x, ctx):
    # print("enter", x)
    re = sym_block(x)
    re.wait_to_read()
    # print("exit")
    return re

# for x, c in zip(x_list, ctx_list):
#     forward(x, c)
# mx.nd.waitall()

# One thread per GPU, each forwarding the SymbolBlock on its own context.
threads = []
for x, c in zip(x_list, ctx_list):
    t = threading.Thread(target=forward, args=(x, c))
    t.daemon = True
    threads.append(t)
    t.start()

for t in threads:
    t.join()
mx.nd.waitall()
```

It crashes without any exception or output.
Thanks for your report! I will check it.
Description
Calling a hybridized Gluon network from worker threads fails because the block's name scope is None, raising AttributeError: 'NoneType' object has no attribute '__exit__'.
Environment info (Required)
Package used (Python/R/Scala/Julia): Python
Error Message:
AttributeError: 'NoneType' object has no attribute '__exit__'
Minimum reproducible example
Steps to reproduce
Hybridize a Gluon network and run its first forward pass from worker threads started with the threading module.
What have you tried to solve it?
Forward once before starting the threads.
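For reference, a minimal sketch of this workaround might look like the following; the Dense layer and the (1, 16) input are placeholders, and the only point is that the first hybridized forward pass runs on the main thread before any worker thread starts:

```python
import threading
import mxnet as mx
from mxnet import gluon

net = gluon.nn.Dense(10)   # placeholder network
net.initialize()
net.hybridize()

# Workaround: run one forward pass on the main thread first, so the cached
# graph is already built before any worker thread calls the network.
warmup = mx.nd.ones((1, 16))
net(warmup).wait_to_read()

def worker(data):
    out = net(data)
    out.wait_to_read()
    print(out.shape)

threads = [threading.Thread(target=worker, args=(mx.nd.ones((1, 16)),))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```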