ConnectionResetError: [Errno 104] Connection reset by peer #125

xiaomujiang · 2021-07-24T23:59:26Z

I used YOLOX to train. After it finished 20 epochs, an error occurred. I looked at the log but I can't tell why. Can you help me?

2021-07-24 08:49:52.170 | INFO | yolox.core.trainer:after_iter:245 - epoch: 20/20, iter: 10/10, mem: 12556Mb, iter_time: 0.953s, data_time: 0.130s, total_loss: 6.4, iou_loss: 2.1, l1_loss: 1.3, conf_loss: 2.2, cls_loss: 0.8, lr: 6.250e-05, size: 800, ETA: 0:00:00
2021-07-24 08:49:59.031 | INFO | yolox.evaluators.voc_evaluator:evaluate_prediction:142 - Evaluate in main process...

Results computed with the unofficial Python eval code.
Results should be very close to the official MATLAB eval code.
Recompute with `./tools/reval.py --matlab ...` for your paper.
-- Thanks, The Management

Eval IoU : 0.55
Eval IoU : 0.60
Eval IoU : 0.65
Eval IoU : 0.70
Eval IoU : 0.75
Eval IoU : 0.80
Eval IoU : 0.85
Eval IoU : 0.90
Eval IoU : 0.95
2021-07-24 08:49:59.604 | INFO | yolox.core.trainer:evaluate_and_save_model:298 -
Average forward time: 21.82 ms, Average NMS time: 1.45 ms, Average inference time: 23.27 ms

map_5095: 0.3851118401510251
map_50: 0.5957307663890736

2021-07-24 08:49:59.604 | INFO | yolox.core.trainer:save_ckpt:307 - Save weights to ../models/yolox_voc_s
2021-07-24 08:50:00.101 | INFO | yolox.core.trainer:after_train:184 - Training of experiment is done and the best AP is 38.51
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/pin_memory.py", line 25, in _pin_memory_loop
    r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/reductions.py", line 282, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 493, in Client
    answer_challenge(c, authkey)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 737, in answer_challenge
    response = connection.recv_bytes(256) # reject large message
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer


lavender-ling · 2021-07-26T00:58:07Z

I have the same problem; it happened after finishing 6 epochs.

xiaomujiang · 2021-07-26T03:21:17Z

> I have the same problem; it happened after finishing 6 epochs.

Do you have any idea how to solve this problem?

lavender-ling · 2021-07-26T05:18:37Z

> I have the same problem; it happened after finishing 6 epochs.
>
> Do you have any idea how to solve this problem?

Not yet, I haven't found a solution.

Abandon-ht · 2021-07-26T13:39:39Z

I have the same problem:
AttributeError: 'Namespace' object has no attribute 'occumpy'

xiaomujiang · 2021-07-26T14:01:19Z

> I have the same problem:
>
> AttributeError: 'Namespace' object has no attribute 'occumpy'

I think your problem is a different one. For the `occumpy` error, pull the latest version of the project; I remember it has been updated.

lavender-ling · 2021-07-27T05:32:26Z

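This is a known PyTorch DataLoader memory-leak bug; see #103. We are still tracking it down line by line... In the meantime, increasing the available memory or reducing the number of workers can temporarily mitigate it.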
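Before lowering the worker count, it can help to confirm that memory actually grows from epoch to epoch. A minimal sketch, assuming Linux (it reads `/proc`), is to log the resident memory of the training process plus its direct child processes (which include the DataLoader workers) at the end of each epoch. The helper names `rss_mb`, `child_pids`, and `total_rss_mb` are hypothetical and are not part of YOLOX or PyTorch.

```python
import os
from typing import List, Optional


def rss_mb(pid: int) -> float:
    """Return the current resident set size of `pid` in MiB (Linux /proc only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # The line looks like "VmRSS:   123456 kB".
                return int(line.split()[1]) / 1024
    return 0.0  # e.g. zombie processes report no VmRSS line


def child_pids(parent: int) -> List[int]:
    """Return PIDs whose parent is `parent`, e.g. DataLoader worker processes."""
    kids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                stat = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # Format: "pid (comm) state ppid ...". Split after the last ")"
        # because the command name itself may contain spaces or parentheses.
        fields = stat.rsplit(")", 1)[1].split()
        if int(fields[1]) == parent:
            kids.append(int(entry))
    return kids


def total_rss_mb(pid: Optional[int] = None) -> float:
    """Sum the RSS of `pid` (default: this process) and its direct children.

    Pages shared between the parent and its workers are counted once per
    process, so treat the result as a trend indicator, not an exact total.
    """
    pid = os.getpid() if pid is None else pid
    total = rss_mb(pid)
    for kid in child_pids(pid):
        try:
            total += rss_mb(kid)
        except OSError:
            pass  # the child exited between listing and reading
    return total


if __name__ == "__main__":
    print(f"training process + workers: {total_rss_mb():.1f} MiB")
```

For example, calling `total_rss_mb()` after every epoch and seeing the value climb steadily would be consistent with the leak described above. The worker count itself is controlled by the `num_workers` argument of `torch.utils.data.DataLoader`.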

FateScript · 2021-08-06T11:43:01Z

This issue is solved by the memory-leak fix in #216.

Joker316701882 mentioned this issue Jul 28, 2021

Fix(core): fix memory leak issue and switch to subprocess backend #216

Merged

FateScript closed this as completed Aug 6, 2021
