ConnectionResetError: [Errno 104] Connection reset by peer #125
Comments
I have the same problem; it happened after finishing 6 epochs.
Do you have an idea how to solve this problem?
I have the same problem.
I think your problem is not the same. occumpy, you can pull the latest version of the project. I remember it has been updated.
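This is a known memory leak bug in the PyTorch DataLoader. See #103; we are still tracking it down line by line. Increasing memory or reducing the number of workers can temporarily alleviate it.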
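For anyone who wants to try the temporary workaround mentioned above (fewer workers), here is a minimal sketch, not the maintainers' fix. The helper name `conservative_loader_kwargs` and the cap of 2 workers are hypothetical choices for illustration. Turning off `pin_memory` is also only an assumption, based on the traceback in this issue passing through the pin-memory thread. The returned dict uses the real `num_workers` and `pin_memory` keyword arguments of `torch.utils.data.DataLoader`.

```python
import os


def conservative_loader_kwargs(requested_workers, max_workers=2, pin_memory=False):
    """Build DataLoader keyword arguments with a capped worker count.

    Hypothetical helper: the cap (max_workers=2) and pin_memory=False
    are assumptions for illustration, not values from the YOLOX repo.
    """
    cpu_count = os.cpu_count() or 1
    # Never use more workers than requested, than the cap, or than CPUs available.
    workers = max(0, min(requested_workers, max_workers, cpu_count))
    return {"num_workers": workers, "pin_memory": pin_memory}


if __name__ == "__main__":
    kwargs = conservative_loader_kwargs(requested_workers=8)
    print(kwargs)
    # Usage, assuming torch is installed and `dataset` is already defined:
    #   from torch.utils.data import DataLoader
    #   loader = DataLoader(dataset, batch_size=16, **kwargs)
```

Setting `requested_workers=0` loads data in the main process, which avoids worker processes entirely at the cost of slower data loading.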
This issue is solved by the memory leak fix in #216.
I used YOLOX to train, and after it finished 20 epochs an error occurred. I looked at the log, but I do not know why it happened. Can you help me?
2021-07-24 08:49:52.170 | INFO | yolox.core.trainer:after_iter:245 - epoch: 20/20, iter: 10/10, mem: 12556Mb, iter_time: 0.953s, data_time: 0.130s, total_loss: 6.4, iou_loss: 2.1, l1_loss: 1.3, conf_loss: 2.2, cls_loss: 0.8, lr: 6.250e-05, size: 800, ETA: 0:00:00
2021-07-24 08:49:59.031 | INFO | yolox.evaluators.voc_evaluator:evaluate_prediction:142 - Evaluate in main process...
Results computed with the unofficial Python eval code.
Results should be very close to the official MATLAB eval code.
Recompute with
./tools/reval.py --matlab ...
for your paper.-- Thanks, The Management
Eval IoU : 0.55
Eval IoU : 0.60
Eval IoU : 0.65
Eval IoU : 0.70
Eval IoU : 0.75
Eval IoU : 0.80
Eval IoU : 0.85
Eval IoU : 0.90
Eval IoU : 0.95
2021-07-24 08:49:59.604 | INFO | yolox.core.trainer:evaluate_and_save_model:298 -
Average forward time: 21.82 ms, Average NMS time: 1.45 ms, Average inference time: 23.27 ms
map_5095: 0.3851118401510251
map_50: 0.5957307663890736
2021-07-24 08:49:59.604 | INFO | yolox.core.trainer:save_ckpt:307 - Save weights to ../models/yolox_voc_s
2021-07-24 08:50:00.101 | INFO | yolox.core.trainer:after_train:184 - Training of experiment is done and the best AP is 38.51
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/pin_memory.py", line 25, in _pin_memory_loop
r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
File "/usr/lib/python3.6/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/reductions.py", line 282, in rebuild_storage_fd
fd = df.detach()
File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 493, in Client
answer_challenge(c, authkey)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 737, in answer_challenge
response = connection.recv_bytes(256) # reject large message
File "/usr/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer