BrokenPipeError: [Errno 32] Broken pipe #2341
Comments
Would you be able to post a snippet of code that can reproduce this?
runfile('G:/researchWork2/pytorch/triplet-network-pytorch-master/train.py', wdir='G:/researchWork2/pytorch/triplet-network-pytorch-master')
Number of params: 21840
File "D:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
File "D:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
File "G:/researchWork2/pytorch/triplet-network-pytorch-master/train.py", line 258, in <module>
File "G:/researchWork2/pytorch/triplet-network-pytorch-master/train.py", line 116, in main
File "G:/researchWork2/pytorch/triplet-network-pytorch-master/train.py", line 137, in train
File "D:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 303, in __iter__
File "D:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 162, in __init__
File "D:\Anaconda3\lib\multiprocessing\process.py", line 105, in start
File "D:\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
File "D:\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
File "D:\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
File "D:\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
BrokenPipeError: [Errno 32] Broken pipe
# switch to train mode
tnet.train()
# log avg values to somewhere
plotter.plot('acc', 'train', epoch, accs.avg)
Thank you so much.
@alykhantejani
We do not support Windows officially yet. Maybe @peterjc123 knows what's wrong.
@mjchen611 You can set num_workers to 0 to see the actual error. Did you have your code wrapped in an `if __name__ == '__main__':` block?
I can actually verify that setting `num_workers` to 0 makes the error go away.
@karmus89 Actually, this error only occurs when you try to do multiprocessing on code that has errors in it. It's unexpected that you're facing this issue when your code is correct. I don't know which version you are using. Can you send a small piece of code that can reproduce your issue?
Will do! And remember, I'm using a Windows machine. The code is directly copied from the tutorial PyTorch: Transfer Learning Tutorial. This means that the dataset has to be downloaded and extracted as instructed. The code to reproduce the error:
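The code block itself did not survive in this thread; a minimal sketch of the failing pattern, assuming the tutorial's ImageFolder layout under data/hymenoptera_data rather than the poster's exact code:

```python
# Minimal sketch of the failing pattern (not the exact tutorial code): creating
# and iterating a DataLoader with num_workers > 0 at module level crashes on
# Windows, because each spawned worker re-imports and re-executes this script.
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# 'data/hymenoptera_data/train' is the folder layout used by the tutorial.
train_set = datasets.ImageFolder('data/hymenoptera_data/train', transform)
train_loader = DataLoader(train_set, batch_size=4, shuffle=True, num_workers=4)

for inputs, labels in train_loader:  # BrokenPipeError is raised here on Windows
    print(inputs.shape, labels.shape)
    break
```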
And I just made some PyTorch forum posts regarding this. The problem lies with Python's multiprocessing on Windows. Edit: Here's the code that doesn't crash, which at the same time complies with Python's multiprocessing programming guidelines for Windows machines:
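The fixed code block is likewise missing; a minimal sketch of the pattern described, with all executable code moved under an `if __name__ == '__main__':` guard (paths and transforms assumed as above):

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def main():
    transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
    ])
    train_set = datasets.ImageFolder('data/hymenoptera_data/train', transform)
    # num_workers > 0 is safe here: spawned workers re-import the module, but the
    # guard below keeps them from re-running the DataLoader loop.
    train_loader = DataLoader(train_set, batch_size=4, shuffle=True, num_workers=4)
    for inputs, labels in train_loader:
        print(inputs.shape, labels.shape)
        break

if __name__ == '__main__':
    main()
```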
@karmus89 Well, I think I have stated it where the package was published. I'm sorry that you installed the package without reading the notice.
@peterjc123 Please see my edited response, where I did exactly that (wrapping the code inside of `if __name__ == '__main__':`).
A question regarding the above. I am running into the above problem within a Jupyter notebook. How do you solve this in a Jupyter notebook? Wrapping the code in `if __name__ == '__main__':` does not change a thing. Does someone know how to translate this to Jupyter notebooks?
@Dehde What about setting the num_workers of the DataLoader to zero?
@peterjc123
Could you show me the minimal code so that I can reproduce it?
@peterjc123 As promised, the code I use:
The WaterbodyDataset inherits from the PyTorch Dataset class.
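The actual code block is missing from the comment; a hedged sketch of what such a setup typically looks like, with the `WaterbodyDataset` internals and dummy data assumed rather than taken from the poster:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class WaterbodyDataset(Dataset):
    """Assumed stand-in for the poster's dataset: returns (image, mask) pairs."""
    def __init__(self, images, masks):
        self.images = images
        self.masks = masks

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.images[idx], self.masks[idx]

if __name__ == '__main__':
    # Dummy tensors in place of real data, just to make the sketch runnable.
    images = torch.randn(100, 3, 64, 64)
    masks = torch.randint(0, 2, (100, 1, 64, 64))
    loader = DataLoader(WaterbodyDataset(images, masks),
                        batch_size=8, shuffle=True, num_workers=2)
    for batch_images, batch_masks in loader:
        print(batch_images.shape, batch_masks.shape)
        break
```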
I also got the same error. When I set num_workers to 0, the error does not appear again. However, when I set num_workers to 1, the error is still there.
When I set num_workers to 0, there is no error.
Please, I need assistance with this error: "BrokenPipeError: [Errno 32] Broken pipe".
I found that the issue is still present, but only when I use a custom
For me, just changing num_workers from 2 to 0 made the code work properly...
Had the same issue when I ran the PyTorch Data Loading and Processing Tutorial. Changing num_workers from 2 to 0 solved the problem, but num_workers = 2 worked fine with other datasets. I use Windows.
num_workers > 0 doesn't work on Windows.
I met the same error. While I was trying to find a way to solve the problem, the program resumed running on its own (after waiting about 10 minutes). Amazing 😕
I've run the exact same code multiple times with different results. Also, I've copied code that causes a broken pipe to a new file (the contents being exactly the same) and it would run fine. I think there's an external factor in play here. I can't reproduce the bug anymore, but maybe try deleting your
I have the same problem on Windows 10. I don't know why, but I think the problem is the DataLoader (setting num_workers to 0 doesn't help) and multiprocessing.
After using Ubuntu for quite some time, I am trying Windows 10 lately (just for prototyping before using the cluster machine) and bumped into the same error; setting num_workers to 0 helped. Make sure you are setting it for all dataloaders: train, test, and validate.
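For illustration, a trivial sketch with placeholder TensorDatasets, showing num_workers=0 on every loader:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder datasets; the point is only that every loader gets num_workers=0.
train_set = TensorDataset(torch.randn(64, 10), torch.zeros(64))
val_set = TensorDataset(torch.randn(16, 10), torch.zeros(16))
test_set = TensorDataset(torch.randn(16, 10), torch.zeros(16))

train_loader = DataLoader(train_set, batch_size=8, shuffle=True, num_workers=0)
val_loader = DataLoader(val_set, batch_size=8, num_workers=0)
test_loader = DataLoader(test_set, batch_size=8, num_workers=0)
```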
I also have the same problem on Windows 10. I got the error message '[Errno 32] Broken pipe' when I set num_workers greater than 0. I guess this is a bug on Windows 10, and I am looking forward to seeing it fixed in the next release.
Same error; num_workers=0 worked, but I want multiprocessing to speed up data loading.
It seems that the only way for this to work is using Linux. I am using Windows 10 for prototyping and then pushing everything to the cluster, which is Linux-based.
I also encountered a similar problem on Windows 10 when defining my custom torchvision dataset and trying to run it in JupyterLab. Apparently the custom dataset does not get registered as an attribute of the main module, which is what the DataLoader's worker processes look up via multiprocessing's spawn.py. I fixed it by writing the dataset into a module and then importing it, as mentioned here: https://stackoverflow.com/questions/41385708/multiprocessing-example-giving-attributeerror
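A rough sketch of that workaround; the module and class names (`my_datasets.py`, `MyImageDataset`) are illustrative, not the poster's actual ones:

```python
# my_datasets.py -- a separate file saved next to the notebook
from torch.utils.data import Dataset

class MyImageDataset(Dataset):
    """Illustrative custom dataset; returns (sample, label) pairs."""
    def __init__(self, samples, labels):
        self.samples = samples
        self.labels = labels

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx], self.labels[idx]
```

```python
# In the notebook cell: import the class from the module instead of defining it
# inline, so spawned DataLoader workers can find it by its qualified name.
import torch
from torch.utils.data import DataLoader
from my_datasets import MyImageDataset

dataset = MyImageDataset(torch.randn(32, 3, 28, 28), torch.zeros(32, dtype=torch.long))
loader = DataLoader(dataset, batch_size=8, num_workers=2)
```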
Setting num_workers to 0 worked for me. Could you explain why this is causing an error?
I have noticed this issue is closed, but I do not think it is fixed. Is there any effort to fix the multi-processing dataloader on Windows? Currently there are 2 options as far as I know:
1. Wrap all executable code in `if __name__ == '__main__':` (and define datasets in importable modules).
2. Set num_workers to 0.
So the first one is an imperfect fix, while the second one amounts to just giving up. Is there any effort on fixing multi-processed dataloading on Windows currently going on somewhere else, or should we re-open this one?
Use
I got this problem when trying to train on my custom COCO dataset (which is a little bit different from the default CocoDetection PyTorch class). Adding the parameter collate_fn=utils.collate_fn worked for me:
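The code block is missing from this comment; a sketch of what that fix usually looks like, with a toy stand-in dataset and a local `collate_fn` written to match what `utils.collate_fn` from the torchvision detection references does (return the batch as tuples instead of stacking):

```python
import torch
from torch.utils.data import Dataset, DataLoader

def collate_fn(batch):
    # Same idea as utils.collate_fn in the torchvision detection references:
    # keep variable-sized images/targets as tuples instead of stacking tensors.
    return tuple(zip(*batch))

class ToyDetectionDataset(Dataset):
    """Stand-in for a custom CocoDetection-style dataset (illustrative only)."""
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        image = torch.randn(3, 224, 224)
        target = {"boxes": torch.tensor([[10.0, 10.0, 50.0, 50.0]]),
                  "labels": torch.tensor([1])}
        return image, target

if __name__ == '__main__':
    loader = DataLoader(ToyDetectionDataset(), batch_size=2,
                        num_workers=2, collate_fn=collate_fn)
    images, targets = next(iter(loader))
    print(len(images), len(targets))
```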
If anyone runs into this issue and none of the above works, my problem ended up being that my file name had "-" in it, as opposed to, say, "_", and multiprocessing was unable to resolve the references as a result.
You must put all training code inside `if __name__ == '__main__':`.
Another thing: at least in my experience with detectron2, the number of workers has to be <= your CPU cores, unlike on Linux. So if you have 12 CPU cores like I do, you can't use more than 12 workers (not that that would be very beneficial to begin with, I suppose). With detectron2 in particular, if you use an evaluator, this then doubles the number of workers, as it creates N additional workers (N being num_workers) for evaluation while the other workers are not terminated. So with a 12-core CPU you can actually only have 6 workers.
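A small sketch of the capping rule described above; the halving when an evaluator is used reflects that specific detectron2 setup, not a documented rule:

```python
import os

def pick_num_workers(requested, uses_evaluator=False):
    # Cap at the number of CPU cores; halve the budget if an evaluator will
    # spawn its own set of workers alongside the training workers.
    cores = os.cpu_count() or 1
    budget = cores // 2 if uses_evaluator else cores
    return max(0, min(requested, budget))

print(pick_num_workers(12, uses_evaluator=True))  # e.g. 6 on a 12-core machine
```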
Hi, I use PyTorch to run a triplet network (on GPU), but when loading data there is always a BrokenPipeError: [Errno 32] Broken pipe.
I thought something was wrong in the following code:
for batch_idx, (data1, data2, data3) in enumerate(test_loader):
if args.cuda:
data1, data2, data3 = data1.cuda(), data2.cuda(), data3.cuda()
data1, data2, data3 = Variable(data1), Variable(data2), Variable(data3)
Can you give me some suggestions? Thank you so much.