Re-training SSD model on Windows #664
Comments
This hasn't been tested or supported on Windows. Have you tried training it on your Jetson? |
I'm not with my Jetson right now, so I don't know. I thought it'd be better to train on my computer because it trains quicker. I'll try training it on my Jetson when I have it. In the meantime, if you could figure out the problem, that'd be really nice! Thank you! |
I train this on my Linux laptop as well (Ubuntu 16.04/18.04) without issue, so it seems to be an error related to Windows. See this related post - qfgaohao/pytorch-ssd#71 (comment) |
Adding --num-workers=0 made it work, although it still shows some warnings. But hey, it's actually training! C:\Users-----\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\optim\lr_scheduler.py:123: UserWarning: Detected call of |
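The `--num-workers=0` workaround loads data in the main process, so nothing has to be pickled for worker processes. A minimal sketch of choosing that fallback automatically, using only the standard library; `pick_num_workers` and its `start_method` parameter are hypothetical and not part of `train_ssd.py`, and checking only the transform is a simplification of what the data loader really pickles:

```python
import multiprocessing
import pickle


def pick_num_workers(requested, transform, start_method=None):
    """Return `requested`, or 0 if worker start-up would fail to pickle `transform`."""
    if start_method is None:
        # Typically "spawn" on Windows and "fork" on older Linux Pythons.
        start_method = multiprocessing.get_start_method()
    if start_method == "fork":
        # Forked workers inherit the parent's memory; nothing is pickled.
        return requested
    try:
        pickle.dumps(transform)
    except (AttributeError, pickle.PicklingError):
        return 0  # fall back to loading data in the main process
    return requested
```

With a lambda-based augmentation like the one in the traceback, this would return 0 under the spawn start method, which matches passing `--num-workers=0` by hand.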
I'm not sure this is helpful, but in my case (I also use Windows 10), multiprocessing raised an error like yours when I used this pytorch-ssd. |
It works. My setup is torch 1.7.0 and CUDA 10.1 on Windows 10 with a 1070 laptop. Thank you! |
Both solutions worked for my environment (Windows 10)! Thanks. @AiueoABC @gapro20022 |
Alternatively, in case you don't want the dill dependency and still want to profit from multiprocessing, replacing the lambda function in TrainAugmentation with the following worked for me: class ScaleByStd:
|
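The snippet in the comment above is cut off after `class ScaleByStd:`. A minimal sketch of what such a picklable replacement could look like, assuming the lambda only divided the image by a standard deviation and passed boxes and labels through unchanged; the constructor argument and the `__call__` signature are assumptions, not the commenter's code:

```python
import pickle


class ScaleByStd:
    """Module-level callable, so pickle can find it by name (unlike a local lambda)."""

    def __init__(self, std):
        self.std = std

    def __call__(self, img, boxes=None, labels=None):
        # Assumed behaviour: scale pixel values, leave annotations untouched.
        return img / self.std, boxes, labels


scale = ScaleByStd(128.0)
# Round-trip through pickle, which is what spawning a worker requires.
restored = pickle.loads(pickle.dumps(scale))
print(restored(256.0))
```

Because the state (`std`) lives on the instance rather than in a closure, the object can be sent to worker processes and `num_workers` can stay above 0.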
In some cases, you can try replacing workers=4 with --num-workers=0. |
I followed the tutorial and downloaded the model .pth and requirements.txt. However, the command prompt returns errors as I try to train the model with the dataset I picked using the downloader. Can you help me with this?
2020-07-30 23:30:44 - Using CUDA...
2020-07-30 23:30:44 - Namespace(balance_data=False, base_net=None, base_net_lr=0.001, batch_size=4, checkpoint_folder='models/XBIN', dataset_type='open_images', datasets=['data'], debug_steps=10, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, lr=0.01, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb1-ssd', num_epochs=30, num_workers=2, pretrained_ssd='models/mobilenet-v1-ssd-mp-0_675.pth', resume=None, scheduler='cosine', t_max=100, use_cuda=True, validation_epochs=1, weight_decay=0.0005)
2020-07-30 23:30:44 - Prepare training datasets.
2020-07-30 23:30:44 - loading annotations from: data/sub-train-annotations-bbox.csv
2020-07-30 23:30:44 - annotations loaded from: data/sub-train-annotations-bbox.csv
num images: 238
2020-07-30 23:30:44 - Dataset Summary:Number of Images: 238
Minimum Number of Images for a Class: -1
Label Distribution:
Bottle: 732
Box: 62
Drink: 270
Drinking straw: 3
Plastic bag: 17
Tin can: 24
2020-07-30 23:30:44 - Stored labels into file models/XBIN\labels.txt.
2020-07-30 23:30:44 - Train dataset size: 238
2020-07-30 23:30:44 - Prepare Validation datasets.
2020-07-30 23:30:44 - loading annotations from: data/sub-test-annotations-bbox.csv
2020-07-30 23:30:44 - annotations loaded from: data/sub-test-annotations-bbox.csv
num images: 1589
2020-07-30 23:30:46 - Dataset Summary:Number of Images: 1589
Minimum Number of Images for a Class: -1
Label Distribution:
Bottle: 957
Box: 252
Drink: 1408
Drinking straw: 17
Plastic bag: 19
Tin can: 183
2020-07-30 23:30:46 - Validation dataset size: 1589
2020-07-30 23:30:46 - Build network.
2020-07-30 23:30:46 - Init from pretrained ssd models/mobilenet-v1-ssd-mp-0_675.pth
2020-07-30 23:30:46 - Took 0.07 seconds to load the model.
2020-07-30 23:30:48 - Learning rate: 0.01, Base net learning rate: 0.001, Extra Layers learning rate: 0.01.
2020-07-30 23:30:48 - Uses CosineAnnealingLR scheduler.
2020-07-30 23:30:48 - Start training from epoch 0.
C:\Users------\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\optim\lr_scheduler.py:123: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
Traceback (most recent call last):
File "train_ssd.py", line 343, in <module>
device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)
File "train_ssd.py", line 113, in train
for i, data in enumerate(loader):
File "C:\Users------\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 279, in __iter__
return _MultiProcessingDataLoaderIter(self)
File "C:\Users------\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 719, in __init__
w.start()
File "C:\Users------\AppData\Local\Programs\Python\Python36\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\Users------\AppData\Local\Programs\Python\Python36\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users------\AppData\Local\Programs\Python\Python36\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users------\AppData\Local\Programs\Python\Python36\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
reduction.dump(process_obj, to_child)
File "C:\Users------\AppData\Local\Programs\Python\Python36\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'TrainAugmentation.__init__.<locals>.<lambda>'
C:\Users-----\Desktop\jetson-inference-master\python\training\detection\ssd>2020-07-30 23:30:50 - Using CUDA...
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users------\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Users------\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
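The first error above can be reproduced without PyTorch. On Windows, `multiprocessing` starts workers with the spawn method, which pickles the process object (the `reduction.dump` frame in the traceback), and a lambda created inside `__init__` cannot be pickled. A minimal sketch, where `Augmentation` and `try_pickle` are hypothetical stand-ins and not the repository's code:

```python
import pickle


class Augmentation:
    """Hypothetical stand-in for TrainAugmentation (not the real class)."""

    def __init__(self, std):
        # A lambda created inside a method is a local object: pickle
        # cannot look it up by name, so it cannot be serialized.
        self.scale = lambda img: img / std


def try_pickle(obj):
    """Return None on success, or the exception raised by pickle."""
    try:
        pickle.dumps(obj)
        return None
    except (AttributeError, pickle.PicklingError) as exc:
        return exc


error = try_pickle(Augmentation(std=128.0))
print(type(error).__name__, error)
```

The second traceback (`EOFError: Ran out of input`) comes from the child process: the parent failed while writing the pickle, so the child had nothing to read.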