get NAN loss #9

renqianluo · 2021-03-15T05:21:26Z

Hi,
I used train_imagenet.py to train the searched architecture, following the guidance in the README.
I trained on 4 GPUs. All hyperparameters follow the README, with --num-gpu=4.
However, I got 'Loss NAN' at the very beginning of training.

vinh-cao · 2022-09-21T08:36:12Z

Same here, but I used the evaluator script to test a subnet of MobileNet V3:

cifar10 Train Epoch #1: 100%|█| 391/391 [01:01<00:00, 6.38it/s, loss=nan, top1=
Validate Epoch #1 : 100%|█| 50/50 [00:07<00:00, 6.72it/s, loss=nan, top1=10, to

Edit:
It looks like the loss is far too high at the very first step, and the learning rate is set to 0.01:
cifar10 Train Epoch #1: 0%| | 1/391 [04:22<28:26:08, 262.48s/it, loss=2.71e+8
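
To pin down where the loss first blows up, a small check like the one below could help. This is only a sketch, not code from the repository: the helper name `first_divergent_step` and the `max_loss` threshold are hypothetical, and it assumes the per-step loss values are available as plain Python floats.

```python
import math
from typing import Iterable, Optional


def first_divergent_step(losses: Iterable[float], max_loss: float = 1e4) -> Optional[int]:
    """Return the index of the first step whose loss is NaN, infinite,
    or larger in magnitude than max_loss; return None if no step is.

    max_loss is an arbitrary, hypothetical threshold; choose one that
    suits the loss function in use.
    """
    for step, loss in enumerate(losses):
        # math.isfinite is False for NaN and for +/-inf.
        if not math.isfinite(loss) or abs(loss) > max_loss:
            return step
    return None


# Illustrative values only: 2.71e8 mirrors the first-step loss in the log above.
example_losses = [2.71e8, float("inf"), float("nan")]
print(first_divergent_step(example_losses))
```

If the very first step is already flagged, as in the log above, the problem likely sits in the initial forward pass or the first update rather than building up gradually. That would be the point at which to test whether a lower learning rate changes anything.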
