Loss is NaN in training based on [openpose] + [VGG19] after 6800 iterations
Hello @zpphigh @lengyuner!
The NaN loss during training is caused by the model parameter initialization: a poor initialization can make training diverge.
For [openpose] + [VGG19]:
Use the pretrained VGG-19 backbone provided here and put it under save_dir/pretrain_backbone so that HyperPose can load the pretrained backbone successfully during training.
For [openpose] + [Resnet18]:
Sorry, I haven't tried this setting before, so I currently don't have a pretrained [Resnet18] backbone. If time permits, a pretrained [Resnet18] backbone will be released.
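For reference, here is a minimal, generic TensorFlow 2 sketch (not the actual HyperPose training code or API) of two safeguards related to the points above: verifying that the pretrained backbone files are actually present under save_dir/pretrain_backbone before training starts, and clipping the global gradient norm plus stopping on a non-finite loss so a blow-up like the one in the log below is caught before it corrupts the checkpoint. The directory layout, `model`, `loss_fn`, and `dataset` names are placeholders.

```python
import os
import numpy as np
import tensorflow as tf

# Placeholder paths for illustration only; adjust to your own training config.
save_dir = "./save_dir/my_openpose_vgg19"
backbone_dir = os.path.join(save_dir, "pretrain_backbone")

# 1) Fail fast if the pretrained backbone is missing, since training the
#    backbone from a random initialization is what tends to diverge to NaN.
if not os.path.isdir(backbone_dir) or not os.listdir(backbone_dir):
    raise FileNotFoundError(
        f"No pretrained backbone found under {backbone_dir}; "
        "training from a random backbone initialization may diverge."
    )

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)

# 2) A training step with global gradient-norm clipping to damp sudden blow-ups.
@tf.function
def train_step(model, images, targets, loss_fn, clip_norm=5.0):
    with tf.GradientTape() as tape:
        preds = model(images, training=True)
        loss = loss_fn(targets, preds)
    grads = tape.gradient(loss, model.trainable_variables)
    grads, _ = tf.clip_by_global_norm(grads, clip_norm)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# 3) Outer loop that checkpoints periodically and aborts on a non-finite loss,
#    so the run stops at the first NaN/inf instead of logging garbage for hours.
def run_training(model, dataset, loss_fn, max_iter=1_000_000, ckpt_every=5_000):
    ckpt = tf.train.Checkpoint(model=model, optimizer=optimizer)
    manager = tf.train.CheckpointManager(ckpt, os.path.join(save_dir, "ckpt"), max_to_keep=3)
    for step, (images, targets) in enumerate(dataset.take(max_iter)):
        loss = train_step(model, images, targets, loss_fn)
        if not np.isfinite(loss.numpy()):
            raise RuntimeError(
                f"Non-finite loss at iteration {step}; "
                f"restart from the last checkpoint in {manager.directory}."
            )
        if step % ckpt_every == 0:
            manager.save(checkpoint_number=step)
```

Gradient clipping and the NaN check are only generic safeguards; the root-cause fix is still to start from the pretrained backbone as described above.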
NaN in training based on [openpose] + [Resnet18] after 74300 iterations, as below:
Train iteration 74300 / 1000000: Learning rate 9.999999747378752e-05 total_loss:51.99763870239258, conf_loss:25.99869155883789, paf_loss:74.33314514160156, l2_loss 1.8317127227783203 stage_num:6 time:0.0002014636993408203
stage_0 conf_loss:27.358610153198242 paf_loss:78.49413299560547
stage_1 conf_loss:26.15711784362793 paf_loss:74.4624252319336
stage_2 conf_loss:25.707374572753906 paf_loss:73.61676788330078
stage_3 conf_loss:25.627124786376953 paf_loss:73.40571594238281
stage_4 conf_loss:25.598114013671875 paf_loss:73.17063903808594
stage_5 conf_loss:25.543825149536133 paf_loss:72.84919738769531
Train iteration 74400 / 1000000: Learning rate 9.999999747378752e-05 total_loss:1148700065792.0, conf_loss:1743442149376.0, paf_loss:553957654528.0, l2_loss 1.8390066623687744 stage_num:6 time:0.00020194053649902344
stage_0 conf_loss:57.1938591003418 paf_loss:109.35887145996094
stage_1 conf_loss:1848.776611328125 paf_loss:3432.841552734375
stage_2 conf_loss:17413386.0 paf_loss:1569305.875
stage_3 conf_loss:135925504.0 paf_loss:4067570176.0
stage_4 conf_loss:2292520058880.0 paf_loss:300717211648.0
stage_5 conf_loss:8167979745280.0 paf_loss:3018960142336.0
Train iteration 74500 / 1000000: Learning rate 9.999999747378752e-05 total_loss:nan, conf_loss:nan, paf_loss:nan, l2_loss nan stage_num:6 time:0.0002295970916748047
stage_0 conf_loss:nan paf_loss:nan
stage_1 conf_loss:13212.9375 paf_loss:nan
stage_2 conf_loss:40226508.0 paf_loss:nan
stage_3 conf_loss:2792057995264.0 paf_loss:nan
stage_4 conf_loss:nan paf_loss:nan
stage_5 conf_loss:nan paf_loss:2.488148857507021e+16