mxnet example training cannot get expected result after 0.5.0 #5209

GeorgeXia1828 · 2017-03-02T10:19:48Z

Environment info

Operating System: Linux idc01-rank-gpu-01 2.6.32-431.el6.x86_64 #1 SMP Sun Nov 10 22:19:54 EST 2013 x86_64 x86_64 x86_64 GNU/Linux

Compiler: gcc-4.8.5

Package used (Python/R/Scala/Julia): Python

MXNet version: 0.9.4

Or if installed from source: yes

MXNet commit hash (git rev-parse HEAD):

If you are using python package, please provide

Python version and distribution: 2.7.10

Error Message:

Minimum reproducible example

###################################
I am trying the inception-bn-full example with the new MXNet. Training worked in MXNet 0.5.0, but it still does not work in MXNet 0.9.4, even after setting fix_gamma = false.

The training accuracy also behaves strangely: it fluctuates heavily from step to step, whereas in 0.5.0 it increased steadily.

I also ran a simple test on MNIST: I converted the MNIST data to images, packed them into a .rec file, and trained AlexNet on it. That does not work at all either.

So I suspect that something is wrong in train_imagenet.py or the related code, or that there is a bug somewhere else.

piiswrong · 2017-03-02T18:13:34Z

There have been some incompatible changes since 0.5. You can try models trained with the new version at data.mxnet.io

GeorgeXia1828 · 2017-03-03T03:23:50Z

@piiswrong thanks.
Before I reported the problem I had already switched to 0.9.4, and image classification training failed there.
I have image classification tasks at roughly ImageNet scale, so how can I train them without going back to 0.5.0?

GeorgeXia1828 · 2017-03-07T02:48:52Z

@piiswrong
I have been stuck on this and have tried to track down the error. The one significant difference I found in the new version's training process is that the accuracy fluctuates, while in the old version (0.5.0) it changes smoothly and eventually reaches a good score.
Could this be related to CUDA? Or can you give some suggestions on how to build MXNet?
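
To put a number on the "unstable accuracy" symptom described above, here is a minimal sketch that pulls training accuracy readings out of a saved training log and reports how much they jump between consecutive readings. It assumes the log contains lines with `Epoch[N]` and `Train-accuracy=X` (adjust the pattern if your log looks different). The function names `parse_train_accuracy` and `instability` are made up for this illustration and are not part of MXNet.

```python
import re
import statistics

# Assumption: log lines contain "Epoch[N]" followed later by "Train-accuracy=X".
# Adjust this pattern if the logs you saved use a different layout.
LINE_RE = re.compile(r"Epoch\[(\d+)\].*?Train-accuracy=([0-9]*\.?[0-9]+)")


def parse_train_accuracy(log_text):
    """Return (epoch, accuracy) pairs in the order they appear in the log."""
    points = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            points.append((int(match.group(1)), float(match.group(2))))
    return points


def instability(accuracies):
    """Mean absolute change between consecutive accuracy readings.

    A steadily improving run gives a small value; a run whose accuracy
    jumps up and down gives a larger one.
    """
    if len(accuracies) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(accuracies, accuracies[1:])]
    return statistics.mean(diffs)


def instability_of_log(log_text):
    """Convenience wrapper: parse a log and score its accuracy curve."""
    return instability([acc for _, acc in parse_train_accuracy(log_text)])
```

Running `instability_of_log` on the text of a 0.5.0 log and a 0.9.4 log from the same dataset and settings would give two numbers to compare, which is easier to report than "the accuracy looks unstable". It only measures the symptom; it does not say what causes it.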

szha · 2017-09-29T03:40:56Z

This issue is closed due to lack of activity in the last 90 days. Feel free to ping me to reopen if this is still an active issue. Thanks!

matt32106 mentioned this issue Apr 8, 2017: [QA] why not all examples run out of the box? #5717 (Open)

szha closed this as completed Sep 29, 2017
