This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

mxnet example training cannot get expected result after 0.5.0 #5209

Closed
GeorgeXia1828 opened this issue Mar 2, 2017 · 4 comments

Comments

@GeorgeXia1828

For bugs or installation issues, please provide the following information.
The more information you provide, the more likely people will be able to help you.

Environment info

Operating System: Linux idc01-rank-gpu-01 2.6.32-431.el6.x86_64 #1 SMP Sun Nov 10 22:19:54 EST 2013 x86_64 x86_64 x86_64 GNU/Linux

Compiler: gcc-4.8.5

Package used (Python/R/Scala/Julia): Python

MXNet version: 0.9.4

Or if installed from source: yes

MXNet commit hash (git rev-parse HEAD):

If you are using python package, please provide

Python version and distribution: 2.7.10

If you are using R package, please provide

R sessionInfo():

Error Message:

Please paste the full error message, including stack trace.

Minimum reproducible example

if you are using your own code, please provide a short script that reproduces the error.

###################################
I am trying to run the inception-bn-full test with the new MXNet. It worked in MXNet 0.5.0, but it still does not work in MXNet 0.9.4, even with fix_gamma = false.

I also found that the training accuracy behaves strangely: it is very unstable. (In the same test on 0.5.0 it increased steadily.)

I also ran a simple test on MNIST: I converted the MNIST data to images, packed them into a rec file, and trained AlexNet on it. It does not work at all.

So I guess there is something wrong with the code in train_imagenet.py or the related scripts, or there is a bug somewhere.
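For readers unfamiliar with the fix_gamma option mentioned above, here is a minimal pure-Python sketch of what fixing gamma means in batch normalization. It is not MXNet's implementation: the function name `batch_norm`, the `eps` value, and the sample batch are assumptions made for illustration. With gamma fixed, the scale is held at 1; with gamma free, the layer applies its own scale.

```python
import math


def batch_norm(values, gamma, beta, fix_gamma=True, eps=1e-3):
    """Normalize one channel of a mini-batch, then scale and shift.

    Illustrative only: `eps` and the argument names are assumptions,
    not taken from the issue or from MXNet's source.
    """
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    # When gamma is fixed, the scale is held at 1 regardless of `gamma`.
    scale = 1.0 if fix_gamma else gamma
    return [scale * (v - mean) / math.sqrt(var + eps) + beta for v in values]


# Hypothetical mini-batch for a single channel.
batch = [1.0, 2.0, 3.0, 4.0]
fixed = batch_norm(batch, gamma=2.0, beta=0.0, fix_gamma=True)
learned = batch_norm(batch, gamma=2.0, beta=0.0, fix_gamma=False)
```

In this sketch `learned` is exactly twice `fixed`, which shows that the flag only changes whether the scale factor takes part in the output.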

@piiswrong
Contributor

There have been some incompatible changes since 0.5. You can try models trained with the new version at data.mxnet.io

@GeorgeXia1828
Author

@piiswrong Thanks.
I had already switched to 0.9.4 before reporting the problem, and image classification training fails there.
I have some image classification tasks at roughly ImageNet scale, so how can I train them without going back to 0.5.0?

@GeorgeXia1828
Author

GeorgeXia1828 commented Mar 7, 2017

@piiswrong
I am stuck on this. While trying to track down the error, I found one significant difference in the training process of the new version: the accuracy changes erratically, whereas in the old version (0.5.0) it changes steadily and eventually reaches a good score.
Could this be related to CUDA? Or can you give some build suggestions?
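One way to make the "unstable versus stable" comparison concrete is to summarize the per-epoch training accuracy from each version with a few numbers. The sketch below is only an illustration: the helper `accuracy_stability` and both accuracy sequences are hypothetical and are not measurements from this issue.

```python
def accuracy_stability(acc):
    """Summarize how smoothly a sequence of per-epoch accuracies evolves."""
    if len(acc) < 2:
        raise ValueError("need at least two accuracy values")
    # Change in accuracy from each epoch to the next.
    deltas = [b - a for a, b in zip(acc, acc[1:])]
    return {
        "drops": sum(1 for d in deltas if d < 0),  # epochs where accuracy fell
        "mean_abs_change": sum(abs(d) for d in deltas) / len(deltas),
        "final": acc[-1],
    }


# Hypothetical accuracy curves, made up for illustration only.
stable = [0.10, 0.25, 0.40, 0.52, 0.60]
unstable = [0.10, 0.30, 0.12, 0.35, 0.15]

stable_summary = accuracy_stability(stable)
unstable_summary = accuracy_stability(unstable)
```

Posting such summaries for both versions, together with the exact training command, would give maintainers something reproducible to compare.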

@szha
Member

szha commented Sep 29, 2017

This issue is closed due to lack of activity in the last 90 days. Feel free to ping me to reopen if this is still an active issue. Thanks!

@szha szha closed this as completed Sep 29, 2017