This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Using "uniform" Xavier strategy to initialize the weight for VGG network (a trial solution to issue#9866) #9867

Merged
4 commits merged into apache:master on Feb 28, 2018

Conversation

juliusshufan
Contributor

@juliusshufan juliusshufan commented Feb 23, 2018

Description

This PR provides a potential solution for issue #9866.
For detailed information, please see the issue.

Checklist

Essentials

  • Passed code style checking (make lint)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, a README.md is added explaining what the example does, the source of the dataset, the expected performance on the test set, and a reference to the original paper if applicable
  • To the best of my knowledge, examples are either not affected by this change or have been fixed to be compatible with it

Changes

example/image-classification/common/fit.py

Comments

This PR has been verified on an Nvidia P40 GPU and on a CPU machine.
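The Xavier "uniform" scheme named in the title can be sketched in NumPy. This is a minimal illustration of the standard Glorot-uniform formula with an averaged fan factor; the function name, default `magnitude`, and parameterization are assumptions for illustration, not necessarily MXNet's exact implementation:

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, magnitude=3.0, rng=None):
    """Sample a (fan_in, fan_out) weight matrix from the Xavier
    "uniform" scheme: U(-limit, limit), where
    limit = sqrt(magnitude / ((fan_in + fan_out) / 2))."""
    rng = rng or np.random.default_rng()
    limit = np.sqrt(magnitude / ((fan_in + fan_out) / 2.0))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Example: initialize a VGG-style 4096 -> 4096 fully connected layer.
w = xavier_uniform(4096, 4096)
print(w.shape)  # (4096, 4096)
```

The intuition behind the issue: for a deep plain network like VGG, a poorly scaled initialization makes activations shrink or blow up layer by layer, so training fails to converge; bounding samples by the fan-dependent limit keeps activation variance roughly constant across layers.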

@juliusshufan juliusshufan changed the title Using "uniform" Xavier strategy to initialize the weight for VGG network Using "uniform" Xavier strategy to initialize the weight for VGG network (potential solution to issue#9866) Feb 23, 2018
@juliusshufan juliusshufan changed the title Using "uniform" Xavier strategy to initialize the weight for VGG network (potential solution to issue#9866) Using "uniform" Xavier strategy to initialize the weight for VGG network (a trial solution to issue#9866) Feb 23, 2018
@juliusshufan
Contributor Author

@szha may I ask for review comments from you or another domain owner? I understand that normally it is the user who decides the weight-initialization method. In this case, since the current implementation of the example already explicitly uses a different initialization method for AlexNet to avoid convergence issues, it may be reasonable to follow a similar approach for VGG. What do you think?
(For a description of the issue, please refer to #9867.)

Thanks for your time.

BR,
Shufan
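The per-network initializer selection described in the comment above could be sketched as follows. This is a hypothetical illustration of the approach (special-casing certain network families), not the actual fit.py code; the function name and parameter values are assumptions:

```python
def choose_init_params(network: str) -> dict:
    """Hypothetical sketch: pick Xavier initializer settings per
    network family, mirroring how the example special-cases some
    networks to avoid convergence issues. Values are illustrative."""
    if network.startswith('vgg'):
        # Xavier with "uniform" sampling, as this PR proposes for VGG.
        return {'rnd_type': 'uniform', 'factor_type': 'avg', 'magnitude': 3.0}
    # Default elsewhere: Xavier with Gaussian sampling.
    return {'rnd_type': 'gaussian', 'factor_type': 'in', 'magnitude': 2.0}

print(choose_init_params('vgg16')['rnd_type'])    # uniform
print(choose_init_params('resnet50')['rnd_type'])  # gaussian
```

The returned dict could then be passed as keyword arguments to the framework's Xavier initializer when building the training setup.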

@sxjscience sxjscience self-requested a review February 28, 2018 04:57
@sxjscience sxjscience merged commit 17a9c6a into apache:master Feb 28, 2018
rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018
…ork (a trial solution to issue#9866) (apache#9867)

* Enable the reporting of cross-entropy or nll loss value during training

* Set the default value of loss as a '' to avoid a Python runtime issue when loss argument is not set

* Applying the Xavier with "uniform" type to initialize weight when network is VGG
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018
…ork (a trial solution to issue#9866) (apache#9867)

* Enable the reporting of cross-entropy or nll loss value during training

* Set the default value of loss as a '' to avoid a Python runtime issue when loss argument is not set

* Applying the Xavier with "uniform" type to initialize weight when network is VGG