This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Rebase #13757 to master #15189

Merged
merged 6 commits into from
Jul 11, 2019


ymjiang
Contributor

@ymjiang ymjiang commented Jun 9, 2019

Description

This is a rebase version of https://github.com/apache/incubator-mxnet/pull/13757

Details

https://github.com/apache/incubator-mxnet/blob/a38278ddebfcc9459d64237086cd7977ec20c70e/example/image-classification/train_imagenet.py#L42

When I train ImageNet with this line commented out, the training accuracy reaches 99% while the validation accuracy stays below 50% (single machine, 8 GPUs, global batch size 2048, ResNet-50, fp32). This is clearly overfitting.

Then I uncommented the line and reran with the same experiment settings. This time both training and validation accuracy converge to about 66%, which looks like a normal result.

So this data augmentation appears to be important for ImageNet training. It may be better to uncomment it by default, so that future developers are not confused by the overfitting issue.

My commits make the data augmentation configurable through a command-line argument.
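For readers who want to see what an opt-in augmentation switch can look like, here is a minimal sketch using only `argparse`. The flag name `--use-imagenet-data-augmentation` and the defaults inside `set_imagenet_aug` are assumptions made for illustration; they are not taken from this thread, and the actual script may differ.

```python
import argparse


def set_imagenet_aug(aug):
    # Hypothetical augmentation defaults, for illustration only.
    aug.set_defaults(random_crop=0, random_mirror=1)


def build_parser():
    parser = argparse.ArgumentParser(description="train imagenet (sketch)")
    parser.add_argument("--random-crop", type=int, default=0)
    parser.add_argument("--random-mirror", type=int, default=0)
    # Hypothetical opt-in flag; the name is an assumption.
    parser.add_argument("--use-imagenet-data-augmentation", type=int, default=0)
    return parser


def parse(argv):
    parser = build_parser()
    # First pass: only used to read the opt-in flag.
    args = parser.parse_args(argv)
    if args.use_imagenet_data_augmentation:
        # Update the parser defaults, then parse again so they take effect.
        set_imagenet_aug(parser)
        args = parser.parse_args(argv)
    return args
```

With this shape, `parse(["--use-imagenet-data-augmentation", "1"])` picks up the augmentation defaults, while a value given explicitly on the command line still wins over them.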

https://github.com/apache/incubator-mxnet/blob/a38278ddebfcc9459d64237086cd7977ec20c70e/example/image-classification/train_imagenet.py#L42

When I train ImageNet with this line commented out, the training accuracy reaches 99% while the validation accuracy stays below 50% (single machine, 8 GPUs, global batch size 2048, ResNet-50). This is clearly overfitting.

Then I uncommented the line and reran with the same experiment settings. This time both training and validation accuracy converge to about 70%.

So this data augmentation appears to be important for ImageNet training. It may be better to uncomment it by default, so that future developers are not confused by the overfitting issue.
@ymjiang ymjiang requested a review from szha as a code owner June 9, 2019 08:33
@piyushghai
Contributor

@ymjiang Can you make the PR title a bit more descriptive, please?
@mxnet-label-bot Add [pr-awaiting-review]

@marcoabreu marcoabreu added the pr-awaiting-review PR is waiting for code review label Jun 9, 2019
@Roshrini
Member

@ymjiang Can you please retrigger CI build?

@ymjiang ymjiang closed this Jun 24, 2019
@ymjiang ymjiang reopened this Jun 24, 2019
@ymjiang
Contributor Author

ymjiang commented Jun 24, 2019

@Roshrini Hi, I closed the PR and reopened it. Is that the correct way to re-trigger the CI build?

@roywei
Member

roywei commented Jul 8, 2019

@mxnet-label-bot add [pr-awaiting-merge]

@marcoabreu marcoabreu added the pr-awaiting-merge Review and CI is complete. Ready to Merge label Jul 8, 2019
@wkcn wkcn merged commit 554b196 into apache:master Jul 11, 2019
@wkcn
Member

wkcn commented Jul 11, 2019

Thanks for your contribution!

@@ -56,6 +54,8 @@ def set_imagenet_aug(aug):
dtype = 'float32'
)
args = parser.parse_args()
Contributor

Maybe we should rearrange lines 56-58? It looks like set_imagenet_aug() has no effect on args.
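The ordering concern can be reproduced with plain `argparse`, independent of MXNet. The sketch below assumes `set_imagenet_aug` works by calling `set_defaults` on the parser; the `--random-mirror` option and its values are hypothetical stand-ins for the script's real options.

```python
import argparse


def set_imagenet_aug(aug):
    # Assumed behaviour: adjust parser defaults (value is illustrative).
    aug.set_defaults(random_mirror=1)


def make_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--random-mirror", type=int, default=0)
    return parser


def defaults_set_after_parse():
    parser = make_parser()
    args = parser.parse_args([])
    # Too late: args is already built, so this call does not change it.
    set_imagenet_aug(parser)
    return args.random_mirror


def defaults_set_before_parse():
    parser = make_parser()
    # Defaults are in place before parsing, so args reflects them.
    set_imagenet_aug(parser)
    args = parser.parse_args([])
    return args.random_mirror
```

Under these assumptions, only the second ordering changes the parsed values, which is why the position of the call relative to `parser.parse_args()` matters.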

Labels
pr-awaiting-merge Review and CI is complete. Ready to Merge pr-awaiting-review PR is waiting for code review

7 participants