This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Set ImageNet data augmentation by default #13757

Closed · wants to merge 3 commits
Conversation

@ymjiang (Contributor) commented Jan 2, 2019

https://github.com/apache/incubator-mxnet/blob/a38278ddebfcc9459d64237086cd7977ec20c70e/example/image-classification/train_imagenet.py#L42

When I train ImageNet with this line commented out, train accuracy reaches 99% while validation accuracy stays below 50% (single machine, 8 GPUs, global batch size 2048, ResNet-50, fp32). This is clearly overfitting.

After uncommenting the line and rerunning with the same settings, both train and validation accuracy converge to about 66%, which looks like a normal result.

So this data augmentation is quite important for ImageNet training. It would be better to enable it by default, so that future developers are not confused by the overfitting issue.

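As a rough illustration of the mechanism being discussed: train_imagenet.py builds its configuration with argparse, and the commented-out set_imagenet_aug call injects augmentation defaults through parser.set_defaults. Below is a minimal, self-contained sketch; the parameter names and values are illustrative, not copied from the script.

```python
import argparse

def set_imagenet_aug(parser):
    # Illustrative "standard" ImageNet augmentations: random resized crop,
    # horizontal mirroring, and color jitter. The real helper in
    # train_imagenet.py sets its own (possibly different) parameters.
    parser.set_defaults(random_resized_crop=1, random_mirror=1,
                        min_random_area=0.08,
                        brightness=0.4, contrast=0.4, saturation=0.4)

parser = argparse.ArgumentParser()
# With this call commented out (as in the original script), none of the
# augmentation defaults are applied and training overfits.
set_imagenet_aug(parser)

args = parser.parse_args([])
print(args.random_mirror)  # 1
```

Note that argparse parser-level defaults set via set_defaults appear in the parsed namespace even without matching add_argument calls, which is how a single uncommented line can switch on a whole family of augmentations.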
@ymjiang ymjiang requested a review from szha as a code owner January 2, 2019 08:45
@Roshrini (Member) commented Jan 2, 2019

@sandeep-krishnamurthy @eric-haibin-lin Can you take a look?

@mxnet-label-bot Add [pr-awaiting-review]

@marcoabreu marcoabreu added the pr-awaiting-review PR is waiting for code review label Jan 2, 2019
@vishaalkapoor (Contributor) commented Jan 7, 2019

I'm unsure why the ImageNet arguments are not the default for an ImageNet training script, and I would be curious to know why not. Depending on the answer, there are two better approaches.

If ImageNet arguments are to be the default, they should be merged into the stanza:

    parser.set_defaults(
        # network
        network          = 'resnet',
        num_layers       = 50,
        # data
        num_classes      = 1000,
        num_examples     = 1281167,
        image_shape      = '3,224,224',
        min_random_scale = 1, # if input image has min size k, suggest to use
                              # 256.0/x, e.g. 0.533 for 480
        # train
        num_epochs       = 80,
        lr_step_epochs   = '30,60',
        dtype            = 'float32'
    )

If they are not to be the default, it would be cleaner to add an argument such as --override-with-image-net-augmentations (or something more appropriately named) that overrides the parameters with those in the method.
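The opt-in flag approach could be sketched as follows; the flag name and augmentation parameters here are hypothetical, not taken from the PR:

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical flag name; an actual change may name it differently.
parser.add_argument('--use-imagenet-aug', action='store_true',
                    help='override augmentation params with ImageNet standards')

# First pass: check whether the flag was given, then apply the augmentation
# defaults before the final parse so explicitly passed values still win.
known, _ = parser.parse_known_args(['--use-imagenet-aug'])
if known.use_imagenet_aug:
    parser.set_defaults(random_mirror=1, random_resized_crop=1,
                        min_random_area=0.08)

args = parser.parse_args(['--use-imagenet-aug'])
print(args.random_mirror)  # 1
```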

Vishaal

@stu1130 (Contributor) commented Jan 16, 2019

@rahul003 Could you take a look at this? Any idea why it was commented out? Thanks a lot!

@sandeep-krishnamurthy (Contributor)

@ymjiang - Thanks for your contributions. Did you get a chance to look at @vishaalkapoor's comment?

@vandanavk (Contributor)

@mxnet-label-bot update [pr-awaiting-response]

@marcoabreu marcoabreu added pr-awaiting-response PR is reviewed and waiting for contributor to respond and removed pr-awaiting-review PR is waiting for code review labels Feb 5, 2019
@ymjiang (Contributor, Author) commented Feb 11, 2019

Hi @sandeep-krishnamurthy, I agree with @vishaalkapoor that the augmentation parameters should be set as the default. But those parameters are already provided in set_imagenet_aug, so perhaps the neatest way is simply to uncomment the call; no other change would be needed.

@ankkhedia (Contributor)

@vishaalkapoor Could you suggest a way forward on this PR?

@@ -39,7 +39,7 @@ def set_imagenet_aug(aug):
data.add_data_args(parser)
data.add_data_aug_args(parser)
# uncomment to set standard augmentations for imagenet training

A Member commented on this line: this comment should change accordingly

@anirudhacharya (Member)

@ymjiang Can you please add a command-line argument to either override or keep the set_imagenet_aug line? I think that is what @vishaalkapoor was suggesting.

@karan6181 (Contributor)

@ymjiang Could you please address the review comments made by @anirudhacharya? There have been no updates in the last two weeks. Thanks!

@ymjiang (Contributor, Author) commented Mar 19, 2019

@karan6181 @anirudhacharya Sorry for the delay. I have committed two new changes that enable data augmentation via a command-line argument. Please review and see whether they are appropriate.

@piyushghai (Contributor)

@anirudhacharya Ping for review.
@ymjiang Can you look into the CI failures?

@Roshrini (Member)

@vishaalkapoor Can you take a look at this PR again?

@roywei (Member) commented Apr 30, 2019

@ymjiang Hi, could you rebase onto the latest master? That should resolve the failing CI test.

@anirudhacharya (Member) left a comment

lgtm

@pinaraws

@ymjiang Hi, could you rebase onto the latest master? That should resolve the failing CI test.

@piyushghai (Contributor)

@ymjiang Gentle ping...

@ymjiang (Contributor, Author) commented Jun 9, 2019

Rebased onto master now: https://github.com/apache/incubator-mxnet/pull/15189. Will close this PR.

@ymjiang ymjiang closed this Jun 9, 2019
wkcn pushed a commit that referenced this pull request Jul 11, 2019
* Update .gitmodules

* Set ImageNet data augmentation by default


* Add argument for imagenet data augmentation

* Enable data-aug with argument

* Update .gitmodules
Labels
pr-awaiting-response PR is reviewed and waiting for contributor to respond