Fix image classification scripts and Improve Fp16 tutorial #11533

rahul003 · 2018-07-02T22:04:35Z

Description

Improve the float16 tutorial by adding a point of comparison with a larger batch size.
Fix a typo in the benchmark_score script and an issue with the Gluon example.
Update the fine-tune and train_cifar10.py scripts after the recent augmentation PR changed the way augmentations are applied in that script. Refer to Add standard ResNet data augmentation for ImageRecordIter #11027 (comment).
Set random_mirror=1 in the default ResNet augmentation.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain what the example does, the source of the dataset, the expected performance on the test set, and a reference to the original paper if applicable
Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

xinyu-intel · 2018-07-05T04:17:45Z

@szha @rahul003 Please take a look; this can be a hotfix for benchmark_score.py. Thanks!

eric-haibin-lin · 2018-07-10T00:59:30Z

docs/faq/float16.md

+1024 | float16 | 76.34% | 7.3 hrs | 1.62x |
+2048 | float16 | 76.29% | 6.5 hrs | 1.82x |
+
+![Training curves of Resnet50 v1 on Imagenet 2012](https://github.com/rahul003/web-data/blob/d415abf4a1c6df007483169c81807c250135f9a5/mxnet/tutorials/mixed-precision/resnet50v1b_imagenet_fp16_fp32_training.png?raw=true)


Can we not use personal repo for images?

Unfortunately, I have a hard time getting anything merged into DMLC repos. There have been more than 25 commits there since I opened dmlc/web-data#79 four weeks ago, but that PR is still unnoticed!
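
As a rough consistency check on the two table rows quoted in this thread, the sketch below multiplies each float16 training time by its reported speedup. It assumes the last column is the float32 training time divided by the float16 training time; the float32 row itself is not quoted here, so the implied baseline is only an estimate. The names `ROWS`, `IMPLIED` and `implied_float32_hours` are hypothetical.

```python
# (batch size, float16 training time in hours, reported speedup),
# copied from the two table rows quoted in the diff above.
ROWS = [(1024, 7.3, 1.62), (2048, 6.5, 1.82)]


def implied_float32_hours(float16_hours, speedup):
    """Baseline time implied by a float16 time and its speedup.

    Assumption: speedup = float32 time / float16 time.
    """
    return float16_hours * speedup


IMPLIED = {
    batch: implied_float32_hours(hours, speedup)
    for batch, hours, speedup in ROWS
}

for batch, hours in IMPLIED.items():
    print(f"batch {batch}: implied float32 time is about {hours:.1f} hrs")
```

Both rows imply a baseline of roughly 11.8 hours, so the two speedup figures are consistent with a single float32 reference run.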

eric-haibin-lin · 2018-07-10T00:59:37Z

docs/faq/float16.md

@@ -102,9 +102,17 @@ python fine-tune.py --network resnet --num-layers 50 --pretrained-model imagenet
 ```

 ## Example training results
-Here is a plot to compare the training curves of a Resnet50 v1 network on the Imagenet 2012 dataset. These training jobs ran for 95 epochs with a batch size of 1024 using a learning rate of 0.4 decayed by a factor of 1 at epochs 30,60,90 and used Gluon. The only changes made for the float16 job when compared to the float32 job were that the network and data were cast to float16, and the multi-precision mode was used for optimizer. The final accuracies at 95th epoch were **76.598% for float16** and **76.486% for float32**. The difference is within what's normal random variation, and there is no reason to expect float16 to have better accuracy than float32 in general. This run was approximately **65% faster** to train with float16.
+Let us consider training a Resnet50 v1 model on the Imagenet 2012 dataset. For this model, the GPU memory usage is close to the capacity of V100 GPU with a batch size of 128 when using float32. Using float16 allows the use of 256 batch size. Shared below are results using 8 V100 GPUs. Let us compare the three scenarios that arise here: float32 with 1024 batch size, float16 with 1024 batch size and float16 with 2048 batch size. These jobs trained for 90 epochs using a learning rate of 0.4 for 1024 batch size and 0.8 for 2048 batch size. This learning rate was decayed by a factor of 0.1 at the 30th, 60th and 80th epochs. The only changes made for the float16 jobs when compared to the float32 job were that the network and data were cast to float16, and the multi-precision mode was used for optimizer. The final accuracy at 90th epoch and the time to train are tabulated below for these three scenarios. The top-1 validation errors at the end of each epoch are also plotted below.


It's better to be specific on the overall hardware setup (it's not done on DGX).

Shared below are results using 8 V100 GPUs ->
Shared below are results using 8 V100 GPUs on AWS p3.16xlarge instance.
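
The learning rate schedule described in the tutorial text above can be sketched as follows. This is a minimal illustration, not the code from the training scripts. The function name `learning_rate` and its parameters are hypothetical. It assumes epochs are counted from 0, so the decay takes effect once 30, 60 and 80 epochs have completed, and it treats the 0.4 and 0.8 starting rates as following linear scaling with batch size, which matches the two quoted values but is not stated in the source.

```python
def learning_rate(epoch, batch_size, base_lr=0.4, base_batch_size=1024,
                  decay_epochs=(30, 60, 80), factor=0.1):
    """Step-decayed learning rate for a given epoch (0-based, an assumption).

    The starting rate is scaled linearly with batch size: 0.4 at 1024
    and 0.8 at 2048, as quoted in the tutorial text.
    """
    lr = base_lr * batch_size / base_batch_size
    for boundary in decay_epochs:
        if epoch >= boundary:
            lr *= factor  # decay by a factor of 0.1 at each boundary
    return lr


if __name__ == "__main__":
    for batch_size in (1024, 2048):
        rates = [learning_rate(e, batch_size) for e in (0, 30, 60, 80)]
        print(batch_size, ["%.4g" % r for r in rates])
```

With these assumptions, the 2048 job uses twice the rate of the 1024 job at every epoch, and both reach one thousandth of their starting rate for the final ten epochs.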

eric-haibin-lin · 2018-07-10T01:02:25Z

example/gluon/data.py

-        transposed = nd.transpose(cropped, (2, 0, 1))
-        image = mx.nd.cast(image, dtype)
-        return image, label
+        transposed = mx.nd.transpose(cropped, (2, 0, 1))


Is dtype casting no longer necessary?

Not in this script, as it calls astype() later. That is also more general, since it works for any dataset iterator.

rahul003 · 2018-07-19T07:05:36Z

@hetong007 please confirm the resnet-aug change to turn on random_mirror.

rahul003 · 2018-07-19T07:05:57Z

@eric-haibin-lin The training curve image is now in the dmlc repository.

hetong007 · 2018-07-19T07:41:38Z

For cifar training, the standard augmentation for benchmark is:

mean_rgb = [125.307, 122.961, 113.8575]
std_rgb = [51.5865, 50.847, 51.255]
train_data = mx.io.ImageRecordIter(
    path_imgrec         = rec_train,
    path_imgidx         = rec_train_idx,
    preprocess_threads  = num_workers,
    shuffle             = True,
    batch_size          = batch_size,
    
    data_shape          = (3, 32, 32),
    mean_r              = mean_rgb[0],
    mean_g              = mean_rgb[1],
    mean_b              = mean_rgb[2],
    std_r               = std_rgb[0],
    std_g               = std_rgb[1],
    std_b               = std_rgb[2],
    rand_mirror         = True,
    pad                 = 4,
    fill_value          = 0,
    rand_crop           = True,
    max_crop_size       = 32,
    min_crop_size       = 32,
)
val_data = mx.io.ImageRecordIter(
    path_imgrec         = rec_val,
    path_imgidx         = rec_val_idx,
    preprocess_threads  = num_workers,
    shuffle             = False,
    batch_size          = batch_size,
    
    data_shape          = (3, 32, 32),
    mean_r              = mean_rgb[0],
    mean_g              = mean_rgb[1],
    mean_b              = mean_rgb[2],
    std_r               = std_rgb[0],
    std_g               = std_rgb[1],
    std_b               = std_rgb[2],
)

Can you change accordingly?
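
To make the training-side parameters above concrete, here is a small pure-Python sketch of what pad=4, fill_value=0, a 32x32 random crop, random mirroring and per-channel mean/std normalization conventionally mean for one image. It is an illustration under assumptions, not the ImageRecordIter implementation: the helper names `pad_crop_mirror` and `normalize` are hypothetical, the image is a plain list of three 2-D channels, and the order of operations (pad, crop, mirror, normalize) is assumed.

```python
import random

MEAN_RGB = [125.307, 122.961, 113.8575]
STD_RGB = [51.5865, 50.847, 51.255]


def pad_crop_mirror(image, rng, pad=4, fill_value=0, crop_size=32):
    """Pad every channel, take one random crop, and mirror half the time.

    `image` is a list of channels, each a list of rows of pixel values.
    The same crop offset and mirror decision are used for all channels.
    """
    height, width = len(image[0]), len(image[0][0])
    padded_width = width + 2 * pad
    top = rng.randint(0, height + 2 * pad - crop_size)   # inclusive bounds
    left = rng.randint(0, padded_width - crop_size)
    flip = rng.random() < 0.5
    result = []
    for channel in image:
        border = [[fill_value] * padded_width for _ in range(pad)]
        middle = [[fill_value] * pad + list(row) + [fill_value] * pad
                  for row in channel]
        padded = border + middle + [row[:] for row in border]
        cropped = [row[left:left + crop_size]
                   for row in padded[top:top + crop_size]]
        if flip:
            cropped = [row[::-1] for row in cropped]  # horizontal mirror
        result.append(cropped)
    return result


def normalize(image, mean=MEAN_RGB, std=STD_RGB):
    """Subtract the per-channel mean and divide by the per-channel std."""
    return [[[(pixel - m) / s for pixel in row] for row in channel]
            for channel, m, s in zip(image, mean, std)]


rng = random.Random(0)
image = [[[128] * 32 for _ in range(32)] for _ in range(3)]  # dummy 3x32x32 image
augmented = normalize(pad_crop_mirror(image, rng))
```

Because the padded image is 40x40 and the crop is 32x32, the crop can shift by up to 4 pixels in any direction while the output shape stays 3x32x32. The validation iterator above applies only the normalization step.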

rahul003 · 2018-07-23T05:59:13Z

So @hetong007 should the set_resnet_aug function be set_imagenet_aug or set_resnet_imagenet_aug?

rahul003 · 2018-07-23T06:12:24Z

Are the above augmentations used for all Cifar models?

hetong007 · 2018-07-23T06:18:30Z

@rahul003 I think it is more appropriate to call set_imagenet_aug as it applies to a bunch of popular models.

And yes the above params are standard for cifar model training & performance comparison.

hetong007 · 2018-07-23T06:41:11Z

example/image-classification/train_cifar10.py

+    aug.set_defaults(random_mirror=1, pad=4, fill_value=0, random_crop=1)
+    aug.set_defaults(min_random_size=32, max_random_size=32)
+
+


a few blank lines (also in train_imagenet.py:L26-27), otherwise lgtm.
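
The effect of the two set_defaults calls in this hunk can be sketched with the standard library alone, assuming `aug` is an argparse argument group (the excerpt does not show where `aug` is created). The flag spellings and the placeholder defaults below are hypothetical; only the keyword names and values passed to set_defaults come from the diff.

```python
import argparse

parser = argparse.ArgumentParser(description="illustrative parser, not the real script")
aug = parser.add_argument_group("Image augmentations")

# Hypothetical flags with placeholder defaults standing in for the shared ones.
PLACEHOLDER_DEFAULTS = {
    "random_mirror": 0,
    "pad": 0,
    "fill_value": 127,
    "random_crop": 0,
    "min_random_size": 0,
    "max_random_size": 0,
}
for name, default in PLACEHOLDER_DEFAULTS.items():
    aug.add_argument("--" + name.replace("_", "-"), type=int, default=default)

# Script-specific defaults, as in the diff; they replace the placeholders above.
aug.set_defaults(random_mirror=1, pad=4, fill_value=0, random_crop=1)
aug.set_defaults(min_random_size=32, max_random_size=32)

args = parser.parse_args([])                    # no flags: script defaults apply
overridden = parser.parse_args(["--pad", "0"])  # an explicit flag still wins
print(vars(args))
```

Setting defaults this way keeps the shared argument definitions untouched while letting each training script choose its own augmentation baseline, and users can still override any value on the command line.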

rahul003 · 2018-07-27T04:58:05Z

@eric-haibin-lin @hetong007 Could you merge this?

* Replace cublassgemm with cublassgemmex for >= 7.5
* Add comment for cublassgemmex

Remove fixed seed for test_sparse_nd_save_load (apache#11920)
* Remove fixed seed for test_sparse_nd_save_load
* Add comments related to the commit

Corrections to profiling tutorial (apache#11887)
Corrected a race condition with stopping profiling. Added mx.nd.waitall to ensure all operations have completed, including GPU operations that might otherwise be missing. Also added alternative code for context selection GPU vs CPU, that had error before on machines with nvidia-smi.

Fix image classification scripts and Improve Fp16 tutorial (apache#11533)
* fix bugs and improve tutorial
* improve logging
* update benchmark_score
* Update float16.md
* update link to dmlc web data
* fix train cifar and add random mirroring
* set aug defaults
* fix whitespace
* fix typo


rahul003 added 2 commits July 2, 2018 21:26

fix bugs and improve tutorial

cf27c68

improve logging

d02b988

rahul003 requested a review from szha as a code owner July 2, 2018 22:04

rahul003 mentioned this pull request Jul 2, 2018

[MXNET-139] Tutorial for mixed precision training with float16 #10391

Merged

8 tasks

update benchmark_score

086327b

rahul003 changed the title ~~Improve Fp16 tutorial and fix issues in scripts~~ Fix image classification scripts and Improve Fp16 tutorial Jul 3, 2018

eric-haibin-lin reviewed Jul 10, 2018

rahul003 added 5 commits July 10, 2018 13:23

Update float16.md

6ae11b8

update link to dmlc web data

fa08b87

Merge branch 'master' into fp16-tut

3d4eb33

fix train cifar and add random mirroring

a151d57

Merge branch 'fp16-tut' of https://github.com/rahul003/mxnet into fp1…

419053c

…6-tut

rahul003 mentioned this pull request Jul 19, 2018

Add standard ResNet data augmentation for ImageRecordIter #11027

Merged

4 tasks

set aug defaults

81dca54

hetong007 reviewed Jul 23, 2018

rahul003 added 2 commits July 23, 2018 00:26

fix whitespace

5dd8c2a

fix typo

953709e

rahul003 requested a review from anirudh2290 as a code owner July 23, 2018 18:01

hetong007 approved these changes Jul 27, 2018

indhub merged commit 54ebc5d into apache:master Jul 29, 2018
