[MXNet-1375][Fit API]Added RNN integration test for fit() API #14547

karan6181 · 2019-03-27T22:59:50Z

Description

This PR adds nightly RNN Integration test using Gluon fit() API.
JIRA epic can be found here: https://issues.apache.org/jira/browse/MXNET-1375

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with MXNET-1375,
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

test_sentiment_rnn: Test the RNN model on CPU using Synthetic data and on GPU using IMDB dataset.

Comments

If this change is a backward incompatible change, why must this change be made.
Interesting edge cases to note here

@roywei @nswamy @abhinavs95

piyushghai · 2019-03-27T23:01:09Z

tests/nightly/JenkinsfileForBinaries

@@ -106,6 +106,22 @@ core_logic: {
          utils.docker_run('ubuntu_nightly_gpu', 'nightly_tutorial_test_ubuntu_python3_gpu', true, '1500m')
        }
      }
+    },


This should not be in the JenkinsfileForBinaries.

Done ae81b08. Moved to tests/nightly/Jenkinsfile. Thank you for your input.

piyushghai · 2019-03-27T23:05:19Z

tests/nightly/estimator/test_sentiment_rnn.py

+    '''
+    Download and extract the IMDB dataset
+    '''
+    url = ('http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz')


Should we keep this data on an S3 bucket instead for more reliability of it being downloaded ?

+1 to comment by @piyushghai. Or maybe look if its there in gluon-nlp?

We have to handle the licensing part if we add it to our S3 bucket. Let's keep it this way for now. I also found this R example example/rnn/bucket_R/data_preprocessing_seq_to_one.R which also uses the data from the same source.

piyushghai · 2019-03-27T23:06:07Z

tests/nightly/estimator/test_sentiment_rnn.py

+        return outputs
+
+
+def download_imdb(data_dir='./data'):


Can you keep this path as /tmp/data ?

Done. Commit ae81b08

piyushghai · 2019-03-27T23:12:08Z

tests/nightly/estimator/test_sentiment_rnn.py

+    embed_size = 100
+
+    # Set context
+    if mx.context.num_gpus() > 0:


Can write this more succinctly as ctx = mx.gpu() if mx.context.num_gpus() > 0 else mx.cpu()

Yes absolutely. Modified it. ae81b08

piyushghai · 2019-03-27T23:15:11Z

tests/nightly/estimator/test_sentiment_rnn.py

+
+
+def test_estimator_gpu():
+    '''


There are a lot of things in common between test_estimator_cpu and test_estimator_gpu methods.
Both these methods can be clubbed into a single one, with just the context being different.

I see in test_estimator_gpu you are using the assertion as well, so maybe your (new) method : test_esimator (which is same for cpu/gpu) can return an estimator, and you can do your assertions only if being run on gpu context.

Modified a code to reduce the redundancy. Could you please review it again? Thanks!

abhinavs95 · 2019-03-27T23:16:32Z

@mxnet-label-bot add [Gluon, Test]

roywei

Thanks for the contribution, few comments, and is there any reason to not use nostetest in integration tests? I see we are running a python script here.
cc @nswamy what do you think?

roywei · 2019-03-28T16:30:08Z

tests/nightly/estimator/test_sentiment_rnn.py

+import mxnet as mx
+from mxnet import nd
+from mxnet.contrib import text
+from mxnet.gluon import data as gdata, loss as gloss, utils as gutils, nn, rnn


just import data, loss, utils, don't rename it. the book did it for separation of mxnet and gluoncv/nlp, d2l package, we only have mxnet here

Thanks. Updated it. ae81b08

roywei · 2019-03-28T16:39:39Z

tests/nightly/estimator/test_sentiment_rnn.py

+    assert e.train_stats['train_' + acc.name][num_epochs - 1] > 0.70
+
+
+if __name__ == '__main__':


you don't need main in integration tests

Updated. ae81b08

roywei · 2019-03-28T16:42:19Z

ci/docker/runtime_functions.sh

+    set -ex
+    cd /work/mxnet/tests/nightly/estimator
+    export PYTHONPATH=/work/mxnet/python/
+    python test_sentiment_rnn.py --type gpu


use nosetest and assert accuracy for gpu, do not run python scripts here

abhinavs95 · 2019-03-28T17:43:59Z

tests/nightly/estimator/test_sentiment_rnn.py

+    '''
+    Download and extract the IMDB dataset
+    '''
+    url = ('http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz')


+1 to comment by @piyushghai. Or maybe look if its there in gluon-nlp?

abhinavs95 · 2019-03-28T17:51:23Z

tests/nightly/estimator/test_sentiment_rnn.py

+
+    # Get the model
+    for model in models:
+        if model == 'TextCNN':


To avoid splitting logic here we can have a get_model() function which returns the net and related items, and use it for both the tests

We have a separate model class for both the model and we are instantiating the object based on the selection. The cpu test runs for both models, however, gpu test runs only BiRNN.

abhinavs95 · 2019-03-28T17:54:50Z

tests/nightly/estimator/test_sentiment_rnn.py

+        return outputs
+
+
+def download_imdb(data_dir='./data'):


There are 5 functions for loading and processing the dataset, can't they be combined into one?

Keeping it separate makes the code more readable and easy to navigate the flow.

…ith condense if/else, renamed imports

…#14547) * Added RNN integration test for fit() API * Addressed review comments: change in JenkinFile, tmp directory, ctx with condense if/else, renamed imports * CPU test doesn't require nvidiadocker container * Modified the structure by removing the redundant code

* [MXNet-1334][Fit API]base class for estimator and eventhandler (#14346) * base class for estimator and eventhandler * add license * add event handlers * fix pylint * improve arg check * fix pylint * add unit tests * Fixed issue where the estimator was printing beyond the dataset size … (#14464) * Fixed issue where the estimator was printing beyond the dataset size for the last batch * Added comments * Nudge to CI * [MXNet-1349][Fit API]Add validation support and unit tests for fit() API (#14442) * added estimator unittests * add more tests for estimator * added validation logic * added error handlers, unittests * improve val stats * fix pylint * fix pylint * update unit test * fix tests * fix tests * updated metrics, val logic * trigger ci * trigger ci * update metric, batch_fn error handler * update context logic, add default metric * [MXNet-1340][Fit API]Update train stats (#14494) * add train history * update history * update test * avoid calling empty methods * remove train history object * fix pylint * add unit test * fix test * update categorize handlers * [MXNet-1375][Fit API]Added RNN integration test for fit() API (#14547) * Added RNN integration test for fit() API * Addressed review comments: change in JenkinFile, tmp directory, ctx with condense if/else, renamed imports * CPU test doesn't require nvidiadocker container * Modified the structure by removing the redundant code * [MXNet-1343][Fit API]Add CNN integration test for fit() API (#14405) * added cnn intg tests for fit api * updated cnn intg tests * added functions for nightly test * updated runtime_function * updated intg tests * updated init, datapath, refs * added validation data * update cpu test * refactor code * updated context * [MXNET-1344, 1346][FIT API] Retrieve Batch size and Logging verbose support for Gluon fit() API (#14587) * Retrieve Batch size and Logging verbose support for Gluon fit() API * NIT changes * Addressed review comments: shifted the batch size code to a separate method, sentence correction * Modified unittest * removed redundant parameter * Resolve CI test failure * only support DataLoader for now, future PRs will include DataIter to DataLoader converter * Get the number of samples from shape attribute instead of length due to low space complexity * Simplified batch size retrieval code * removed batch_size parameter from fit() method and fixed the tests * Verbose exception handling * Assigning constant to a verbose * Modified exception message * Resolved undefined class reference * Addressed review comments: Modified verbose level names, docs, variable names * Update estimator.py * move estimator to contrib (#14633) * move to gluon contrib (#14635) * [Fit API] improve event handlers (#14685) * improve event handlers * update tests * passing weakref of estimator * fix unit test * fix test * fix pylint * fix test * fix pylint * move default metric logic * combine nightly tests * [MXNET-1396][Fit-API] Update default handler logic (#14765) * move to nightly for binaries * update default handler * fix pylint * trigger ci * trigger ci * [Fit API] update estimator (#14849) * address comments * add comment * check available context * fix bug * change cpu check * [Fit-API] Adress PR comments (#14885) * address comments * update checkpoint * test symbol save * address comments * add resume * update doc and resume checkpoint * update docs * trigger ci * trigger ci

* Added RNN integration test for fit() API * Addressed review comments: change in JenkinFile, tmp directory, ctx with condense if/else, renamed imports * CPU test doesn't require nvidiadocker container * Modified the structure by removing the redundant code

* [MXNet-1334][Fit API]base class for estimator and eventhandler (apache#14346) * base class for estimator and eventhandler * add license * add event handlers * fix pylint * improve arg check * fix pylint * add unit tests * Fixed issue where the estimator was printing beyond the dataset size … (apache#14464) * Fixed issue where the estimator was printing beyond the dataset size for the last batch * Added comments * Nudge to CI * [MXNet-1349][Fit API]Add validation support and unit tests for fit() API (apache#14442) * added estimator unittests * add more tests for estimator * added validation logic * added error handlers, unittests * improve val stats * fix pylint * fix pylint * update unit test * fix tests * fix tests * updated metrics, val logic * trigger ci * trigger ci * update metric, batch_fn error handler * update context logic, add default metric * [MXNet-1340][Fit API]Update train stats (apache#14494) * add train history * update history * update test * avoid calling empty methods * remove train history object * fix pylint * add unit test * fix test * update categorize handlers * [MXNet-1375][Fit API]Added RNN integration test for fit() API (apache#14547) * Added RNN integration test for fit() API * Addressed review comments: change in JenkinFile, tmp directory, ctx with condense if/else, renamed imports * CPU test doesn't require nvidiadocker container * Modified the structure by removing the redundant code * [MXNet-1343][Fit API]Add CNN integration test for fit() API (apache#14405) * added cnn intg tests for fit api * updated cnn intg tests * added functions for nightly test * updated runtime_function * updated intg tests * updated init, datapath, refs * added validation data * update cpu test * refactor code * updated context * [MXNET-1344, 1346][FIT API] Retrieve Batch size and Logging verbose support for Gluon fit() API (apache#14587) * Retrieve Batch size and Logging verbose support for Gluon fit() API * NIT changes * Addressed review comments: shifted the batch size code to a separate method, sentence correction * Modified unittest * removed redundant parameter * Resolve CI test failure * only support DataLoader for now, future PRs will include DataIter to DataLoader converter * Get the number of samples from shape attribute instead of length due to low space complexity * Simplified batch size retrieval code * removed batch_size parameter from fit() method and fixed the tests * Verbose exception handling * Assigning constant to a verbose * Modified exception message * Resolved undefined class reference * Addressed review comments: Modified verbose level names, docs, variable names * Update estimator.py * move estimator to contrib (apache#14633) * move to gluon contrib (apache#14635) * [Fit API] improve event handlers (apache#14685) * improve event handlers * update tests * passing weakref of estimator * fix unit test * fix test * fix pylint * fix test * fix pylint * move default metric logic * combine nightly tests * [MXNET-1396][Fit-API] Update default handler logic (apache#14765) * move to nightly for binaries * update default handler * fix pylint * trigger ci * trigger ci * [Fit API] update estimator (apache#14849) * address comments * add comment * check available context * fix bug * change cpu check * [Fit-API] Adress PR comments (apache#14885) * address comments * update checkpoint * test symbol save * address comments * add resume * update doc and resume checkpoint * update docs * trigger ci * trigger ci

…#14547) * Added RNN integration test for fit() API * Addressed review comments: change in JenkinFile, tmp directory, ctx with condense if/else, renamed imports * CPU test doesn't require nvidiadocker container * Modified the structure by removing the redundant code

Added RNN integration test for fit() API

a25c6b2

piyushghai reviewed Mar 27, 2019

View reviewed changes

marcoabreu added Gluon Test labels Mar 27, 2019

roywei reviewed Mar 28, 2019

View reviewed changes

abhinavs95 reviewed Mar 28, 2019

View reviewed changes

Addressed review comments: change in JenkinFile, tmp directory, ctx w…

ae81b08

…ith condense if/else, renamed imports

karan6181 requested a review from marcoabreu as a code owner April 1, 2019 23:12

karan6181 added 2 commits April 1, 2019 16:20

CPU test doesn't require nvidiadocker container

a7dfeb3

Modified the structure by removing the redundant code

60ec9b8

nswamy approved these changes Apr 3, 2019

View reviewed changes

nswamy merged commit 81ec379 into apache:fit-api Apr 3, 2019

nswamy mentioned this pull request Apr 5, 2019

[MXNET-1335][fit api]Text Sentiment Classification examples using Gluon fit() API #14350

Closed

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[MXNet-1375][Fit API]Added RNN integration test for fit() API #14547

[MXNet-1375][Fit API]Added RNN integration test for fit() API #14547

karan6181 commented Mar 27, 2019

piyushghai Mar 27, 2019

karan6181 Apr 1, 2019 •

edited

Loading

piyushghai Mar 27, 2019

abhinavs95 Mar 28, 2019 •

edited

Loading

karan6181 Apr 2, 2019

piyushghai Mar 27, 2019

karan6181 Apr 1, 2019

piyushghai Mar 27, 2019

karan6181 Apr 1, 2019

piyushghai Mar 27, 2019

karan6181 Apr 2, 2019

abhinavs95 commented Mar 27, 2019

roywei left a comment •

edited

Loading

roywei Mar 28, 2019

karan6181 Apr 1, 2019

roywei Mar 28, 2019

karan6181 Apr 1, 2019

roywei Mar 28, 2019

abhinavs95 Mar 28, 2019 •

edited

Loading

abhinavs95 Mar 28, 2019

karan6181 Apr 2, 2019

abhinavs95 Mar 28, 2019

karan6181 Apr 2, 2019

		assert e.train_stats['train_' + acc.name][num_epochs - 1] > 0.70


		if __name__ == '__main__':

[MXNet-1375][Fit API]Added RNN integration test for fit() API #14547

[MXNet-1375][Fit API]Added RNN integration test for fit() API #14547

Conversation

karan6181 commented Mar 27, 2019

Description

Checklist

Essentials

Changes

Comments

Choose a reason for hiding this comment

karan6181 Apr 1, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

abhinavs95 Mar 28, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

abhinavs95 commented Mar 27, 2019

roywei left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

abhinavs95 Mar 28, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

karan6181 Apr 1, 2019 •

edited

Loading

abhinavs95 Mar 28, 2019 •

edited

Loading

roywei left a comment •

edited

Loading

abhinavs95 Mar 28, 2019 •

edited

Loading