Merge stable 1.3.1 release of MXNet #35

jens-mueller-sociomantic · 2019-04-09T07:44:13Z

https://github.com/apache/incubator-mxnet/releases/tag/1.3.1

This updates to MXNet 1.3.1. We are skipping 1.2.x because those releases
indirectly reference missing commits. We skip the update to 1.3.0 because
the patched 1.3.1 release is already available.

The conflicts in `.gitmodules` are resolved manually to keep our
beaver and makd dependencies. For all other conflicts, the upstream
("theirs") version was always picked.
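The conflict policy described above can be sketched in a throwaway repository: resolve `.gitmodules` by hand, then take the upstream ("theirs") side for every remaining conflicted path, found with `git diff --name-only --diff-filter=U` and restored with `git checkout --theirs`. The branch names, file contents, and the hand-merged `.gitmodules` are hypothetical stand-ins, not a replay of the actual merge.

```shell
#!/bin/sh
set -eu

# Throwaway repository; all file contents are hypothetical stand-ins.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Common ancestor of the fork and upstream.
printf '[submodule "dmlc-core"]\n' > .gitmodules
echo "base" > src.cc
git add .gitmodules src.cc
git commit -q -m "common base"
git tag base

# Upstream side, tagged like the release being merged.
git checkout -q -b upstream base
printf '[submodule "dmlc-core"]\n[submodule "tvm"]\n' > .gitmodules
echo "upstream" > src.cc
git commit -q -am "upstream release"
git tag 1.3.1

# Fork side: adds its own submodule and edits the same file.
git checkout -q -b fork base
printf '[submodule "dmlc-core"]\n[submodule "beaver"]\n' > .gitmodules
echo "fork" > src.cc
git commit -q -am "fork changes"

# The merge stops with conflicts; keep the script running.
git merge -q 1.3.1 || true

# .gitmodules: resolved by hand first, keeping upstream and fork entries.
printf '[submodule "dmlc-core"]\n[submodule "tvm"]\n[submodule "beaver"]\n' > .gitmodules
git add .gitmodules

# Every other conflicted path: take the upstream ("theirs") side.
for p in $(git diff --name-only --diff-filter=U); do
    git checkout --theirs -- "$p"
    git add "$p"
done

# Conclude the merge with the default merge message.
git commit -q --no-edit
```

Resolving `.gitmodules` before touching anything else keeps a file that git itself parses free of conflict markers for the rest of the resolution.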

To see our changes with respect to upstream MXNet, diff against the MXNet
1.3.1 tag, i.e., `git diff 1.3.1..HEAD`.
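A minimal sketch of that comparison in a throwaway repository, where a tag named `1.3.1` stands in for the upstream release and one extra commit stands in for our changes. The files are hypothetical; against the real fork only the two `git diff` invocations apply.

```shell
#!/bin/sh
set -eu

# Throwaway repository; file names and contents are hypothetical stand-ins.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the upstream release, tagged like the MXNet release tag.
echo "upstream" > README.md
git add README.md
git commit -q -m "upstream release"
git tag 1.3.1

# Stand-in for a fork-specific change on top of the release.
printf '[submodule "beaver"]\n' > .gitmodules
git add .gitmodules
git commit -q -m "fork-specific change"

# Full patch of our changes relative to the 1.3.1 tag.
git diff 1.3.1..HEAD

# Just the names of the files we changed.
changed=$(git diff --name-only 1.3.1..HEAD)
echo "$changed"
```

Adding `-- <path>` to either command restricts the comparison to a single file, which is a quick way to check that a file matches upstream exactly.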

By default, MXNet 1.3.1 enables Intel MKL-DNN. We disable it for now because
we currently don't need it.

apache#11587) * [MXNET-378] Adding depth_to_space and space_to_depth operator * fixed lint and windows CPU errors * compliance with C++ style guide and address shortcomings in unittests * fixed documentation and nitpicky suggestions * added operator references in API docs and removed inplace optimization support * Added references in symbol.md and ndarray.md. Improved test cases and added block_size check * Fixing bugs in documentation. Tests now include tensors of random shapes.

Corrected a race condition with stopping profiling. Added mx.nd.waitall to ensure all operations have completed, including GPU operations that might otherwise be missing. Also added alternative code for GPU vs. CPU context selection, which previously raised an error on machines with nvidia-smi.

) * fix bugs and improve tutorial * improve logging * update benchmark_score * Update float16.md * update link to dmlc web data * fix train cifar and add random mirroring * set aug defaults * fix whitespace * fix typo

* adding param for list of tags to display on website * using new website display argument for artifact placement in version folder * adding display logic * remove restricted setting for testing * update usage instructions * reverted Jenkinsfile to use restricted nodes

* Update relative paths pointing to the data directory to point to the correct place in the testing temporary folder. * Enable the notebooks that were previously broken because of relative file paths not pointing to the correct place. * Move some notebooks we do not plan to test to the whitelist. These notebooks are not published in the Straight Dope book. * Clean-up: Convert print statements to info/warn/error logging statements. Add some logging statements for better status.

* add linux and macos doc * update doc * Update MKL_README.md * Update MKL_README.md Add convolution code to verify mkldnn backend * add homebrew link * rename to MKLDNN_README * add mkl verify * trigger * trigger * set mac compiler to gcc47 * add VS2017 support experimentally * improve quality * improve quality * modify mac build instruction since prepare_mkldnn.sh has been rm * trigger * add some improvement

…req='add' (apache#11338) * Add tests that fail due to issue 11241 * Fix apache#11241 Conv1D throws CUDNN_STATUS_EXECUTION_FAILED * Force algo 1 when grad_req==add with large c. Expand tests. * Shorten test runtimes.

…ning with Gluon (apache#11910) * Add description about update on kvstore * add async check for gluon * only raise error if user set update_on_kvstore * fix condition * add async nightly test * fix case when no kvstore * add example for trainer creation in doc

* Added MNIST-MLP-Module-API models to check model save and load_checkpoint methods
* Added LENET with Conv2D operator training file
* Added LENET with Conv2d operator inference file
* Added LanguageModelling with RNN training file
* Added LanguageModelling with RNN inference file
* Added hybridized LENET Gluon Model training file
* Added hybridized LENET gluon model inference file
* Added license headers
* Refactored the model and inference files and extracted out duplicate code in a common file
* Added runtime function for executing the MBCC files
* Added JenkinsFile for MBCC to be run as a nightly job
* Added boto3 install for s3 uploads
* Added README for MBCC
* Added license header
* Added more common functions from lm_rnn_gluon_train and inference files into common.py to clean up code
* Added scripts for training models on older versions of MXNet
* Added check for preventing inference script from crashing in case no trained models are found
* Fixed indentation issue
* Replaced Penn Tree Bank Dataset with Sherlock Holmes Dataset
* Fixed indentation issue
* Removed training in models and added smaller models. Now we are simply checking a forward pass in the model with dummy data.
* Updated README
* Fixed indentation error
* Fixed indentation error
* Removed code duplication in the training file
* Added comments for runtime_functions script for training files
* Merged S3 Buckets for storing data and models into one
* Automated the process to fetch MXNet versions from git tags
* Added defensive checks for the case where the data might not be found
* Fixed issue where we were performing inference on state model files
* Replaced print statements with logging ones
* Removed boto install statements and moved them into ubuntu_python docker
* Separated training and uploading of models into separate files so that training runs in Docker and upload runs outside Docker
* Fixed pylint warnings
* Updated comments and README
* Removed the venv for training process
* Fixed indentation in the MBCC Jenkins file and also separated out training and inference into two separate stages
* Fixed indentation
* Fixed erroneous single quote
* Added --user flag to check for Jenkins error
* Removed unused methods
* Added force flag in the pip command to install mxnet
* Removed the force-re-install flag
* Changed exit 1 to exit 0
* Added quotes around the shell command
* added packlibs and unpack libs for MXNet builds
* Changed PythonPath from relative to absolute
* Created dedicated bucket with correct permission
* Fix for python path in training
* Changed bucket name to CI bucket
* Added set -ex to the upload shell script
* Now raising an exception if no models are found in the S3 bucket
* Added regex to train models script
* Added check for performing inference only on models trained on same major versions
* Added set -ex flags to shell scripts
* Added multi-version regex checks in training
* Fixed typo in regex
* Now we will train models for all the minor versions for a given major version by traversing the tags
* Added check for validating current_version

* add initial neuralstyle and test coverage * Add two more test and README * kill comments * patch on memory leaks fix * fix formatting issues * remove redundant files * disable the Gan example for now * add ignore method * add new download scheme to match the changes

…v1.3.x) (apache#13152) * add env variable to choose deterministic cudnn alg * set default value to false * fix build failure in Windows GPU * revert the previous change * only check determinism in CUDNN 7.x release * Add cudnn version check * fix lint error

jens-mueller-sociomantic · 2019-04-10T07:58:45Z

We probably want to remove .codecov.yml from our repository.

jens-mueller-sociomantic · 2019-04-10T08:20:28Z

The diff against 1.3.1 shows changes to `tests/python/gpu/test_kvstore_gpu.py`, which should not be there. I am trying to figure out how they ended up there.

https://github.com/apache/incubator-mxnet/releases/tag/1.3.1

This updates to MXNet 1.3.1. We are skipping 1.2.x because those releases indirectly reference missing commits. We skip the update to 1.3.0 because the patched 1.3.1 release is already available.

The conflicts in `.gitmodules` are resolved manually to keep our beaver and makd dependencies. For all other conflicts, the upstream ("theirs") version was always picked.

To see our changes with respect to upstream MXNet, diff against the MXNet 1.3.1 tag, i.e., `git diff 1.3.1..HEAD`.

* 3rdparty/dmlc-core e9446f5(e9446f5)...0a0e8ad(0a0e8ad) (41 commits)
  > Add OMPException class and use it for Text Parser (apache#445)
  > Fix build problem on windows (apache#450)
  > switch to safe_load for kubernetes config load (apache#449)
  > Add S3_IS_AWS env and fixed non-AWS behavior (apache#444)
  > add error message for s3 list (apache#439)
  (...)
* 3rdparty/mkldnn 0e7ca738(0e7ca738)...0e7ca738(0e7ca738) (99 commits)
  > build: bumped version to v0.14 in readme
  > build: bumped version to v0.14
  > cpu: reorder: start using jit uni for 8x8 transposition
  > cpu: reorder: jit uni: add 8x8 kernel
  > cpu: reorder: enable jit uni reorder
  (...)
* 3rdparty/mshadow a8c650c(a8c650c)...8a9e337(8a9e337) (9 commits)
  > Merge pull request apache#358 from eric-haibin-lin/revert
  > Merge pull request apache#357 from azai91/revert/d68d3
  > Merge pull request apache#356 from szha/omp
  > Add half_t support for batch_dot. (apache#353)
  > Allow large array operation in MXNet (apache#348)
  (...)
* 3rdparty/onnx-tensorrt ()...3d8ee04(3d8ee04) (1 commits)
  > Refactor onnxGetBackendInfo (apache#39)
* 3rdparty/ps-lite v1+144(a6dda54)...v1+146(8a76389) (1 commits)
  > Merge pull request apache#133 from CodingCat/turn_up_down
* 3rdparty/tvm v0.3+434(90db723)d...v0.3+434(90db723)d (1 commits)
  > [FRONTEND] A Python hybrid frontend (apache#1251)

jens-mueller-sociomantic · 2019-04-12T06:43:41Z

Removed the changes to `tests/python/gpu/test_kvstore_gpu.py` because the merge commit shouldn't have touched this file.
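A sketch of one way to drop such unintended edits, shown in a throwaway repository: restore the file from the upstream tag with `git checkout <tag> -- <path>` and fold the result into the last commit. The repository contents and the `NOTES` file are hypothetical, and this is not necessarily how the change was made on this branch; amending rewrites history, so the branch then needs a force-push.

```shell
#!/bin/sh
set -eu

# Throwaway repository; contents are hypothetical stand-ins for the fork.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

f=tests/python/gpu/test_kvstore_gpu.py
mkdir -p "$(dirname "$f")"
echo "upstream version" > "$f"
git add "$f"
git commit -q -m "upstream release"
git tag 1.3.1

# A commit that touches the test file by accident next to an intended change.
echo "stray edit" >> "$f"
echo "intended change" > NOTES
git add "$f" NOTES
git commit -q -m "merge with a stray edit"

# Restore the file exactly as it is in the 1.3.1 tag (this also stages it)
# and fold that into the last commit, which rewrites its hash.
git checkout 1.3.1 -- "$f"
git commit -q --amend --no-edit

# Only the intended change remains relative to upstream.
git diff --name-only 1.3.1..HEAD
```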

stefan-koch-sociomantic · 2019-04-12T12:12:05Z

I cannot meaningfully review this.
If it does not break anything on your side, it's good to go.

stefan-koch-sociomantic

If it does not break anything on your side, I am fine with those changes.

jens-mueller-sociomantic · 2019-04-17T08:18:28Z

Tested it and looks good.

The new MXNet release enables Intel MKL-DNN by default. Because we don't use this functionality, we disable it for now. This decision should be revisited, though, especially since we may want to stay close to MXNet's default build configuration.

haojin2 and others added 30 commits July 26, 2018 01:45

Re-enabling randomized test_operator/test_operator_gpu.test_dot (apache#11888)

7cd01ff

Fix non-determinism of dot(csr.T, dns) = dns with tests (apache#11825)

302aae3

* fix non-determinism of dot(csr.T, dns) = dns with tests * address code reviews

Support integer type in ImageIter (apache#11864)

f5b95b0

Fix mxnet ctc_loss bug (apache#11834)

2bddf6f

* fix ctc_loss GPU bug * add blank_label parameter for CTCLoss * Revert "add blank_label parameter for CTCLoss" This reverts commit aab11f7.

[MXNET-344] Add more operators to onnx import (apache#11856)

4bbf15c

* add more ops * use dict.get * add list comprehension * retrigger CI due to unrelated flaky test failure

make skiptest work (apache#11889)

a8c8737

Fix flaky test test_deconvolution (apache#11630)

bd3fc88

* Replace cublasSgemm with cublasSgemmEx for >= 7.5 * Add comment for cublasSgemmEx

Remove fixed seed for test_sparse_nd_save_load (apache#11920)

011a0dc

* Remove fixed seed for test_sparse_nd_save_load * Add comments related to the commit

Disable flaky test: test_spatial_transformer_with_type (apache#11930)

83ae3a3

apache#11839

[MXNET-531] Add download util (apache#11866)

b2fd3b1

* add changes to example * place the file to the util * add retry scheme * fix the retry logic * change the DownloadUtil to Util * Trigger the CI

[MXNET-641] fix R windows install docs (apache#11805)

815f42d

* fix R windows install docs * addressed PR comments * PR comments * PR comments * fixed line wrappings * fixed line wrappings

a hot fix for mkldnn link (apache#11939)

461ba07

re-enabling randomized test_l2_normalization (apache#11900)

7ffb252

[MXNET-750] fix nested call on CachedOp. (apache#11951)

ed20304

* fix nested call on cachedop. * fix.

extend reshape op to allow reverse shape inference (apache#11956)

51f650e

Improve sparse embedding index out of bound error message; (apache#11940)

1eef070

[MXNET-770] Remove fixed seed in flaky test (apache#11958)

fc912f3

* Remove fixed seed in flaky test * Remove fixed seed in flaky test

Update ONNX docs with the latest supported ONNX version (apache#11936)

394e5cc

Reduced test to 3 epochs and made gpu only (apache#11863)

eed7a34

* Reduced test to 3 epochs and made GPU only * Moved logger variable so that it's accessible

Fix flaky tests for test_laop_4 (apache#11972)

c6a32b6

lebeg and others added 12 commits November 7, 2018 10:20

add/update infer_range docs (apache#13153)

acaf5df

fix broken docs (apache#13154)

c0f3d02

allow foreach on input with 0 length (v1.3.x) (apache#13151)

23c09c7

* allow foreach on input with 0 length * add test foreach with unknown dim

fixed symbols naming in RNNCell, LSTMCell, GRUCell (v1.3.x) (apache#13158)

dff0431

* fixed symbols naming in RNNCell and LSTMCell * fixed GRUCell as well * added test * fixed tests?

Fixed __setattr__ method of _MXClassPropertyMetaClass (v1.3.x) (apache#13157)

9f8aa60

* fixed indentation * simplified code

document env variable MXNET_ENFORCE_DETERMINISM (apache#13156)

be8fdc8

Remove test for non existing index copy operator (apache#13180)

27dc5c8

Disable flaky test test_operator.test_dropout (apache#13200)

7fc344c

Revert "Set correct update on kvstore flag in dist_device_sync mode (v1.3.x) (apache#13121)" (apache#13228)

0cb2ad6

This reverts commit d0b83d4.

news, readme update for v1.3.1 release (apache#13225)

c1327f3

* news, readme update for v1.3.1 release * Added release notes

Add apt update to all docker install scripts (apache#13287)

19c5016

jens-mueller-sociomantic mentioned this pull request Apr 9, 2019

Merge stable 1.2.1 release of MXNet #34

Closed

jens-mueller-sociomantic force-pushed the merge-1.3.1 branch from e745179 to fea6f06 April 10, 2019 07:50

jens-mueller-sociomantic force-pushed the merge-1.3.1 branch from fea6f06 to 950178a April 12, 2019 06:41

jens-mueller-sociomantic requested a review from stefan-koch-sociomantic April 12, 2019 10:11

jens-mueller-sociomantic force-pushed the merge-1.3.1 branch from 950178a to f2bfbc5 April 12, 2019 10:17

stefan-koch-sociomantic approved these changes Apr 12, 2019


Disable using of Intel MKL-DNN

fe8cc05

The new MXNet release enables Intel MKL-DNN by default. Because we don't use this functionality, we disable it for now. This decision should be revisited, though, especially since we may want to stay close to MXNet's default build configuration.

jens-mueller-sociomantic force-pushed the merge-1.3.1 branch from f2bfbc5 to fe8cc05 April 17, 2019 08:37

jens-mueller-sociomantic merged commit f3e89b1 into sociomantic-tsunami:v1.x.x Apr 17, 2019

jens-mueller-sociomantic deleted the merge-1.3.1 branch April 17, 2019 09:21

jens-mueller-sociomantic added this to the 1.3.1-0tsunami1 milestone Apr 17, 2019

jens-mueller-sociomantic added the type-enhancement label Apr 17, 2019
