cconvey/merge some upstream #367

cconvey · 2018-08-17T15:27:10Z

Merges upstream master up through the commit at which their subgraph branch was created.

This is to reduce code-version disparities as we develop subgraph-related changes to the bridge code.

* [MXAPPS-581] Nightly Straight Dope tests. The Straight Dope notebooks will be retrieved from the GitHub repo, run, and scanned for warnings and errors. Because we are not checking accuracy of the training, we set the number of epochs to 1 to reduce the integration test run time.
* Common functionality for running and testing notebooks has been factored into a common test util module.
* Support for running UTF-8 notebooks added (Python 2 and 3 compatible).
* Notebooks requiring a single GPU and multiple GPUs have been split into two different test suites so that they can be run on different hardware.
* Add test to make sure that all notebooks are tested.
* Comment out broken notebooks while they are being fixed (I will uncomment them in a follow-up PR).
* [MXAPPS-581] Download notebooks in test setup.
* Moving logic to download the Straight Dope notebooks to the test harness.
* Remove cache logic as it is unnecessary.
* [MXAPPS-581] Add a timeout for download of notebooks.
* [MXAPPS-581] Move notebooks requiring multi-gpus. Move two notebooks requiring multi-GPUs out of the single GPU test suite.

…pdated) (#11591)
* add multiroot all-reduce communication pattern
* fix bug with UpdateWeight
* fix PCI-E links appearing in weight matrix bug
* optimization to skip CopyFromTo in ReduceInner gains a bit of throughput
* remove unnecessary if statement
* Add tests
* add more tests, 6 tests left to add
* get rid of some dead code
* Add comments
* Add randomized tests for backtrack and kernighan-lin
* Fix Postprocess
* Add switch for first valid tree when num_gpus > 8, and for maximum weight when num_gpus <= 8
* Kernighan-Lin seems to find better trees
* get rid of printfs
* change defaults
* inherit from CommDevice instead of Comm
* Fix lint errors
* Add Python test using MXNET_KVSTORE_USETREE, fix CMake compilation problem, add header guard
* fix lint errors
* better header guard that works for tests
* get rid of unused variable warning
* retrigger jenkins
* resolve 2 comments
* address comment using Class to do test, get rid of extraneous test, use PCI-E as fallback for GPUs that are not linked by NVLink
* address comments
* fix a few bugs
* get rid of printfs
* get rid of print
* Comment out test for now
* fix 2 more bugs
* fix segfault
* change PrintVector, PrintTopo, PrintMatrix to LOG(INFO) instead of stdout
* Fix code alignment
* get rid of todo
* Make changes to env variable names to indicate they are TREE-related
* Add note saying when ARRAY_BOUND env var takes effect

…) (#11587)
* [MXNET-378] Adding depth_to_space and space_to_depth operator
* fixed lint and windows CPU errors
* compliance with C++ style guide and address shortcomings in unittests
* fixed documentation and nitpicky suggestions
* added operator references in API docs and removed inplace optimization support
* Added references in symbol.md and ndarray.md. Improved test cases and added block_size check
* Fixing bugs in documentation. Tests now include tensors of random shapes.

Corrected a race condition with stopping profiling. Added mx.nd.waitall to ensure all operations have completed, including GPU operations that might otherwise be missing. Also added alternative code for selecting the GPU or CPU context, which previously raised an error on machines with nvidia-smi.

* adding param for list of tags to display on website
* using new website display argument for artifact placement in version folder
* adding display logic
* remove restricted setting for testing
* update usage instructions
* reverted Jenkinsfile to use restricted nodes

* Update relative paths pointing to the data directory to point to the correct place in the testing temporary folder.
* Enable the notebooks that were previously broken because of relative file paths not pointing to the correct place.
* Move some notebooks we do not plan to test to the whitelist. These notebooks are not published in the Straight Dope book.
* Clean-up: Convert print statements to info/warn/error logging statements. Add some logging statements for better status.

* add linux and macos doc
* update doc
* Update MKL_README.md
* Update MKL_README.md Add convolution code to verify mkldnn backend
* add homebrew link
* rename to MKLDNN_README
* add mkl verify
* trigger
* trigger
* set mac compiler to gcc47
* add VS2017 support experimentally
* improve quality
* improve quality
* modify mac build instruction since prepare_mkldnn.sh has been rm
* trigger
* add some improvement

…ning with Gluon (#11910)
* Add description about update on kvstore
* add async check for gluon
* only raise error if user set update_on_kvstore
* fix condition
* add async nightly test
* fix case when no kvstore
* add example for trainer creation in doc

* Remove fixed seed in flaky test
* Remove fixed seed in flaky test
* Update random seed to reproduce the issue
* Fix Flaky unit test and add a training test
* Remove fixed seed in flaky test
* Update random seed to reproduce the issue
* Fix Flaky unit test and add a training test
* Increase accuracy check

cconvey · 2018-08-17T19:53:33Z

I've performed the following manual checks on the PR's code changes:

I reviewed the diff of origin/master vs. this PR's code.
I reviewed the diff of Incubator's 8d4d5f commit vs. this PR's code.

I didn't spot any problematic code changes from either perspective.

Junru Shao and others added 30 commits July 24, 2018 00:39

Enable control flow test (#11869)

8f4b092

Fix file name creation for Windows (#11765)

fa935a8

* Fix file name creation for Windows * Forcing build * Force build again

update vgg pretrained model (#11860)

8a21a06

* update vgg pretrained model * Trigger CI * Trigger CI

Add verify_ssl option to gluon.utils.download (#11546)

07a9977

* Add verify_ssl option to gluon.utils.download Sometimes datasets may be hosted on servers that serve invalid SSL certificates. * Add warning * Add test * Mock gluon.utils.download tests * Add Py2 mock dependency to Jenkinsfile

[MXNET-710] Change POM files to be able to regularly publish to Apach…

424fafe

…e Release & Maven Central Repo (#11862) * pom file changes for maven builds

Enable three retries for Docker build commands (#11877)

06f4ec7

This enabled retries for Docker build commands executed by our master and PR builds.

Avoid Division by Zero (#11397)

0b8b939

* Return if iteration counter `N` is less than or equal to zero. * Fix spelling.

making AddTakeGrad as default for backward of embedding and take to a…

fe1c7ab

…void nan (#11795)

[MXNET-563] Refactor R optimizers to fix memory leak (#11374)

be47870

* refactor R optimizers to fix memory leak * add Adadelta and Adagrad * fix comments * fix comments * fix comments * add tests * fix whitespaces * fix whitespaces * fix typo * fix typo * add doc on clipping

Add logistic regression tutorial (#11651)

832a5fb

* Add logistic regression tutorial * Code review fix * Add F1 metric, fix code review comments * Add Download buttons script

Re-enabling randomized test_operator/test_operator_gpu.test_dot (#11888)

7cd01ff

Fix non-determinism of dot(csr.T, dns) = dns with tests (#11825)

302aae3

* fix undeterminism of dot(csr.T, dns) = dns with tests * address code reviews

Support integer type in ImageIter (#11864)

f5b95b0

Fix mxnet ctc_loss bug (#11834)

2bddf6f

* fix ctc_loss GPU bug * add blank_label parameter for CTCLoss * Revert "add blank_label parameter for CTCLoss" This reverts commit aab11f7575580f88f5f27be14466d0deb4b4c456.

[MXNET-344] Add more operators to onnx import (#11856)

4bbf15c

* add more ops * use dict.get * add list comprehensive * retrigger CI due to unrelated flaky test failure

make skiptest work (#11889)

a8c8737

Fix flaky test test_deconvolution (#11630)

bd3fc88

* Replace cublassgemm with cublassgemmex for >= 7.5 * Add comment for cublassgemmex

Remove fixed seed for test_sparse_nd_save_load (#11920)

011a0dc

* Remove fixed seed for test_sparse_nd_save_load * Add comments related to the commit

Fix image classification scripts and Improve Fp16 tutorial (#11533)

54ebc5d

* fix bugs and improve tutorial * improve logging * update benchmark_score * Update float16.md * update link to dmlc web data * fix train cifar and add random mirroring * set aug defaults * fix whitespace * fix typo

Disable flaky test: test_spatial_transformer_with_type (#11930)

83ae3a3

apache/mxnet#11839

[MXNET-531] Add download util (#11866)

b2fd3b1

* add changes to example * place the file to the util * add retry scheme * fix the retry logic * change the DownloadUtil to Util * Trigger the CI

[MXNET-11241] Avoid use of troublesome cudnnFind() results when grad_…

024b5a9

…req='add' (#11338) * Add tests that fail due to issue 11241 * Fix #11241 Conv1D throws CUDNN_STATUS_EXECUTION_FAILED * Force algo 1 when grad_req==add with large c. Expand tests. * Shorten test runtimes.

ankkhedia and others added 21 commits August 1, 2018 16:34

Fix install instructions for MXNET-R (#11976)

31c5fbc

* fix install instructions for MXNET-R * fix install instructions for MXNET-R * fix default cuda version for MXNet-R

[MXNET-751] fix ce_loss flaky (#11971)

a93905d

* add xavier initializer * remove comment line

[MXNET-769] set MXNET_HOME as base for downloaded models through base…

564e01a

….data_dir() (#11636) * set MXNET_DATA_DIR as base for downloaded models through base.data_dir() push joblib to save containers so is not required when running * MXNET_DATA_DIR -> MXNET_HOME

[MXNET-748] linker fixed on Scala issues (#11989)

6009b26

* put force load back as a temporary solution * use project.basedir as relative path for OSX linker

[MXNET-772] Re-enable test_module.py:test_module_set_params (#11979)

946e9d0

[MXNET-771] Fix Flaky Test test_executor.py:test_dot (#11978)

1bd9356

* use assert_almost_equal, increase rtol, reduce matrix size * remove seed in test_bind * add seed 0 to test_bind, it is still flaky * add comments for tracking

remove mod from arity 2 version of load-checkpoint in clojure-package…

833de7e

… (#11808) * remove mod from arity 2 version of load-checkpoint * load-checkpoint arity 2 test

Add unit test stage for mxnet cpu in debug mode (#11974)

bcfab3a

Website broken link fixes (#12014)

c937277

* fix broken link * fix broken link * switch to .md links * fix broken link

removed seed from flaky test (#11975)

1818280

Disable ccache log print due to threadunsafety (#11997)

619700a

Added default tolerance levels for regression checks for MBCC (#12006)

2534164

* Added tolerance level for assert_almost_equal for MBCC * Nudge to CI

Disable flaky mkldnn test_requantize_int32_to_int8 (#11748)

32c2e15

[MXNET-769] Usability improvements to windows builds (#11947)

1fa04f2

* Windows scripted build Adjust Jenkins builds to use ci/build_windows.py Issues: #8714 #11100 #10166 #10049 * Fix bug * Fix non-portable ut * add xunit

Fix import statement (#12005)

5628194

array and multiply are undefined. Importing them from ndarray

Disable flaky test test_random.test_gamma_generator (#12022)

3dd0003

disable opencv threading for forked process (#12025)

ae698f9

Bug fixes in control flow operators (#11942)

22c97ef

Fix data narrowing warning on graph_executor.cc (#11969)

8d4d5fa

Merge incubator commit 8d4d5f (branch-point for incubator/subgraph)

f13b1cd

cconvey requested a review from mbrookhart August 17, 2018 15:30

mbrookhart approved these changes Aug 17, 2018

mbrookhart merged commit cfbcdab into master Aug 17, 2018

mbrookhart pushed a commit that referenced this pull request Aug 17, 2018

Revert "cconvey/merge some upstream (#367)"

4f89e31

This reverts commit cfbcdab.

mbrookhart mentioned this pull request Aug 17, 2018

Revert "cconvey/merge some upstream" #368

Merged

mbrookhart pushed a commit that referenced this pull request Aug 17, 2018

Revert "cconvey/merge some upstream (#367)" (#368)

3ed4635

This reverts commit cfbcdab.

mbrookhart deleted the cconvey/merge-some-upstream branch August 17, 2018 23:37

ashokei pushed a commit that referenced this pull request Oct 24, 2018

ndarray: copy(AbstractArray, context) (#367)

12198f0

julia> copy(1:4, mx.cpu())
4 mx.NDArray{Int64,1} @ CPU0:
 1
 2
 3
 4

julia> copy(1.:4, mx.cpu())
4 mx.NDArray{Float64,1} @ CPU0:
 1.0
 2.0
 3.0
 4.0
