This repository has been archived by the owner on Jan 3, 2023. It is now read-only.

cconvey/merge some upstream #367

Merged
merged 64 commits into from
Aug 17, 2018

Conversation

cconvey
Contributor

@cconvey cconvey commented Aug 17, 2018

Merges upstream master up through the commit at which their subgraph branch was created.

  • This is to reduce code-version disparities as we develop subgraph-related changes to the bridge code.

Junru Shao and others added 30 commits July 24, 2018 00:39
* [MXAPPS-581] Nightly Straight Dope tests.

The Straight Dope notebooks will be retrieved from the GitHub repo, run and
scanned for warnings and errors. Because we are not checking accuracy of
the training, we set the number of epochs to 1 to reduce the integration
test run time.
* Common functionality for running and testing notebooks has been
  factored into a common test util module.
* Support for running UTF-8 notebooks added (Python2 and 3 compatible).
* Notebooks requiring a single GPU and multi GPUs have been split
  into two different test suites so that they can be run on different
  hardware.
* Add test to make sure that all notebooks are tested.
* Comment out broken notebooks while they are being fixed (I will
  uncomment them in a follow up PR).

* [MXAPPS-581] Download notebooks in test setup.

* Moving logic to download the Straight Dope notebooks to the test
harness.
* Remove cache logic as it is unnecessary.

* [MXAPPS-581] Add a timeout for download of notebooks.

* [MXAPPS-581] Move notebooks requiring multi-gpus.

Move two notebooks requiring multi-GPUs out of the single GPU test suite.
…pdated) (#11591)

* add multiroot all-reduce communication pattern

* fix bug with UpdateWeight

* fix PCI-E links appearing in weight matrix bug

* optimization to skip CopyFromTo in ReduceInner gains a bit of throughput

* remove unnecessary if statement

* Add tests

* add more tests, 6 tests left to add

* get rid of some dead code

* Add comments

* Add randomized tests for backtrack and kernighan-lin

* Fix Postprocess

* Add switch for first valid tree when num_gpus > 8, and for maximum weight when num_gpus <= 8

* Kernighan-Lin seems to find better trees

* get rid of printfs

* change defaults

* inherit from CommDevice instead of Comm

* Fix lint errors

* Add Python test using MXNET_KVSTORE_USETREE, fix CMake compilation problem, add header guard

* fix lint errors

* better header guard that works for tests

* get rid of unused variable warning

* retrigger jenkins

* resolve 2 comments

* address comment using Class to do test, get rid of extraneous test, use PCI-E as fallback for GPUs that are not linked by NVLink

* address comments

* fix a few bugs

* get rid of printfs

* get rid of print

* Comment out test for now

* fix 2 more bugs

* fix segfault

* change PrintVector, PrintTopo, PrintMatrix to LOG(INFO) instead of stdout

* Fix code alignment

* get rid of todo

* Make changes to env variable names to indicate they are TREE-related

* Add note saying when ARRAY_BOUND env var takes effect
* Fix file name creation for Windows

* Forcing build

* Force build again
* update vgg pretrained model

* Trigger CI

* Trigger CI
* Add verify_ssl option to gluon.utils.download

Sometimes datasets may be hosted on servers that serve invalid SSL certificates.

* Add warning

* Add test

* Mock gluon.utils.download tests

* Add Py2 mock dependency to Jenkinsfile
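
The `verify_ssl` commits above describe an opt-out from certificate checks that also emits a warning. A minimal sketch of that idea using only the Python standard library follows; `make_ssl_context` is a hypothetical helper written for illustration and is not the code in `gluon.utils.download`.

```python
import ssl
import warnings


def make_ssl_context(verify_ssl=True):
    """Build an SSL context, optionally with certificate checks disabled.

    Hypothetical helper for illustration only; it is not part of MXNet.
    """
    context = ssl.create_default_context()
    if not verify_ssl:
        # Skipping verification exposes the transfer to tampering, so warn.
        warnings.warn("SSL certificate verification is disabled; "
                      "downloads are not protected against tampering.")
        # check_hostname must be off before verify_mode can be CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context
```

Such a context could be passed to `urllib.request.urlopen(url, context=...)` when a dataset host serves an invalid certificate. Keeping verification on by default means the unsafe path has to be requested explicitly.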
…e Release & Maven Central Repo (#11862)

* pom file changes for maven builds
This enabled retries for Docker build commands executed by our master and PR builds.
* Return if iteration counter `N` is less than or equal to zero.

* Fix spelling.
* refactor R optimizers to fix memory leak

* add Adadelta and Adagrad

* fix comments

* fix comments

* fix comments

* add tests

* fix whitespaces

* fix whitespaces

* fix typo

* fix typo

* add doc on clipping
* Add logistic regression tutorial

* Code review fix

* Add F1 metric, fix code review comments

* Add Download buttons script
* fix nondeterminism of dot(csr.T, dns) = dns with tests

* address code reviews
…) (#11587)

* [MXNET-378] Adding depth_to_space and space_to_depth operator

* fixed lint and windows CPU errors

* compliance with C++ style guide and address shortcomings in unittests

* fixed documentation and nitpicky suggestions

* added operator references in API docs and removed inplace optimization support

* Added references in symbol.md and ndarray.md. Improved test cases and added block_size check

* Fixing bugs in documentation. Tests now include tensors of random shapes.
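
For readers unfamiliar with the two operators named above, here is a pure-Python sketch on nested lists in NCHW layout. The index mapping is an assumption about the conventional block layout (channel blocks ordered by row offset, then column offset) and is not taken from the MXNet source; the function names mirror the operators only for readability.

```python
def depth_to_space(x, block_size):
    """Move blocks of channels into spatial positions (NCHW nested lists)."""
    n, c, h, w = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    b = block_size
    # Mirrors the block_size check mentioned in the commit list.
    if b <= 0 or c % (b * b) != 0:
        raise ValueError("channels must be divisible by block_size squared")
    c_out = c // (b * b)
    out = [[[[None] * (w * b) for _ in range(h * b)]
            for _ in range(c_out)] for _ in range(n)]
    for ni in range(n):
        for co in range(c_out):
            for hi in range(h):
                for wi in range(w):
                    for i in range(b):      # row offset inside the block
                        for j in range(b):  # column offset inside the block
                            src = (i * b + j) * c_out + co
                            out[ni][co][hi * b + i][wi * b + j] = x[ni][src][hi][wi]
    return out


def space_to_depth(x, block_size):
    """Inverse of depth_to_space: fold spatial blocks into channels."""
    n, c, h, w = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    b = block_size
    if b <= 0 or h % b != 0 or w % b != 0:
        raise ValueError("height and width must be divisible by block_size")
    out = [[[[None] * (w // b) for _ in range(h // b)]
            for _ in range(c * b * b)] for _ in range(n)]
    for ni in range(n):
        for ci in range(c):
            for hi in range(h):
                for wi in range(w):
                    i, j = hi % b, wi % b
                    dst = (i * b + j) * c + ci
                    out[ni][dst][hi // b][wi // b] = x[ni][ci][hi][wi]
    return out
```

Under this mapping the two functions are inverses, so `space_to_depth(depth_to_space(x, b), b)` returns `x` for any valid input.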
* fix ctc_loss GPU bug

* add blank_label parameter for CTCLoss

* Revert "add blank_label parameter for CTCLoss"

This reverts commit aab11f7575580f88f5f27be14466d0deb4b4c456.
* add more ops

* use dict.get

* add list comprehension

* retrigger CI due to unrelated flaky test failure
* Replace cublassgemm with cublassgemmex for >= 7.5

* Add comment for cublassgemmex
* Remove fixed seed for test_sparse_nd_save_load

* Add comments related to the commit
Corrected a race condition with stopping profiling. Added mx.nd.waitall to ensure all operations have completed, including GPU operations that might otherwise be missing.

Also added alternative code for selecting the GPU vs. CPU context, which previously raised an error on machines with nvidia-smi.
* fix bugs and improve tutorial

* improve logging

* update benchmark_score

* Update float16.md

* update link to dmlc web data

* fix train cifar and add random mirroring

* set aug defaults

* fix whitespace

* fix typo
* adding param for list of tags to display on website

* using new website display argument for artifact placement in version folder

* adding display logic

* remove restricted setting for testing

* update usage instructions

* reverted Jenkinsfile to use restricted nodes
* Update relative paths pointing to the data directory to point to the
  correct place in the testing temporary folder.

* Enable the notebooks that were previously broken because of relative
  file paths not pointing to the correct place.

* Move some notebooks we do not plan to test to the whitelist. These
  notebooks are not published in the Straight Dope book.

* Clean-up: Convert print statements to info/warn/error logging
  statements. Add some logging statements for better status.
* add linux and macos doc

* update doc

* Update MKL_README.md

* Update MKL_README.md

Add convolution code to verify mkldnn backend

* add homebrew link

* rename to MKLDNN_README

* add mkl verify

* trigger

* trigger

* set mac compiler to gcc47

* add VS2017 support experimentally

* improve quality

* improve quality

* modify mac build instruction since prepare_mkldnn.sh has been rm

* trigger

* add some improvement
* add changes to example

* place the file to the util

* add retry scheme

* fix the retry logic

* change the DownloadUtil to Util

* Trigger the CI
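
The retry scheme mentioned in the commits above is not shown on this page. As a rough illustration of the general pattern, a minimal sketch could look like the following; `retry` and its parameters are hypothetical names and do not reflect the actual `Util` code.

```python
import time


def retry(action, attempts=3, delay_seconds=0.0):
    """Call action() until it succeeds or the attempts are exhausted."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay_seconds)  # pause before the next try
```

Re-raising the final exception keeps the original failure visible to the caller instead of hiding it behind a generic error.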
…req='add' (#11338)

* Add tests that fail due to issue 11241

* Fix #11241 Conv1D throws CUDNN_STATUS_EXECUTION_FAILED

* Force algo 1 when grad_req==add with large c.  Expand tests.

* Shorten test runtimes.
…ning with Gluon (#11910)

* Add description about update on kvstore

* add async check for gluon

* only raise error if user set update_on_kvstore

* fix condition

* add async nightly test

* fix case when no kvstore

* add example for trainer creation in doc
ankkhedia and others added 21 commits August 1, 2018 16:34
* fix install instructions for MXNET-R

* fix install instructions for MXNET-R

* fix default cuda version for MXNet-R
* add xavier initializer

* remove comment line
….data_dir() (#11636)

* set MXNET_DATA_DIR as base for downloaded models through base.data_dir()
push joblib to save containers so is not required when running

* MXNET_DATA_DIR -> MXNET_HOME
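
The commits above route downloaded models through a base directory controlled by the `MXNET_HOME` environment variable. A minimal sketch of that lookup, assuming a per-user fallback directory (the default path below is illustrative and not confirmed by this page):

```python
import os


def data_dir(default=os.path.join("~", ".mxnet")):
    """Return the base directory for downloaded data.

    Assumption: MXNET_HOME, when set, overrides a per-user default.
    The default path used here is illustrative.
    """
    return os.path.expanduser(os.environ.get("MXNET_HOME", default))
```

Reading the variable on every call, rather than once at import time, lets tests and scripts change the location without reloading the module.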
* put force load back as a temporary solution

* use project.basedir as relative path for OSX linker
* use assert_almost_equal, increase rtol, reduce matrix size

* remove seed in test_bind

* add seed 0 to test_bind, it is still flaky

* add comments for tracking
… (#11808)

* remove mod from arity 2 version of load-checkpoint

* load-checkpoint arity 2 test
* fix broken link

* fix broken link

* switch to .md links

* fix broken link
* Added tolerance level for assert_almost_equal for MBCC

* Nudge to CI
* Windows scripted build
Adjust Jenkins builds to use ci/build_windows.py

Issues:

    #8714
    #11100
    #10166
    #10049

* Fix bug

* Fix non-portable ut

* add xunit
array and multiply are undefined. Importing them from
ndarray
* Remove fixed seed in flaky test

* Remove fixed seed in flaky test

* Update random seed to reproduce the issue

* Fix Flaky unit test and add a training test

* Remove fixed seed in flaky test

* Update random seed to reproduce the issue

* Fix Flaky unit test and add a training test

* Increase accuracy check
@cconvey
Contributor Author

cconvey commented Aug 17, 2018

I've performed the following manual checks on the PR's code changes:

  • I reviewed the diff of origin/master vs. this PR's code.
  • I reviewed the diff of Incubator's 8d4d5f commit vs. this PR's code.

I didn't spot any problematic code changes from either perspective.

@mbrookhart mbrookhart merged commit cfbcdab into master Aug 17, 2018
mbrookhart pushed a commit that referenced this pull request Aug 17, 2018
mbrookhart pushed a commit that referenced this pull request Aug 17, 2018
@mbrookhart mbrookhart deleted the cconvey/merge-some-upstream branch August 17, 2018 23:37
ashokei pushed a commit that referenced this pull request Oct 24, 2018
julia> copy(1:4, mx.cpu())
4 mx.NDArray{Int64,1} @ CPU0:
 1
 2
 3
 4

julia> copy(1.:4, mx.cpu())
4 mx.NDArray{Float64,1} @ CPU0:
 1.0
 2.0
 3.0
 4.0