[Do not merge] This PR is for checking the fix in oneDNN. #19259
Commits on Mar 3, 2020
- 91d595a: bump up 1.x branch to 1.7.0 (apache#17741)
  * bump up 1.x branch to 1.7.0
  * bump version for clojure
Commits on Mar 6, 2020
- 3b83cd8
Commits on Apr 4, 2020
- 21fc103: [Website 2.0] Nightly Build for v1.x (apache#17956)
  * Using unrestricted
  * Drop publish step
  * Enable restricted nodes
  * Reverted website_full, added website_nightly
  * Reduced node labels to utility and linux_cpu
Commits on Apr 10, 2020
- db93398
Commits on Apr 14, 2020
- 0d3aa67: Workaround gnu_tls handshake error on Ubuntu 14.04 Nvidia Docker (apache#18044)
  Backport of apache#18018
Commits on Apr 15, 2020
- 6fa374b: [v1.x] Backport apache#17702 and apache#17872 to v1.x branch (apache#18038)
  * Support projection feature for LSTM on CPU (Only Inference) (apache#17702)
  * test solution for -Werror=maybe-uninitialized
  * Check device type when create state
  * Document the projection feature of LSTM for RNN operator
  * Re-run CI
  * Fix issue of zeros gradients w.r.t. RNN bias when num_layers > 1 (apache#17872)
  * Use nd.copy() to initialize parameters of new operator
  * Add check for output states
  * Initialize i2h/h2h_weights with zeros for rnn_relu/tanh, and reduce size
  * Split fused rnn layer test into tests of individual mode
  * Skip lstm and gru tests on CPU context without DNNL
- 50d6d7d: [mkldnn] Mkldnn bn opt backport from master to 1.7.x (apache#18009)
  * optimize for backward batchnorm
  * using memcpy instead of 'for' loop
  * rm unnecessary pointer cast and add const for some variables
  * trigger CI
- 2cf7219: [v1.x] Update 3rdparty/mkldnn remote URL and pin to v1.3 (apache#17972) (apache#18033)
  * update onednn remote url
  * checkout onednn v1.3 release
  * fix format test
  * make test (conflicts: .gitmodules, 3rdparty/mkldnn, tests/cpp/operator/mkldnn_test.cc)
  * build flag
  * upgrade cmake
- 3f920ae
- 2ccbcec: GPU gemms true fp16 (apache#17466) (apache#18023)
  * Temporary solution for fp16 accumulation in BERT gemms
  * Resolve alpha/beta type issue
  * add documentation for env variable MXNET_FC_TRUE_FP16
  * Improve description of env variable
  * Add unit test checking the environment variable
  * keep pseudo-fp16 if architecture does not support Float16Compute
  * Fix cpplint
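A minimal sketch of the switch the commit above documents, assuming only what the message states: `MXNET_FC_TRUE_FP16=1` opts into true fp16 accumulation for FullyConnected GEMMs, while pseudo-fp16 (fp32 accumulation) stays the default. The helper name is illustrative, not MXNet's internal API.

```python
import os

def use_true_fp16() -> bool:
    # Hypothetical helper: reads the documented environment variable.
    # Unset or "0" means the default, pseudo-fp16 (fp32 accumulation).
    return os.environ.get("MXNET_FC_TRUE_FP16", "0") == "1"

os.environ["MXNET_FC_TRUE_FP16"] = "1"
print(use_true_fp16())  # True once the variable is set
```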
Commits on Apr 16, 2020
- 1afdfce: [1.7] Backport MXNet Extension PRs (apache#17623, apache#17569, apache#17762) apache#18063 (apache#18069)
  * Dynamic subgraph compile support (apache#17623). Adds support for passing the NDArrays from the existing optimize_for API down to the reviewSubgraph function in an external library, and adds a new HybridBlock API, optimize_for, that can partition the model without running a forward pass.
    Feature changes: the new HybridBlock optimize_for API partitions the model but does not call the cachedOp; the subgraph library example optionally requires args to be provided; subgraph inputs are annotated with the name of the original param so inputs can be mapped, and annotations are passed to input nodes of subgraphs; tensors in MKLDNN format are supported via Reorder2Default.
    New tests: partitioning operators that directly consume params; a new model where ops to be partitioned have args/params.
    Bug fixes: pass the ids vector and attributes by reference instead of by value/copy; update _cached_graph after partitioning; free user-specified attributes on subgraph ops when a subgraph is rejected (memory leak); fix incorrect indexing into shape/dtype maps when annotating the graph.
    Docs: the README is updated with the changes described above.
  * Adding sparse support to MXTensor for custom operators (apache#17569). Adds an enum for sparse storage, structures for dense and sparse data, a redesigned MXSparse data structure, aux data pulled out of sparse NDArrays, sparse arguments passed from c_api to lib_api.h and set in MXTensor, CSR and row-sparse transpose samples (with output-size and mixed-storage-format fixes via a CheckAndAlloc() lambda), an optional inferSType function with error messages, tests for symbolic and stateful ops, result verification for the transpose ops, and a bump of MX_LIBRARY_VERSION to 5.
  * Custom Operator Random Number Generator Support (apache#17762). Adds random number generator support for custom operator libraries: MXNet passes its initialized and seeded RNG states, located on CPU and GPU, to the custom library, so users can generate deterministic values from a given seed passed to MXNet. The workflow is: mx.random.seed(128); r1 = mx.nd.some_custom_random_op(data); mx.random.seed(128); r2 = mx.nd.some_custom_random_op(data); assert (r1 == r2). This PR does not make a custom library generate exactly the same sequence of random numbers as MXNet. Continuation of the custom operator project (apache#15921, apache#17270).
  Co-authored-by: guanxinq <[email protected]>, Ziyi Mu <[email protected]>
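The determinism contract in the commit message above can be modeled with Python's standard RNG; `some_custom_random_op` here is the same placeholder name the message uses, not a real MXNet operator, and `random.Random` stands in for the seeded state MXNet hands the library.

```python
import random

def some_custom_random_op(rng: random.Random, n: int = 3):
    # Stand-in for a custom-library operator that draws from the
    # framework-provided RNG state (modeled here by random.Random).
    return [rng.random() for _ in range(n)]

# Seeding twice with the same value reproduces the same draws:
r1 = some_custom_random_op(random.Random(128))
r2 = some_custom_random_op(random.Random(128))
assert r1 == r2
```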
- b56571d: [v1.x] backport apache#17900 "[MKLDNN] support using any format in pooling backward" (apache#18067)
  * use any format in pooling backward
  * use data_type()
  * fix backport
- 8cfc64a: No tensor cores for fp32 interleaved attention, remove div by 8 restriction (apache#17994) (apache#18085)
  (cherry picked from commit afae030)
Commits on Apr 17, 2020
- 2e22b5e: refactor codes and add an option to skip/check weight's version to reduce overhead (apache#17707) (apache#18039)
Commits on Apr 18, 2020
- 3835139: Add gelu fuse ops (apache#18082) (apache#18092)
  * Add LeakyReLU:Gelu (fwd and bwd) to fused ops
  * Add test LeakyReLU:gelu
  * cpplint / fix lint
  * fix bug SQRT_2 using constant memory
  * add comments
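For reference, the activation the fused op above computes is the standard (exact) GELU; the fused kernels compute it and its gradient inline on GPU, but the mathematical definition is just the following sketch.

```python
import math

def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF,
    # written via erf. This is the reference definition, not the
    # fused CUDA kernel itself.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

assert gelu(0.0) == 0.0
assert abs(gelu(100.0) - 100.0) < 1e-6  # for large x, Phi(x) -> 1
```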
- 814530d: Cherry-pick of apache#17995 and apache#17937 to 1.x branch (apache#18041)
  * Fix ElemwiseSum for more than 4 inputs (apache#17995), with a test
  * Fix handling of negative axis, begin and end in the fusion of slice ops (apache#17937), with a test
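The slice-fusion fix above amounts to normalizing negative axis/begin/end values before fusing. A minimal sketch of that normalization, assuming the usual "count from the end" convention (the helper name is hypothetical, not MXNet code):

```python
def normalize_slice(begin: int, end: int, axis: int, ndim: int, dim_size: int):
    # Hypothetical helper: negative axis counts back from ndim,
    # negative begin/end count back from the axis length.
    if axis < 0:
        axis += ndim
    if begin < 0:
        begin += dim_size
    if end < 0:
        end += dim_size
    return axis, begin, end

# Slicing elements [-2, -1) of the last axis of a (?, 5) tensor:
assert normalize_slice(-2, -1, -1, 2, 5) == (1, 3, 4)
```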
- 4cf2ad3: [v1.x] Backport apache#17689 and apache#17884 to v1.x branch (apache#18064)
  * [MKLDNN] apply MKLDNNRun to quantized_act/transpose ops (apache#17689)
  * [MKL-DNN] Integrate Conv3d and Pool3d/1d (apache#17884): fix UT & address comments, clean code, rebase against latest master, fix conflicts, fix CI
- b8e8d73: Fix and optimize handling of vectorized memory accesses (apache#17767) (apache#18095)
  * Vectorized loads for binary elemwise kernels, generalized to binary ops with scalar, unary ops, the backward passes (backwardusenone, backwardusein), and ElementwiseSum
  * Got rid of half2 in mshadow; the removals of _backward_add and backward_elemwiseaddex were reverted (a C++ test relies on backward_add)
  * Vectorized broadcast kernels, including a single-side vectorized kernel, limited to GPU only; vectorization is also enabled when broadcast does not actually broadcast
  * Fixed the vectorized broadcast implementation for misaligned input pointers, fixed INT64 compilation, and optimized the aligned=true case
  * Added tests and docs in cuda_vectorization.cuh; a temporary mcmodel=medium workaround for the CMake static build was reverted; the PR was limited to just elementwise ops
Commits on Apr 21, 2020
- a5744be
Commits on Apr 22, 2020
- 18c7963: MXNet Extensions enhancements (apache#17885) (apache#18126)
  * enabled calling create for the selector, connected the selector to call an external class, fixed the selector class, and cleaned up APIs
  * added code to remove temp graph attrs and changed shape inference to use different attr names
  * added passing args/aux down to graph passes, creating new args/aux for graph passes, and fixed returning args/aux
  * refactored MXLoadLib into separate functions and enabled verbose output in library loading
  * updated the build for extensions, the perl API, and the README; added pass_lib to the cmake build flow; fixed the relu example lib and assorted sanity/lint/compile issues
  Co-authored-by: Ubuntu <[email protected]>
Commits on Apr 24, 2020
- 770d49e: [1.x] Fix incorrect calculation results when the C locale is set to a locale that uses commas as the decimal separator (apache#17177)
  * Add a test for floating point parsing locale invariance
  * Use locale-invariant dmlc::stod/stof instead of std::stod/stof
  * Change the new-operator tutorial to use dmlc::stod instead of std::stod
  * Rename the locale invariance test
  * Skip test_scalarop_locale_invariance if the locales aren't available
  * Fix linter errors due to incorrect include order
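The bug class fixed above: C++ `std::stod` honors `LC_NUMERIC`, so under a comma-decimal locale (e.g. de_DE) the string "2.5" parses as 2.0, while `dmlc::stod` is locale-invariant. Python's built-in `float()` happens to give the same locale-invariant guarantee, which makes it a convenient model of the fixed behavior:

```python
def parse_scalar(s: str) -> float:
    # float() always uses '.' as the decimal separator, regardless of
    # any locale.setlocale() call -- the same guarantee dmlc::stod
    # provides on the C++ side.
    return float(s)

# Correct no matter what process-wide locale is active:
assert parse_scalar("2.5") == 2.5
```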
- f765e8a: Update Apache License for mshadow (apache#18109) (apache#18133)
  * Add Apache License for mshadow
  * update cpp-package license
  * update license for mx-theme in top-level LICENSE
  * Enable RAT license check for mshadow, keeping the rest of 3rdparty unchanged
  * add license header
- 0e7dd91: [v1.x] Backport staggered CI builds (apache#17999 & apache#18119) (apache#18141)
  * For the mxnet-validation pipeline, require the sanity build to complete successfully before running other build pipelines (apache#17999): refactor staggered builds into a new full build pipeline that runs the sanity check first and then starts all other builds; move the list of build jobs to the top of the file for clarity, preserving the whole job path in case nested folders are used in the future
  * If the sanity build is not found, wait until Jenkins recognizes it (apache#18119), with a 30-minute timeout for the sanity build to run and complete so we don't get stuck in a loop
  Co-authored-by: Joe Evans <[email protected]>
Commits on Apr 26, 2020
- 63e2b19: add logic for no batch size while getting data arrays from executors (apache#17772) (apache#18075)
  Co-authored-by: Ubuntu <[email protected]>
Commits on May 11, 2020
- 38ec873: Fix pylint astroid sanity issue (apache#18275)
  Cherry-picks apache@18e2014 from apache#18220
Commits on May 26, 2020
- fe90008: [v1.x] Backport edge pipeline (apache#18375)
  * Update edge toolchain
  * Support platforms without rand_r
  * Fix the URL to the IUS repository
  * Fix compiler warnings
  * Use a pre-C++17 way of distinguishing between device types, removing leftovers from the C++17 device-type check, and hack around the lack of constexpr if
  * Greatly simplify qemu setup
  * Request the C++ standard library and extensions
  * Upgrade dmlc-core to resolve build errors
  * Change the ARM8 build to work like the ARM7 build
  * Revert "Fix CPU-only RNNOp Forward" (reverts commit 0a921a4)
  * Adjust the list of files to be packed in ARM jobs
  Co-authored-by: Leonard Lausen <[email protected]>
Commits on May 27, 2020
- b523527: Fix memory leaks in Gluon (apache#18328) (apache#18359)
  Fixes a leak of ndarray objects in the frontend caused by a reference cycle. Backport of 3e676fc.
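A minimal Python model of the leak pattern the fix above removes: an object caught in a reference cycle is never freed by plain reference counting and lingers until the cyclic garbage collector runs, so breaking the cycle in the frontend lets ndarrays be reclaimed promptly.

```python
import gc
import weakref

class Payload:
    pass

def leak_candidate():
    a = Payload()
    a.myself = a        # reference cycle: the object keeps itself alive
    return weakref.ref(a)

ref = leak_candidate()
# Refcounting alone cannot free `a` here; only the cycle collector can.
gc.collect()
assert ref() is None    # reclaimed only after an explicit collection
```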
Commits on May 28, 2020
- 0c6785f
Commits on May 29, 2020
- 6bcfce9
- ac3e71b: [1.x] Pass args fix2 (apache#18236)
  * fixed overwrite of args/aux variables
  * fixed spacing
  * Merged apache#18177
  * updated python RPM URL
- d621e50: Revert PR 17767 for fixing GPU memory usage regression (apache#18283) (apache#18309)
  * Revert "Fix and optimize handling of vectorized memory accesses (apache#17767)" (reverts commit 5542d03)
  * add license to reverted file
- 3ee8e00: Fix PyPI packages and Python Docker images nightly release (apache#18222) (apache#18432)
  * remove OS from s3 library path, then revert that change (reverts commit 2665113)
  * fix bash script run commands
  * hardcode s3 path for upload/download of binaries
  Co-authored-by: Ubuntu <[email protected]>, Manu Seth <[email protected]>
Commits on Jun 2, 2020
- 843f278: [1.x] Add BatchNormWithReLU fp32/bf16 (apache#18160)
  * add bnrelu bf16 into the amp list
  * [MKL-DNN] BatchNormRelu fusion (apache#17679): support bnrelu fusion, add the param to gluon, move to contrib, fix the gluon interface, reuse the bn param, fix forward, cache flags, inherit from the BN base, fix lint/ut
  * Remove Chinese period which leads to a utf-8 encoding problem (apache#18223)
  * add bnrelu amp test
  Co-authored-by: no <[email protected]>, damNull <[email protected]>
Commits on Jun 3, 2020
- a64825f: [v1.x] Backport of improve log_softmax op performance by using DNNL support (apache#18320) (apache#18469)
  * Improve log_softmax performance via the oneDNN library
  * Adapt tests for MKLDNN log_softmax
  * Fix lint errors
  * Fix indent and comments
- 36bd144: fix batchnorm (apache#18377) (apache#18470)
  Updates basic_layers.py to fix a bug.
  Co-authored-by: Xingjian Shi <[email protected]>
- 8986e3f: [1.x] Backport of LSTM and GRU fix (apache#17898) and RNN op (apache#17632) (apache#18317)
  * [v1.x] [Large Tensor] Backport of fixed RNN op (apache#17632): changed relevant function args to index_t; fixed LSTM, GRU, RNN-ReLU and RNN-tanh; used const instead of literals; added nightly tests for RNN, RNN ReLU & tanh, LSTM, and GRU; added a type assertion to force evaluation of the output NDArray
  * [v1.x] Backport of fix LSTM and GRU layers gradient calculations (apache#18203):
    For bidirectional LSTM with more than two layers, the input gradient calculation was incorrect: the y derivative (dy) tensor was overwritten by the calculated x derivative (dx) tensor before the right2left layer could use dy for its own gradient calculations. The fix uses additional space to avoid the overwrite.
    For GRU with more than two layers, the i2h_weight gradient for the middle layers (all except the first and last) was incorrect, because the output pointer was assigned to the input instead of calculating a new input pointer.
    Enabled tests for GRU and LSTM gradients, changed the loop iteration deduction, and added more test cases for fused rnn layers.
  Co-authored-by: Connor Goggins <[email protected]>
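The bidirectional-LSTM bug described above can be modeled in a few lines: the left2right pass wrote its dx into the dy buffer, clobbering the values the right2left pass still needed, and the fix keeps a private copy of dy in extra workspace. The gradient formulas here are hypothetical stand-ins; only the buffer-aliasing pattern mirrors the fix.

```python
def bidirectional_backward(dy):
    # Fix sketch: copy dy into extra workspace BEFORE the buffer is reused.
    dy_copy = list(dy)                    # the added workspace
    dx_l2r = [2.0 * v for v in dy]        # hypothetical l2r gradient
    dy[:] = dx_l2r                        # buffer overwritten, as before the fix
    dx_r2l = [3.0 * v for v in dy_copy]   # r2l still sees the original dy
    return dx_l2r, dx_r2l

dx_l2r, dx_r2l = bidirectional_backward([1.0, 2.0])
assert dx_r2l == [3.0, 6.0]  # computed from the original dy, not the overwritten one
```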
Commits on Jun 8, 2020
- de481d3
Commits on Jun 9, 2020
- 798a264: [v1.x] backport apache#18500 - [Bug Fixed] Fix batch norm when grad_req is `add` (apache#18518)
  * Fix batch norm when grad_req is `add`
  * remove softmax test
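A sketch of MXNet's `grad_req` semantics, which the fixed BatchNorm backward has to honor: `'write'` overwrites the gradient buffer, `'add'` accumulates into it, and `'null'` leaves it untouched. The helper is an illustrative Python analogue, not MXNet code.

```python
def write_or_accumulate(grad_buf, new_grad, grad_req):
    # 'write' replaces the buffer, 'add' accumulates, 'null' skips.
    if grad_req == "write":
        return list(new_grad)
    if grad_req == "add":
        return [g + n for g, n in zip(grad_buf, new_grad)]
    return grad_buf  # 'null'

assert write_or_accumulate([1.0, 1.0], [0.5, 0.25], "add") == [1.5, 1.25]
assert write_or_accumulate([1.0, 1.0], [0.5, 0.25], "write") == [0.5, 0.25]
```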
Commits on Jun 15, 2020
- 3b2c9ad: [v1.x] Cherry-pick apache#17776 apache#17681 (apache#18465)
  * Fix CD (apache#17776): fixes cd/mxnet_lib/dynamic/Jenkins_pipeline.groovy (regression from apache#17645); adds NATIVE_ADDITIONAL.md (regression from apache#16899) and updates the other $TYPE_ADDITIONAL.md files; fixes cd/python/docker (regression from apache#15990)
  * [CD] update pypi description, setup.py (apache#17681): use manylinux2014 and a unified dist link for nightly builds
  * reverting the .so path as per the MAKE flow
  Co-authored-by: Leonard Lausen <[email protected]>, Sheng Zha <[email protected]>
- 375b49f: Update Jetson installation guide (apache#18485) (apache#18557)
  * add config Makefile for Jetson
  * modify Jetson install guide
Commits on Jun 16, 2020
- f91b989: Increase staggered build timeout to 180 min (apache#18568)
  * Increase staggered build timeout to 180 min, since the sanity build has a 180 min timeout
  * Decrease timeout so everyone is happy
  Co-authored-by: Joe Evans <[email protected]>
Commits on Jun 18, 2020
- f563fa4
Commits on Jul 1, 2020
- 16144ff
Commits on Jul 2, 2020
- 366a7f8: Enhance license checker to cover multiple license headers and md files (apache#18634)
  Co-authored-by: Leonard Lausen <[email protected]>
Commits on Jul 9, 2020
- 024daa6: [v1.x] Backport of Fix BatchNorm backward synchronization (apache#18644) (apache#18654)
  * Add test for BatchNorm running variables synchronization
  * Fix BatchNorm backward synchronization (fixes issue apache#18610)
Commits on Jul 15, 2020
- 5cdefeb: Fix the monitor_callback invalid issue during calibration with variable input shapes (apache#18705)
Commits on Jul 23, 2020
- b81c1ce: [v1.x] Cherrypick apache#18677 apache#18713 (apache#18742)
  * Migrate from private to public jetson toolchain files (apache#18677)
  * Set CMAKE_CUDA_COMPILER in aarch64-linux-gnu-toolchain.cmake (apache#18713): CMAKE_CUDA_HOST_COMPILER is reset if CMAKE_CUDA_COMPILER is not set, as of cmake 3.17.3; see https://gitlab.kitware.com/cmake/cmake/-/issues/20826
  Co-authored-by: Leonard Lausen <[email protected]>
- 91d535a
Commits on Jul 24, 2020
-
Fix linalg_potri and linalg_potrf operators for large tensor. (apache…
…#18752) * Fix linalg_potri operator for large tensor. * Update other variables to support large tensors. * Add to contributors. * Fix whitespace. * Update ZeroTriangular to support large tensors. * Add large tensor unit tests for linalg_potrf and linalg_potri. * Fix crash when accessing already destructed static variables (apache#18768) (apache#18778) Co-authored-by: Joe Evans <[email protected]> Co-authored-by: Przemyslaw Tredak <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for e6de5ae - Browse repository at this point
Copy the full SHA e6de5aeView commit details -
- 85ff00d: Add Large Tensor Test for linalg_syrk (apache#18782)
  * add large tensor test for syrk, forward and backward
  * change to batch input
  * move syrk test into test-linalg
  Co-authored-by: Ubuntu <[email protected]>
Commits on Jul 27, 2020
- 566d9d3: [v1.x] add large matrix tests for linalg ops: det, inverse, trsm, trmm (apache#18744)
  * add linalg large matrix tests and batch-input linalg tests
  * reduce batch size to 1 to save time
  * move the matrix generator to utils, passing matrix size as an arg
  * refactor tests: call backward, update grad value, add shape check, fix sanity
  Co-authored-by: Ubuntu <[email protected]>
Commits on Jul 28, 2020
- d009345: [1.x][LT] Add forward, backward test for linalg.gemm2 (apache#18784)
  * added forward, backward test for gemm2
  * add backward check and correct gradient assert
  * move test inside linalg_ops
  * add shape checks
- 7bef9cb: Back port optimization to broadcast_axis to MXNet 1.x (apache#18773)
  * Improving performance of broadcast_axis on GPU (apache#18168): adds a separate int32_t kernel for GPU in the broadcast_axis/to/like operators, uses a structure instead of a temp workspace to pass stride and shape, replaces hardcoded int32_t with generic index_t, and combines the CPU and GPU kernels to leverage cached stride calculation and fast-access shape data in both
  * Improve performance of broadcast_axis on CPU (apache#17882): adds comments explaining the code optimizations, fixes the broadcast_axis and slice_axis kernels to int32, combines CPU and GPU implementation method signatures and cleans up the code, and adds the new broadcast_axis to np_matmul
  Co-authored-by: Rohit Kumar Srivastava <[email protected]>
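The "cached stride calculation" the commit above leverages is the standard trick behind broadcast_axis: precompute row-major strides once, then map each output coordinate to an input offset, treating size-1 (broadcast) axes as stride 0. A pure-Python sketch of the idea (not MXNet's kernel):

```python
def broadcast_axis(data, shape, out_shape):
    # Row-major strides, computed once per shape (the "cached" part).
    def strides(s):
        st = [1] * len(s)
        for i in range(len(s) - 2, -1, -1):
            st[i] = st[i + 1] * s[i + 1]
        return st

    in_st, out_st = strides(shape), strides(out_shape)
    out = []
    for flat in range(out_st[0] * out_shape[0]):
        src, rem = 0, flat
        for ax in range(len(out_shape)):
            coord, rem = divmod(rem, out_st[ax])
            if shape[ax] != 1:          # broadcast axes contribute stride 0
                src += coord * in_st[ax]
        out.append(data[src])
    return out

# Broadcasting a (2, 1) column [[1], [2]] along axis 1 to (2, 3):
assert broadcast_axis([1, 2], (2, 1), (2, 3)) == [1, 1, 1, 2, 2, 2]
```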
Commits on Jul 29, 2020
- 85eb528: Add syrk test shape check (apache#18812)
  * add shape check
  * add name to contributor.md
  Co-authored-by: Ubuntu <[email protected]>
- ca6bcf3: adding error message when attempting to use Large tensor with linalg_syevd (apache#18807)
  Co-authored-by: Rohit Kumar Srivastava <[email protected]>
Commits on Jul 30, 2020
- 84c9e0d: [v1.x][LT] Add forward & backward linalg.gemm test for large size (apache#18825)
  * add test for linalg.gemm
  * fix indents
Commits on Jul 31, 2020
- f4e62df: Add Large Dim Checks for linalg Operators (apache#18816)
  * add checks for gemm, gemm2, syrk, trmm, trsm, and gelqf
  * move tests from test_large_array.py to test_large_vector.py
  * fix whitespace issue
  Co-authored-by: Ubuntu <[email protected]>
- 1a31cea: Add unit tests for potri and potrf backward and check output shape in unit tests (apache#18803)
  Co-authored-by: Joe Evans <[email protected]>
Commits on Aug 3, 2020
- 73d3a7b: [v1.x Backport] Fix softmax, logsoftmax failed on empty ndarray (apache#18602) (apache#18708)
  * [v1.x] Backport of fix npx.softmax for 0-sized inputs (apache#18158)
  * Fix softmax, logsoftmax failed on empty ndarray (apache#18602): fix the failing empty-array (log_)softmax and modify the npx (log_)softmax test
  * Fix softmax, logsoftmax backward failed on empty ndarray (apache#18710)
  Co-authored-by: Yiyan66 <[email protected]>, Hao Jin <[email protected]>, Bart Gawrych <[email protected]>
- 5db8dee: [ONNX export] Fixing spatial export for batchnorm (apache#17711) (apache#18846)
  * fixing spatial export for batchnorm
  * fixing broken pylint
  * deprecating the spatial attribute in the exporter so the default behavior of spatial=1 is conveyed
  Co-authored-by: Vinitra Swamy <[email protected]>
[v1.x] Mkldnn header fix v1x for nightly binaries (apache#18797)
* Cherry-pick apache#18310 apache#18355 (apache#18608) * cherry-pick: Fix missing MKLDNN headers (apache#18310) * Include all mkldnn headers in CD builds (apache#18355) * Fix cmake mkldnn install target. Previously mkldnn headers are installed to CMAKE_INSTALL_INCLUDEDIR instead of CMAKE_INSTALL_INCLUDEDIR/mkldnn * Fix pypi_package.sh pip/setup.py for mkldnn builds * Set CMAKE_CUDA_COMPILER in aarch64-linux-gnu-toolchain.cmake (apache#18713) CMAKE_CUDA_HOST_COMPILER will be reset if CMAKE_CUDA_COMPILER is not set as of cmake 3.17.3 See https://gitlab.kitware.com/cmake/cmake/-/issues/20826 Co-authored-by: Leonard Lausen <[email protected]> * remove linux-gputoolchain Co-authored-by: MoisesHer <[email protected]> Co-authored-by: Leonard Lausen <[email protected]>
Commit: 2aa2702
Commits on Aug 4, 2020
- Commit: eae6171
Commits on Aug 8, 2020
- Commit: cc287a0
Commits on Aug 10, 2020
- Commit: 0b5b449
Commits on Aug 11, 2020
- Commit: 1711103
Commits on Aug 12, 2020
-
Fix CI in v1.x branch (apache#18907)
* Update mirror for getting binutils source. * Remove erroneous wget command and duplicate mkdir command. Co-authored-by: Joe Evans <[email protected]>
Commit: 9fbf3d3
Commits on Aug 13, 2020
- Commit: d2d6408
Commits on Aug 14, 2020
-
[v1.7.x] backport Invoke mkldnn and cudnn BatchNorm when axis != 1 to v1.7.x (apache#18676) (apache#18890)
* [Improvement] Invoke mkldnn and cudnn BatchNorm when axis != 1 (apache#18504) * fix batch norm when fix_gamma is True * support gradient accumulation for batch norm * mkldnn batchnorm support grad add * unittest for bn * fix bn arg * fix lint * fix mkldnn * fix mkldnn bn * fix grad when fixing gamma * fix naive gpu bn * fix lint * invoke mkldnn and cudnn batchnorm when axis != 1 * backport 18500 * change condition * fix * fix * add mkldnn_off for bn * remove mkldnn_off * recover save_000800.json * cast * remove and fix flaky test Co-authored-by: JackieWu <[email protected]> Co-authored-by: JackieWu <[email protected]>
Commit: d32ba4f
-
Backporting backward inference from 2.x apache#18348 and apache#18378 (apache#18895)
Signed-off-by: Serge Panev <[email protected]>
Commit: 6b568fd
Commits on Aug 17, 2020
-
Cherry-pick apache#18635 to v1.7.x (apache#18935) (apache#18945)
* Remove mention of nightly in pypi (apache#18635) * update bert dev.tsv link Co-authored-by: Sheng Zha <[email protected]> Co-authored-by: Carin Meier <[email protected]> Co-authored-by: Sheng Zha <[email protected]>
Commit: 5e0db7a
Commits on Aug 18, 2020
-
fix gelu to use erf based algorithm (apache#18827) (apache#18946)
Co-authored-by: Tao Lv <[email protected]>
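The commit above switches GELU from a tanh-based approximation to the exact erf form. Both variants written out in plain Python for reference (the constants are the standard published ones, not taken from this commit):

```python
import math

def gelu_erf(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # The widely used tanh approximation (Hendrycks & Gimpel constants)
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))
```

The two forms agree to roughly 1e-3 over typical activation ranges, but the erf form is the exact definition.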
Commit: 6ae469a
-
[CI][1.x] Cherrypick: Upgrade unix gpu toolchain (apache#18186) (apache#18785)
* Update unix gpu toolchain (apache#18186) * update nvidiadocker command & remove cuda compat * replace cu101 with cuda since compat is no longer to be used * skip flaky tests * get rid of ubuntu_build_cuda and point ubuntu_cu101 to base gpu instead of cuda compat * Revert "skip flaky tests" This reverts commit 1c720fa. * revert removal of ubuntu_build_cuda * add linux gpu g4 node to all steps using g3 in unix-gpu pipeline * remove docker compose files * add back the caffe test since caffe is deprecated for mx2.0 and not 1.x * drop nvidia-docker requirement since docker 19.0 supports it by default * remove compat from dockerfile * Cherry-pick apache#18635 to v1.7.x (apache#18935) * Remove mention of nightly in pypi (apache#18635) * update bert dev.tsv link Co-authored-by: Sheng Zha <[email protected]> * disable tvm in CI functions that rely on libcuda compat * tvm off for ubuntu_gpu_cmake build * drop tvm from all unix-gpu builds Co-authored-by: Carin Meier <[email protected]> Co-authored-by: Sheng Zha <[email protected]>
Commit: 9981e84
-
[1.x] Backporting apache#18779 to v1.x (apache#18894)
* initial commit * Support extra inputs for subgraph ops (apache#18779) Support additional inputs to custom subgraph ops that are not direct dependencies to ops in the subgraph. This will enable various use cases: custom control flow ops, custom ops that maintain a state that should be saved/loaded, etc. Highlights: * Added test that uses a graph pass (addInputPass) to add a new custom input to the subgraph op * Added new optional argument (clear) to hybridize & optimize_for APIs in Gluon Block to enable multiple optimizations * refactored lib_api.h JSON utilities * added new Graph data structure utilities to simplify custom graph passes * refactored custom op registration * enhanced custom subgraph op to support additional inputs to subgraph op that is not an input to ops in the subgraph * updated subgraph & graph pass READMEs * Added error messaging from external library * changed messages * changed to pointers and types * added cast * updated cast * fixed signed int * whitespace * fixd pass resource Co-authored-by: Ubuntu <[email protected]>
Commit: d1ac7c8
Commits on Aug 19, 2020
- Commit: b4da2dd
-
[1.x] Backporting TensorRT-Gluon Partition API (and TensorRT 7 support) (apache#18916)
* [1.x] Backporting TensorRT and Gluon changes Signed-off-by: Serge Panev <[email protected]> * Remove test from Jenkins Signed-off-by: Serge Panev <[email protected]> * Fix test Signed-off-by: Serge Panev <[email protected]>
Commit: 8dcc653
Commits on Aug 20, 2020
-
Backport: Change Partition API's options_map to std::unordered_map apache#18929 (apache#18964)
Signed-off-by: Serge Panev <[email protected]>
Commit: 9445a2d
Commits on Aug 24, 2020
-
Get rid of monkey patching in LossScaler overflow handling (apache#18959) (apache#18973)
Co-authored-by: Vladimir Cherepanov <[email protected]> Co-authored-by: Vladimir Cherepanov <[email protected]>
Commit: dfefe87
Commits on Aug 26, 2020
-
[1.x] Backport of Fix LeakyRelu behaviour on empty input (apache#18934) (apache#19009)
* Fix LeakyRelu behaviour on empty input * Remove duplicated declarations
Commit: bce4cc6
Commits on Sep 1, 2020
-
1.x: Stop packaging GPL libquadmath.so (apache#19055)
* Stop packaging GPL libquadmath.so (apache#19053) libquadmath.so is GPL and must not be distributed by Apache projects. Users will need to ensure that libquadmath.so is present on their systems if they use binary builds of MXNet. libquadmath.so has not yet undergone any ABI changes, thus all versions of libquadmath.so are ABI compatible and user just needs to install system version of libquadmath.so. libgfortran.so can be packaged thanks to GCC Runtime Library Exception. See https://www.apache.org/legal/resolved.html#category-x * Remove unmaintained pip packages * Workaround pypa/setuptools#2352
Commit: d144edd
Commits on Sep 3, 2020
-
Support for fp16 in SpM x DnsM on GPU (apache#18930) (apache#19074)
Backported apache#18930 * Support for fp16 in SpGeMM * adding test for GPU spmm Co-authored-by: Rohit Kumar Srivastava <[email protected]> Co-authored-by: Rohit Kumar Srivastava <[email protected]>
Commit: 1a398e5
-
[1.x] Backporting apache#19016 (apache#19069)
* initial commit * fixed c++17 downgrade * fixed stringstream * fixed cast * changed to use pointers for stringstream since not copyable * fixed includes * fixed makefile includes * skipped lint for malloc/free for passing across C ABI Co-authored-by: Ubuntu <[email protected]>
Commit: b5e9c99
-
fix block.export (apache#17970) (apache#19075)
This PR cherry-picks commit 5122d32 into the v1.x branch. This is to enable the export of models where dangling layers are optimized out during symbol export.
James Mracek committed Sep 3, 2020
Commit: 748eebd
Commits on Sep 7, 2020
- Commit: 8dbed96
Commits on Sep 8, 2020
- Commit: 5051fd9
Commits on Sep 9, 2020
-
empty list cannot be cleared issue fixed. (apache#14882)
* empty list cannot be cleared issue fixed. * Update multiproc_data.py Co-authored-by: Sheng Zha <[email protected]>
Commit: a9bd1d2
Commits on Sep 10, 2020
-
Add TRT verbose mode (apache#19100)
Signed-off-by: Serge Panev <[email protected]>
Commit: 2d077db
Commits on Sep 11, 2020
-
[v1.x] Update onnx support to work with onnx 1.7.0 with most CV models (apache#19017)
* fix pooling_convention warning when convert model to onnx (apache#18529) * fix pooling_convention warning * fix pooling_convention warning * fix lint Co-authored-by: JackieWu <[email protected]> * Prevent uninitialized variable error. * Initial work to get Dropout to work with onnx 1.7 * Remove trailing whitespace for pylint. * Fix tensor initialization for Dropout operator input. * Update Clip operator to support latest ONNX opset versions by moving min/max attributes to inputs. * Fix whitespace. * Add support for importing Dropout operator in ONNX opset version >= 12. * Add support for import ONNX opsets >= 11 to clip operator. * Add optional opset_version parameter that defaults to latest opset version supported by onnx. Pass this parameter to each graph layer when exporting. * Add optional parameter to create_model() that allows user to specify which onnx opset version they want to use when exporting, defaults to latest version supported by onnx. * Use opset_version argument to determine operator format. * Add a opset_version parameter to from_onnx() so at operator conversion time, we know what opset version to use. * For Clip and Dropout operators, use opset version from passed proto_obj, which reflects what opset version the onnx model uses. * Use same tolerances that are in master. * Change Pad operator to use inputs instead of attributes for newer opset versions. Check opset version instead of ONNX version for Pooling operator. * Add documentation opset_version parameter. * Add opset_version parameters to unit tests. * Add test script for testing inference with onnxruntime on CV models from gluon model zoo. * Add license and clean up imports. * Install onnxruntime in docker container for unit tests. * Add onnxruntime to test dependencies. * Install onnxruntime into CentOS docker image. * Disable testing squeezenet models for now. * Update onnx version. * Fix typo. * Use mx.image.imread instead of PIL module. * ONNX import: use Conv pad attribute for symmetrical padding (apache#18675) Signed-off-by: Serge Panev <[email protected]> * Install onnx in CentOS containers when installing python. * Update import and export of some ONNX ops to support newer opset versions - this gets all ONNX unit tests to pass with onnx 1.7. * Re-enable squeezenet model testings in onnxruntime. * Run the onnxruntime inference tests in the ONNX pipeline instead of normal unittests pipelines. * Add missed return value. * Refactor code based on review comment. * Since the onnx tests are only run on ubuntu_cpu images, we don't need to install onnx and onnxruntime in the CentOS containers. Co-authored-by: Liu, Hao <[email protected]> Co-authored-by: JackieWu <[email protected]> Co-authored-by: Joe Evans <[email protected]> Co-authored-by: Serge Panev <[email protected]>
Commit: b888d3c
Commits on Sep 13, 2020
-
Fix race condition in NaiveEngine::PushAsync (apache#19108) (apache#19122)
* Wait for async_fun to complete in NaiveEngine::PushAsync This fixes a race condition in which NaiveEngine::PushAsync was checking whether async_fun had completed by the end of NaiveEngine::PushAsync. If async_fun hadn't completed yet, NaiveEngine::PushAsync would set an internal error string and deallocate the callback, causing a segfault in async_fun once it attempted to call the callback. * Update naive_engine.cc
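The fix is to block until async_fun has actually signalled completion through its callback before inspecting state or freeing the callback. The shape of that fix, sketched with Python threading rather than the engine's C++ internals (names here are illustrative):

```python
import threading

def push_async(async_fun):
    """Run async_fun on another thread and block until its completion
    callback fires, so completion state is never inspected too early."""
    done = threading.Event()
    result = {}

    def on_complete(status):
        result["status"] = status
        done.set()

    worker = threading.Thread(target=async_fun, args=(on_complete,))
    worker.start()
    done.wait()  # the essence of the fix: wait for async_fun to finish
    worker.join()
    return result["status"]

def toy_async_fun(callback):
    callback("ok")

print(push_async(toy_async_fun))  # -> ok
```

Without the wait, the caller could observe "not yet complete", report an error, and tear down the callback while the worker was still about to invoke it.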
Commit: 8b56874
Commits on Sep 14, 2020
-
[1.x] Backport apache#19103 (apache#19117)
* initial commit * incremented version number Co-authored-by: Ubuntu <[email protected]>
Commit: 9383282
Commits on Sep 15, 2020
-
[1.x] Backport 'Update CUB and include it only for CUDA < 11 apache#18799' (apache#18975)
* Update CUB and only for CUDA < 11 apache#18799 and update Makefile Signed-off-by: Serge Panev <[email protected]> * Add preprocessor option to silence CUB C++14 warning Signed-off-by: Serge Panev <[email protected]>
Commit: e1bcb33
-
[1.x] Backport Fix for duplicate subgraph inputs/outputs (apache#16131) (apache#19112)
* Fix for duplicate subgraph inputs/outputs (apache#16131) * fix for duplicate inputs * fixed error * fixed whitespace * Remove duplicate outputs from subgraphs * changed subgraph to create map of outputs * added static_cast * changed map<int,v> to vector * sanity fix * sanity2 * updated backends with new connectSubgraphOutputs API * fixed map creation logic * added updates for reattach function * creating node only if it is not an input to subgraph * creating object based on var_name only * updating ConnectSubgraphOutputs for mkldnn_elemwisemul_post_quantize_property.h * add debug prints to debug error in CI * remove prints * added prints to debug in the CI * revert changes * reverted changes * deduplicaated inputs to subgraph * deduplicated subgraph inputs * simplified inputs * cleaned up * deduplicate outputs * cleand up * added deduplication to subgraph node outputs * fixed prev compare * fixed issue with inputs and added test * fixd whitespace, removed prints Co-authored-by: Sam Skalicky <[email protected]> Co-authored-by: Ubuntu <[email protected]> Co-authored-by: Ubuntu <[email protected]> Co-authored-by: Manu Seth <[email protected]> Co-authored-by: Ubuntu <[email protected]> * added flag to enable dedupe ondemand * fixed dedup logic * improved dedup logic * fixed sanity * propogated option * check option in custom subgraph prop * fixed options map * fixed missing * added dedup to subgraph_prop base class for testing * added test for dedup * added comments Co-authored-by: Sam Skalicky <[email protected]> Co-authored-by: Ubuntu <[email protected]> Co-authored-by: Ubuntu <[email protected]> Co-authored-by: Manu Seth <[email protected]> Co-authored-by: Ubuntu <[email protected]>
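The core of this backport is deduplicating a subgraph's input/output tensor lists while keeping an index map so existing consumers can still locate each original position. The idea in miniature (the real change is in MXNet's C++ subgraph property code; this Python sketch is illustrative):

```python
def dedup_preserving_order(tensors):
    """Drop repeated entries while keeping first-seen order, and return
    an index map from original positions to the deduplicated list."""
    unique, index_map = [], []
    seen = {}
    for t in tensors:
        if t not in seen:
            seen[t] = len(unique)
            unique.append(t)
        index_map.append(seen[t])
    return unique, index_map

unique, idx = dedup_preserving_order(["a", "b", "a", "c", "b"])
print(unique)  # -> ['a', 'b', 'c']
print(idx)     # -> [0, 1, 0, 2, 1]
```

Consumers that used to read position i of the old list now read `unique[idx[i]]`, so nothing downstream has to change its indexing scheme.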
Commit: 9dfac79
-
TensorRT: add int8 with calibration (apache#19011)
Signed-off-by: Serge Panev <[email protected]>
Commit: 606933f
Commits on Sep 16, 2020
-
[1.x] Backport of intgemm apache#17559 (apache#19099)
* cherry-pick intgemm from master, fix build * Fix test to conform to 1.x * Makefile supporting intgemm compilation * Stricter dependencies on git checkout of intgemm * Operators depend on mkldnn * Don't compile intgemm with gcc older than 5 * Fix intgemm test for windows on 1.x by not using pytest * Update intgemm to use template arguments for integer immediates * Try to fix clang3.6 * Ban gcc < 5 in cmake * Update intgemm with gcc 5.5 debug workaround
Commit: d2e6452
Commits on Sep 17, 2020
- Commit: 837c7e4
-
[1.x] Backport Add cmake flag USE_FATBIN_COMPRESSION, ON by default (apache#19123) (apache#19158)
* [1.x] Backport Add cmake flag USE_FATBIN_COMPRESSION, ON by default (apache#19123) * Trigger CI * Appending to existing CMAKE_CUDA_FLAGS in all cases
Commit: 3d9af6e
-
Fix the error of gradient of np.pad (apache#19044) (apache#19167)
* pad grad modified * Fix pad grad error * modify pad constant backward * Fix test error * Fix test error * Fix kAddTo supported * Add test for grad_req='add' Co-authored-by: Xingjian Shi <[email protected]> Co-authored-by: Wentao Xu <[email protected]>
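For constant-mode np.pad, the backward pass reduces to slicing the output gradient back to the input shape, since the padded border carried constants whose gradient is discarded; the fix also makes this respect grad_req='add' by accumulating instead of overwriting. A NumPy sketch of the slicing step (illustrative; the function name is ours, not MXNet's):

```python
import numpy as np

def pad_constant_backward(ograd, pad_width):
    """Backward of constant-mode padding: only the interior slice of the
    output gradient flows back to the (unpadded) input."""
    slices = tuple(slice(before, dim - after)
                   for (before, after), dim in zip(pad_width, ograd.shape))
    return ograd[slices]

ograd = np.ones((4, 5))  # gradient w.r.t. the padded output
grad = pad_constant_backward(ograd, [(1, 1), (2, 1)])
print(grad.shape)  # -> (2, 2)
```

Under grad_req='add' the caller would do `igrad += pad_constant_backward(...)` rather than assigning, which is the kAddTo case the test covers.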
Commit: 039fef9
-
[v1.x] Add new CI pipeline for building and testing with cuda 11.0. (apache#19149)
* Add new docker containers for Cuda 11.0 and libcudnn8. * Add new functions for running GPU builds and tests in new Cuda11 containers. * Add runtime functions for cuda 11.0 related builds/tests. * Add new pipeline for testing cuda 11.0 builds. * Run cuda11 pipeline when sanity completes. * Use base image that already has libcudnn8 installed from Nvidia. Remove calls to nvidia/cudnn install scripts. * Don't build CPP package for cuda11 build. * Use proper base docker image for testing (include cudnn8) and don't manually install cudnn8. * Re-enable CPP package build. * Add env variable LD_LIBRARY_PATH in the build container so cpp-package build works. Remove unneeded components of docker containers to reduce size and build time. * Add sm_80 and compute_80 to compiled cuda architectures. * Add back binutils install since we are building for more cuda architectures and will hit the ar limit. Co-authored-by: Joe Evans <[email protected]>
Commit: 620d058
-
[v1.x] Backport Unittest tolerance handling improvements (apache#18694). Also test seeding (apache#18762). (apache#19148)
* Add sm arch 80 to Makefile * Unittest tolerance handling improvements (apache#18694) * Add sm arch 80 to Makefile * Add TF32 to cuBLAS GEMMs Signed-off-by: Serge Panev <[email protected]> * Add CUDA version guards Signed-off-by: Serge Panev <[email protected]> * Remove useless TF32 for double and old CUDA version Signed-off-by: Serge Panev <[email protected]> * Factorize VERSION_ADJUSTED_TF32_MATH Signed-off-by: Serge Panev <[email protected]> * Add TF32 considerations to test_util.py:check_consistency() * Bypass test_gluon_gpu.py:test_large_models if gmem >32GB * Default tols in assert_almost_equal() now a function of dtype and ctx * Expand types listed by default_tols() * Fix pylint * All with_seed() tests to waitall in teardown * Elevate MXNET_TEST_SEED logging to WARNING * Revert test_gluon_gpu.py:test_rnn_layer to default tols * Fix test_gluon_model_zoo_gpu.py::test_inference and test_operator_gpu.py::test_np_linalg_{solve,tensorinv} * test_numpy_interoperability.py to not fix seed for rest of CI * Further fix to test_np_linalg_tensorinv * Fix test_gluon_data.py:test_dataloader_context when run on 1-GPU system. * Fix test_operator_gpu.py::test_embedding_with_type * Fix test_operator_gpu.py::{test_*convolution_large_c,test_np_linalg_tensorsolve} * Remove unneeded print() from test_numpy_interoperability.py * Unify tol handling of check_consistency() and assert_almost_equal(). Test tweaks. * Add tol handling of assert_almost_equal() with number args * Add tol handling of bool comparisons * Fix test_numpy_op.py::test_np_random_rayleigh * Fix test_operator_gpu.py::test_batchnorm_with_type * Fix test_gluon.py::test_sync_batchnorm in cpu selftest * Improve unittest failure reporting * Add to robustness of test_operator_gpu.py::test_embedding_with_type * Check_consistency() to use equal backward gradients for increased test robustness * Fix test_operator_gpu.py::test_{fully_connected,gemm}. Add default_numeric_eps(). * test_utils.py fix for numeric gradient calc * Reinstate rtol=1e-2 for test_operator.py::test_order * Remove auto-cast of check_consistency() input data to least precise dtype (not needed) * Fix test_operator.py::test_{reciprocol,cbrt,rcbrt}_op * Expand default float64 numeric_eps for test_operator_gpu.py::test_sofmin * Fix segfault-on-error of @Retry decorator. Add test isolation. * assert_almost_equal() to handle a,b scalars * Fix test_operator_gpu.py::test_gluon_{mvn,mvn_v1} race * Fix test_operator_gpu.py::test_flatten_slice_after_conv via scale * Remove test_utils.py:almost_equal_ignore_nan() * Fix sample vs. pop variance issue with test_numpy_op.py::test_npx_batch_norm * Expose test_utils.py:effective_dtype() and use to fix test_operator_gpu.py::test_np_linalg_svd * Fix true_divide int_array / int_scalar -> float_array to honor np_default_dtype * Try test_elemwise_binary_ops serial to avoid pytest worker crash * Fix (log_)softmax backward on empty ndarray * Temporarily log all CI seeds to troubleshoot seed non-determinism * Revert "Temporarily log all CI seeds to troubleshoot seed non-determinism" This reverts commit f60eff2. * Temp log all CI seeds to troubleshoot unwanted seed determinism * Revert "Add sm arch 80 to Makefile" This reverts commit f9306ce. * Same fix of sample vs. pop variance issue, now with test_operator_gpu.py::test_batchnorm * Revert "Temp log all CI seeds to troubleshoot unwanted seed determinism" This reverts commit ff328ef. * Marking test_sparse_dot_grad with garbage_expected after teardown error * Fix flakiness of test_gluon_probability{_v1,_v2}.py::test_gluon_kl{_v1,} * Temp skip of test_aggregate_duplication on gpu * Add seeding to test_{numpy,}_contrib_gluon_data_vision.py. Make created files unique. * Add ndarray module isolation to help debug test_bbox_augmenters worker crash * Marking test_sparse_square_sum serial after pytest worker crash * Fix flakiness of test_gluon_probability{_v1,_v2}.py::test_half_cauchy{_v1,} Co-authored-by: Serge Panev <[email protected]> Co-authored-by: Bart Gawrych <[email protected]> * Fix test_gluon_data.py:test_dataloader_context when run on 1-GPU system. * Remove pytest decorators introduced in error * Fix test_forward.py:test_consistency * Fix test_numpy_op.py tests * Improve test seeding in test_numpy_interoperability.py (apache#18762) * Fix test_numpy_op.py:test_np_random_{beta,chisquare} * Reduce problem sizes with test_optimizer.py:test_multilamb * Skip test_gluon_gpu.py:test_fused_{lstm,gpu}_layer, fix test_rnn_cells, for fp16 contexts * Trigger CI Co-authored-by: Serge Panev <[email protected]> Co-authored-by: Bart Gawrych <[email protected]>
Commit: ce0a518
Commits on Sep 18, 2020
-
[v1.x][Submodule] Upgrade to oneDNN v1.6.3 (apache#19153) (apache#19161)
* upgrade to oneDNN v1.6 release branch * oneDNN v1.6 * fix cpp test * build oneDNN with c++11 * Revert "build oneDNN with c++11" This reverts commit 5365d83. * oneDNN v1.6.3 Co-authored-by: Tao Lv <[email protected]>
Commit: 49ba44a
-
[v1.x] Backport Improve environment variable handling in unittests (apache#18424) (apache#19173)
* Improve environment variable handling in unittests (apache#18424) * Add missing python functools import * Correct teardown import
Commit: 5079c35
Commits on Sep 19, 2020
-
[1.x][FEATURE] CUDA graphs support (apache#19142)
* Initial cherry-pick * Store NodeAttrs in OpExecutor * Do not allow stateful operations in CUDA graphs and provide mechanism for marking ops as safe * Guard against using ops with synchronization * Cleaning * Properly guard graphs * Limit graphs to CUDA 10.2+ * Fix the compilation when graphs are not available * Guarding the libcuda.so usage behind RTC compilation flag * Document the env variables * Add test * Fix the test * Use with_environment
Commit: 0fce381
-
Revert "Fix memory leaks in Gluon (apache#18328) (apache#18359)" (apache#19181)
This reverts commit b523527.
Commit: 0496690
-
[1.x] Enable CUDA Graphs for TRT (apache#19184)
Signed-off-by: Serge Panev <[email protected]>
Commit: a35d568
Commits on Sep 22, 2020
- Commit: fe7cf99
-
[v1.8.x] ElementWiseSum fix for oneDNN (apache#18777) (apache#19200)
* Fix ElementwiseSum for DNNL * Fix sanity and replace push_back with emplace_back * Change order of the data format conditions * Add NOLINT to avoid readability error * Add test for oneDNN ElemwiseSum Co-authored-by: Bart Gawrych <[email protected]> Co-authored-by: Bart Gawrych <[email protected]>
Commit: 96f6454
-
Commit: c2df97f
Commits on Sep 23, 2020
-
[WEBSITE] v1.8 website patch (apache#19212)
* Add missing license header for md files (apache#18541) (apache#19189) Co-authored-by: ciyong <[email protected]> * Fixed Install page history broken (apache#18182) * fix install option block history broke * when history goes back, avoid button default css blue outline * use appropriate parameter name * format scss change * Update website version select drop down (apache#18188) * update version select drop down * align caret * revert scrollable content, add delayed hover effect * bugfix * fix new design doesn't work on mobile # Conflicts: # docs/static_site/src/_includes/get_started/get_started.html * Update website version select drop down (apache#18188) * update version select drop down * align caret * revert scrollable content, add delayed hover effect * bugfix * fix new design doesn't work on mobile # Conflicts: # docs/static_site/src/_includes/get_started/get_started.html * Fix gluon link missing (apache#18243) * fix gluon link missing * empty commit to trigger checks * empty commit to trigger checks * fix when clicking version dropdown it jumps to top of the page (apache#18238) * Website global search feature (apache#18288) * init global search ui * add hover effect to icon and refactor js * add search bar ui styles * fix search UI's effect on navbar height * add fade in/out effect to search ui and navbar * update search trigger to click and add x button for close * add version select for search * fix version typo * update dropdown * fix hitsperpage reset after change version * fix nav trigger not show * update search border css class name * make dropdown style consistent * global search mobile&tablet UI * adjust mobile search result width * extract global search related styles to a seperate scss * restore formatting to existing code * format & coding style * fix caret height bug * add mobile compatible UI * add license header to js files and update dropdown width * put docsearch css before main to overrides * update search result panel height * dynamically generate version dropdown * use more accurate selector over search result * use vh for height * add comments to scss * move versions to Jekyll global variable * remove redundant version key * make global search default version the same as website version Co-authored-by: Yang Shi <[email protected]> * replace google CDN with JQuery's own CDN (apache#18369) Co-authored-by: Yang Shi <[email protected]> * Add Developer Guide Docs to MXNet Website (apache#18474) * init dev guide * move dev guide above FAQ * update format and images * hoist git docs and fix styles * use relative urls * remove useless code block * use consistent url and file name * update heading * add apache license header * init dev guide * move dev guide above FAQ * update format and images * hoist git docs and fix styles * use relative urls * remove useless code block * use consistent url and file name * update heading * add apache license header * update doc - git clone recursive * reviewing the dev guide - proof reading and text edits Co-authored-by: Yang Shi <[email protected]> Co-authored-by: Talia Chopra <[email protected]> * fix contribute page anchor position shifted (apache#18571) Co-authored-by: Yang Shi <[email protected]> * Clipboard refactor (apache#18605) * refactor clipboard * make lang getter more extensible * trigger ci * User Feedback Widget (apache#18639) * user feedback widget implementation * add user feedback widget to python docs site * update margin * add apache license * one more license * turn off feedback widget on python site * update copy * format * add event value field * turn on widget on Python site # Conflicts: # docs/static_site/src/_includes/head.html # docs/static_site/src/assets/main.scss * Fix python micro-site table of content bugs (apache#18664) * update footer style * add compiled css of footer styles changes * add same style for footer2 * more fix to the toc * Fix all anchor shifts on website (apache#18674) * use regex that is supported by all browsers (apache#18811) * 1.7 compatible fix * add jquery fix * Consolidate installation instructions on website and add disclaimer for non-ASF ressources (apache#18487) * Update website with disclaimer for non-ASF ressources * Integrate Windows instructions to build_from_source.md * Remove master version from selector * Update Download links * Update get_started/download.md per Release Download Page policy # Conflicts: # contrib/clojure-package/README.md # docs/python_docs/python/tutorials/deploy/inference/image_classification_jetson.md # docs/static_site/src/_includes/get_started/get_started.html # docs/static_site/src/_includes/get_started/linux/clojure/gpu.md # docs/static_site/src/_includes/get_started/linux/java/gpu.md # docs/static_site/src/_includes/get_started/linux/julia/build-from-source.md # docs/static_site/src/_includes/get_started/linux/perl/perl.md # docs/static_site/src/_includes/get_started/linux/python/cpu/build-from-source.md # docs/static_site/src/_includes/get_started/linux/python/cpu/docker.md # docs/static_site/src/_includes/get_started/linux/python/cpu/pip.md # docs/static_site/src/_includes/get_started/linux/python/gpu/build-from-source.md # docs/static_site/src/_includes/get_started/linux/python/gpu/docker.md # docs/static_site/src/_includes/get_started/linux/python/gpu/pip.md # docs/static_site/src/_includes/get_started/linux/r/gpu.md # docs/static_site/src/_includes/get_started/linux/scala/cpu.md # docs/static_site/src/_includes/get_started/linux/scala/gpu.md # docs/static_site/src/_includes/get_started/macos # docs/static_site/src/_includes/get_started/macos/clojure/cpu.md # docs/static_site/src/_includes/get_started/macos/julia/build-from-source.md # docs/static_site/src/_includes/get_started/macos/perl/perl.md # docs/static_site/src/_includes/get_started/macos/python/cpu/build-from-source.md # docs/static_site/src/_includes/get_started/macos/python/cpu/docker.md # docs/static_site/src/_includes/get_started/macos/python/cpu/pip.md # docs/static_site/src/_includes/get_started/macos/python/gpu/build-from-source.md # docs/static_site/src/_includes/get_started/macos/python/gpu/pip_docker.md # docs/static_site/src/_includes/get_started/macos/r/cpu.md # docs/static_site/src/_includes/get_started/macos/scala/cpu.md # docs/static_site/src/_includes/get_started/windows # docs/static_site/src/_includes/get_started/windows/perl/perl.md # docs/static_site/src/_includes/get_started/windows/python/cpu/build-from-source.md # docs/static_site/src/_includes/get_started/windows/python/cpu/docker.md # docs/static_site/src/_includes/get_started/windows/python/cpu/pip.md # docs/static_site/src/_includes/get_started/windows/python/gpu/pip.md # docs/static_site/src/_includes/get_started/windows/r/cpu.md # docs/static_site/src/_includes/get_started/windows/r/gpu.md # docs/static_site/src/pages/get_started/build_from_source.md # docs/static_site/src/pages/get_started/download.md # docs/static_site/src/pages/get_started/osx_setup.md # docs/static_site/src/pages/get_started/ubuntu_setup.md # docs/static_site/src/pages/get_started/windows_setup.md * fix broken installation widget - remove empty entries (apache#18661) * update static files # Conflicts: # docs/python_docs/themes/mx-theme/mxtheme/static/sphinx_materialdesign_theme.css # docs/python_docs/themes/mx-theme/mxtheme/static/sphinx_materialdesign_theme.css.map # docs/python_docs/themes/mx-theme/mxtheme/static/sphinx_materialdesign_theme.js # docs/python_docs/themes/mx-theme/mxtheme/static/sphinx_materialdesign_theme.js.map * update header dropdown default version * fix failed pipeline * cherry pick 1.7 content from master * update version number in image classification tutorial * minor version fix * fix bullet point format bug * Fixed python website double scroller and improve UX (apache#18845) * make python site header scroll aware and avoid double scroller * add compiled assets * adjust python site second header height * add new line * set focus to main content
on DOM load # Conflicts: # docs/python_docs/themes/mx-theme/mxtheme/static/sphinx_materialdesign_theme.css # docs/python_docs/themes/mx-theme/mxtheme/static/sphinx_materialdesign_theme.css.map # docs/python_docs/themes/mx-theme/mxtheme/static/sphinx_materialdesign_theme.js # docs/python_docs/themes/mx-theme/mxtheme/static/sphinx_materialdesign_theme.js.map # docs/python_docs/themes/mx-theme/src/scss/_root.scss * add jekyll base url to enable relative path * fix python micro site header link path * update python site css Co-authored-by: Sheng Zha <[email protected]> Co-authored-by: ciyong <[email protected]> Co-authored-by: Yang Shi <[email protected]> Co-authored-by: Talia Chopra <[email protected]> Co-authored-by: Leonard Lausen <[email protected]>
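The "use regex that is supported by all browsers" item (apache#18811) refers to a class of fixes where lookbehind assertions, which some browsers (notably Safari at the time) did not implement, are replaced with capture groups. The commit message does not show the actual pattern, so the anchor-extraction use case below is an assumption chosen to match the site's anchor-related fixes; it is a sketch of the technique, not the PR's code.

```javascript
// Hypothetical example: extract the fragment ("anchor") from a URL.
// A lookbehind version like /(?<=#).+$/ would throw a SyntaxError at
// parse time in browsers without lookbehind support, breaking the page.
// Using a capture group instead works everywhere:
function getAnchor(url) {
  const match = url.match(/#(.+)$/); // group 1 holds the text after "#"
  return match ? match[1] : null;    // null when the URL has no fragment
}

console.log(getAnchor("https://mxnet.apache.org/get_started#build")); // → "build"
console.log(getAnchor("https://mxnet.apache.org/get_started"));       // → null
```

Note the design point: a lookbehind in a script fails for unsupporting browsers even if the regex is never executed, because the whole script fails to parse, which is why "supported by all browsers" matters for site-wide JS.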
Full SHA: 975aa6e
Commits on Sep 24, 2020
-
[v1.x] Nightly Large Tensor test cherrypicks (apache#19194) (apache#19215)
* fixing batch_norm and layer_norm for large tensors (apache#17805)
Co-authored-by: Rohit Kumar Srivastava <[email protected]>
* Fix nightly large_vector test caused by incorrect with_seed path (apache#18178)
* add back the missing environment function
Co-authored-by: Rohit Kumar Srivastava <[email protected]>
Full SHA: 7c9046a
Commits on Sep 26, 2020
-
delete executor before reallocating its memory (apache#19222)
Co-authored-by: Rohit Kumar Srivastava <[email protected]>
Full SHA: 07d7e13

-
added key for samskalicky (apache#19225)
Co-authored-by: Ubuntu <[email protected]>
Full SHA: 51cc0af
Commits on Sep 30, 2020
-
Full SHA: 7961555