[Do not merge] This PR is for checking the fix in oneDNN. #19259
Commits on Mar 3, 2020
- 91d595a: bump up 1.x branch to 1.7.0 (apache#17741)
  * bump up 1.x branch to 1.7.0
  * bump version for clojure
Commits on Mar 6, 2020
- 3b83cd8
Commits on Apr 4, 2020
- 21fc103: [Website 2.0] Nightly Build for v1.x (apache#17956)
  * Using unrestricted
  * Drop publish step
  * Enable restricted nodes
  * Reverted website_full, added website_nightly
  * Reduced node labels to utility and linux_cpu
Commits on Apr 10, 2020
- db93398
Commits on Apr 14, 2020
- 0d3aa67: Workaround gnu_tls handshake error on Ubuntu 14.04 Nvidia Docker (apache#18044)
  Backport of apache#18018
Commits on Apr 15, 2020
- 6fa374b: [v1.x] Backport apache#17702 and apache#17872 to v1.x branch (apache#18038)
  * Support projection feature for LSTM on CPU (Only Inference) (apache#17702)
  * test solution for -Werror=maybe-uninitialized
  * Check device type when create state
  * Document the projection feature of LSTM for RNN operator
  * Re-run CI
  * Fix issue of zeros gradients w.r.t. RNN bias when num_layers > 1 (apache#17872)
  * Use nd.copy() to initialize parameters of new operator
  * Add check for output states
  * Initialize i2h/h2h_weights with zeros for rnn_relu/tanh, and reduce size
  * Split fused rnn layer test into tests of individual mode
  * Skip lstm and gru tests on CPU context without DNNL
- 50d6d7d: [mkldnn] Mkldnn bn opt backport from master to 1.7.x (apache#18009)
  * optimize for backward batchnorm
  * using memcpy instead of 'for' loop
  * rm unnecessary pointer cast and add const for some variables
  * trigger CI
- 2cf7219: [v1.x] Update 3rdparty/mkldnn remote URL and pin to v1.3 (apache#17972) (apache#18033)
  * update onednn remote url
  * checkout onednn v1.3 release
  * fix format test
  * make test (conflicts: .gitmodules, 3rdparty/mkldnn, tests/cpp/operator/mkldnn_test.cc)
  * build flag
  * upgrade cmake
- 3f920ae
- 2ccbcec: GPU gemms true fp16 (apache#17466) (apache#18023)
  * Temporary solution for fp16 accumulation in BERT gemms
  * Resolve alpha/beta type issue
  * add documentation for env variable MXNET_FC_TRUE_FP16
  * Improve description of env variable
  * Add unit test checking the environment variable
  * keep pseudo-fp16 if architecture does not support Float16Compute
  * Fix cpplint
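A minimal sketch of the switch the commit above documents, assuming only what the message states: `MXNET_FC_TRUE_FP16=1` opts into true fp16 accumulation for FullyConnected GEMMs, while pseudo-fp16 (fp32 accumulation) stays the default. The helper name is illustrative, not MXNet's internal API.

```python
import os

def use_true_fp16() -> bool:
    # Hypothetical helper: reads the documented environment variable.
    # Unset or "0" means the default, pseudo-fp16 (fp32 accumulation).
    return os.environ.get("MXNET_FC_TRUE_FP16", "0") == "1"

os.environ["MXNET_FC_TRUE_FP16"] = "1"
print(use_true_fp16())  # True once the variable is set
```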
Commits on Apr 16, 2020
- 1afdfce: [1.7] Backport MXNet Extension PRs (apache#17623, apache#17569, apache#17762) apache#18063 (apache#18069)
  * Dynamic subgraph compile support (apache#17623). Adds support for passing the NDArrays from the existing optimize_for API down to the reviewSubgraph function in an external library, and adds a new HybridBlock API, optimize_for, that can partition the model without running a forward pass.
    Feature changes: the new HybridBlock optimize_for API partitions the model but does not call the cachedOp; the subgraph library example optionally requires args to be provided; subgraph inputs are annotated with the name of the original param so inputs can be mapped, and annotations are passed to input nodes of subgraphs; tensors in MKLDNN format are supported via Reorder2Default.
    New tests: partitioning operators that directly consume params; a new model where ops to be partitioned have args/params.
    Bug fixes: pass the ids vector and attributes by reference instead of by value/copy; update _cached_graph after partitioning; free user-specified attributes on subgraph ops when a subgraph is rejected (memory leak); fix incorrect indexing into shape/dtype maps when annotating the graph.
    Docs: the README is updated with the changes described above.
  * Adding sparse support to MXTensor for custom operators (apache#17569). Adds an enum for sparse storage, structures for dense and sparse data, a redesigned MXSparse data structure, aux data pulled out of sparse NDArrays, sparse arguments passed from c_api to lib_api.h and set in MXTensor, CSR and row-sparse transpose samples (with output-size and mixed-storage-format fixes via a CheckAndAlloc() lambda), an optional inferSType function with error messages, tests for symbolic and stateful ops, result verification for the transpose ops, and a bump of MX_LIBRARY_VERSION to 5.
  * Custom Operator Random Number Generator Support (apache#17762). Adds random number generator support for custom operator libraries: MXNet passes its initialized and seeded RNG states, located on CPU and GPU, to the custom library, so users can generate deterministic values from a given seed passed to MXNet. The workflow is: mx.random.seed(128); r1 = mx.nd.some_custom_random_op(data); mx.random.seed(128); r2 = mx.nd.some_custom_random_op(data); assert (r1 == r2). This PR does not make a custom library generate exactly the same sequence of random numbers as MXNet. Continuation of the custom operator project (apache#15921, apache#17270).
  Co-authored-by: guanxinq <[email protected]>, Ziyi Mu <[email protected]>
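The determinism contract in the commit message above can be modeled with Python's standard RNG; `some_custom_random_op` here is the same placeholder name the message uses, not a real MXNet operator, and `random.Random` stands in for the seeded state MXNet hands the library.

```python
import random

def some_custom_random_op(rng: random.Random, n: int = 3):
    # Stand-in for a custom-library operator that draws from the
    # framework-provided RNG state (modeled here by random.Random).
    return [rng.random() for _ in range(n)]

# Seeding twice with the same value reproduces the same draws:
r1 = some_custom_random_op(random.Random(128))
r2 = some_custom_random_op(random.Random(128))
assert r1 == r2
```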
- b56571d: [v1.x] backport apache#17900 "[MKLDNN] support using any format in pooling backward" (apache#18067)
  * use any format in pooling backward
  * use data_type()
  * fix backport
- 8cfc64a: No tensor cores for fp32 interleaved attention, remove div by 8 restriction (apache#17994) (apache#18085)
  (cherry picked from commit afae030)
Commits on Apr 17, 2020
- 2e22b5e: refactor codes and add an option to skip/check weight's version to reduce overhead (apache#17707) (apache#18039)
Commits on Apr 18, 2020
- 3835139: Add gelu fuse ops (apache#18082) (apache#18092)
  * Add LeakyReLU:Gelu (fwd and bwd) to fused ops
  * Add test LeakyReLU:gelu
  * cpplint / fix lint
  * fix bug SQRT_2 using constant memory
  * add comments
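For reference, the activation the fused op above computes is the standard (exact) GELU; the fused kernels compute it and its gradient inline on GPU, but the mathematical definition is just the following sketch.

```python
import math

def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF,
    # written via erf. This is the reference definition, not the
    # fused CUDA kernel itself.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

assert gelu(0.0) == 0.0
assert abs(gelu(100.0) - 100.0) < 1e-6  # for large x, Phi(x) -> 1
```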
- 814530d: Cherry-pick of apache#17995 and apache#17937 to 1.x branch (apache#18041)
  * Fix ElemwiseSum for more than 4 inputs (apache#17995), with a test
  * Fix handling of negative axis, begin and end in the fusion of slice ops (apache#17937), with a test
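The slice-fusion fix above amounts to normalizing negative axis/begin/end values before fusing. A minimal sketch of that normalization, assuming the usual "count from the end" convention (the helper name is hypothetical, not MXNet code):

```python
def normalize_slice(begin: int, end: int, axis: int, ndim: int, dim_size: int):
    # Hypothetical helper: negative axis counts back from ndim,
    # negative begin/end count back from the axis length.
    if axis < 0:
        axis += ndim
    if begin < 0:
        begin += dim_size
    if end < 0:
        end += dim_size
    return axis, begin, end

# Slicing elements [-2, -1) of the last axis of a (?, 5) tensor:
assert normalize_slice(-2, -1, -1, 2, 5) == (1, 3, 4)
```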
- 4cf2ad3: [v1.x] Backport apache#17689 and apache#17884 to v1.x branch (apache#18064)
  * [MKLDNN] apply MKLDNNRun to quantized_act/transpose ops (apache#17689)
  * [MKL-DNN] Integrate Conv3d and Pool3d/1d (apache#17884): fix UT & address comments, clean code, rebase against latest master, fix conflicts, fix CI
- b8e8d73: Fix and optimize handling of vectorized memory accesses (apache#17767) (apache#18095)
  * Vectorized loads for binary elemwise kernels, generalized to binary ops with scalar, unary ops, the backward passes (backwardusenone, backwardusein), and ElementwiseSum
  * Got rid of half2 in mshadow; the removals of _backward_add and backward_elemwiseaddex were reverted (a C++ test relies on backward_add)
  * Vectorized broadcast kernels, including a single-side vectorized kernel, limited to GPU only; vectorization is also enabled when broadcast does not actually broadcast
  * Fixed the vectorized broadcast implementation for misaligned input pointers, fixed INT64 compilation, and optimized the aligned=true case
  * Added tests and docs in cuda_vectorization.cuh; a temporary mcmodel=medium workaround for the CMake static build was reverted; the PR was limited to just elementwise ops
Commits on Apr 21, 2020
- a5744be
Commits on Apr 22, 2020
- 18c7963: MXNet Extensions enhancements (apache#17885) (apache#18126)
  * enabled calling create for the selector, connected the selector to call an external class, fixed the selector class, and cleaned up APIs
  * added code to remove temp graph attrs and changed shape inference to use different attr names
  * added passing args/aux down to graph passes, creating new args/aux for graph passes, and fixed returning args/aux
  * refactored MXLoadLib into separate functions and enabled verbose output in library loading
  * updated the build for extensions, the perl API, and the README; added pass_lib to the cmake build flow; fixed the relu example lib and assorted sanity/lint/compile issues
  Co-authored-by: Ubuntu <[email protected]>
Commits on Apr 24, 2020
- 770d49e: [1.x] Fix incorrect calculation results when the C locale is set to a locale that uses commas as the decimal separator (apache#17177)
  * Add a test for floating point parsing locale invariance
  * Use locale-invariant dmlc::stod/stof instead of std::stod/stof
  * Change the new-operator tutorial to use dmlc::stod instead of std::stod
  * Rename the locale invariance test
  * Skip test_scalarop_locale_invariance if the locales aren't available
  * Fix linter errors due to incorrect include order
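The bug class fixed above: C++ `std::stod` honors `LC_NUMERIC`, so under a comma-decimal locale (e.g. de_DE) the string "2.5" parses as 2.0, while `dmlc::stod` is locale-invariant. Python's built-in `float()` happens to give the same locale-invariant guarantee, which makes it a convenient model of the fixed behavior:

```python
def parse_scalar(s: str) -> float:
    # float() always uses '.' as the decimal separator, regardless of
    # any locale.setlocale() call -- the same guarantee dmlc::stod
    # provides on the C++ side.
    return float(s)

# Correct no matter what process-wide locale is active:
assert parse_scalar("2.5") == 2.5
```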
- f765e8a: Update Apache License for mshadow (apache#18109) (apache#18133)
  * Add Apache License for mshadow
  * update cpp-package license
  * update license for mx-theme in top-level LICENSE
  * Enable RAT license check for mshadow, keeping the rest of 3rdparty unchanged
  * add license header
- 0e7dd91: [v1.x] Backport staggered CI builds (apache#17999 & apache#18119) (apache#18141)
  * For the mxnet-validation pipeline, require the sanity build to complete successfully before running other build pipelines (apache#17999): refactor staggered builds into a new full build pipeline that runs the sanity check first and then starts all other builds; move the list of build jobs to the top of the file for clarity, preserving the whole job path in case nested folders are used in the future
  * If the sanity build is not found, wait until Jenkins recognizes it (apache#18119), with a 30-minute timeout for the sanity build to run and complete so we don't get stuck in a loop
  Co-authored-by: Joe Evans <[email protected]>
Commits on Apr 26, 2020
- 63e2b19: add logic for no batch size while getting data arrays from executors (apache#17772) (apache#18075)
  Co-authored-by: Ubuntu <[email protected]>
Commits on May 11, 2020
- 38ec873: Fix pylint astroid sanity issue (apache#18275)
  Cherry-picks apache@18e2014 from apache#18220
Commits on May 26, 2020
- fe90008: [v1.x] Backport edge pipeline (apache#18375)
  * Update edge toolchain
  * Support platforms without rand_r
  * Fix the URL to the IUS repository
  * Fix compiler warnings
  * Use a pre-C++17 way of distinguishing between device types, removing leftovers from the C++17 device-type check, and hack around the lack of constexpr if
  * Greatly simplify qemu setup
  * Request the C++ standard library and extensions
  * Upgrade dmlc-core to resolve build errors
  * Change the ARM8 build to work like the ARM7 build
  * Revert "Fix CPU-only RNNOp Forward" (reverts commit 0a921a4)
  * Adjust the list of files to be packed in ARM jobs
  Co-authored-by: Leonard Lausen <[email protected]>
Commits on May 27, 2020
- b523527: Fix memory leaks in Gluon (apache#18328) (apache#18359)
  Fixes a leak of ndarray objects in the frontend caused by a reference cycle. Backport of 3e676fc.
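A minimal Python model of the leak pattern the fix above removes: an object caught in a reference cycle is never freed by plain reference counting and lingers until the cyclic garbage collector runs, so breaking the cycle in the frontend lets ndarrays be reclaimed promptly.

```python
import gc
import weakref

class Payload:
    pass

def leak_candidate():
    a = Payload()
    a.myself = a        # reference cycle: the object keeps itself alive
    return weakref.ref(a)

ref = leak_candidate()
# Refcounting alone cannot free `a` here; only the cycle collector can.
gc.collect()
assert ref() is None    # reclaimed only after an explicit collection
```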
Commits on May 28, 2020
- 0c6785f
Commits on May 29, 2020
- 6bcfce9
- ac3e71b: [1.x] Pass args fix2 (apache#18236)
  * fixed overwrite of args/aux variables
  * fixed spacing
  * Merged apache#18177
  * updated python RPM URL
- d621e50: Revert PR 17767 for fixing GPU memory usage regression (apache#18283) (apache#18309)
  * Revert "Fix and optimize handling of vectorized memory accesses (apache#17767)" (reverts commit 5542d03)
  * add license to reverted file
- 3ee8e00: Fix PyPI packages and Python Docker images nightly release (apache#18222) (apache#18432)
  * remove OS from s3 library path, then revert that change (reverts commit 2665113)
  * fix bash script run commands
  * hardcode s3 path for upload/download of binaries
  Co-authored-by: Ubuntu <[email protected]>, Manu Seth <[email protected]>
Commits on Jun 2, 2020
- 843f278: [1.x] Add BatchNormWithReLU fp32/bf16 (apache#18160)
  * add bnrelu bf16 into the amp list
  * [MKL-DNN] BatchNormRelu fusion (apache#17679): support bnrelu fusion, add the param to gluon, move to contrib, fix the gluon interface, reuse the bn param, fix forward, cache flags, inherit from the BN base, fix lint/ut
  * Remove Chinese period which leads to a utf-8 encoding problem (apache#18223)
  * add bnrelu amp test
  Co-authored-by: no <[email protected]>, damNull <[email protected]>
Commits on Jun 3, 2020
- a64825f: [v1.x] Backport of improve log_softmax op performance by using DNNL support (apache#18320) (apache#18469)
  * Improve log_softmax performance via the oneDNN library
  * Adapt tests for MKLDNN log_softmax
  * Fix lint errors
  * Fix indent and comments
- 36bd144: fix batchnorm (apache#18377) (apache#18470)
  Updates basic_layers.py to fix a bug.
  Co-authored-by: Xingjian Shi <[email protected]>
- 8986e3f: [1.x] Backport of LSTM and GRU fix (apache#17898) and RNN op (apache#17632) (apache#18317)
  * [v1.x] [Large Tensor] Backport of fixed RNN op (apache#17632): changed relevant function args to index_t; fixed LSTM, GRU, RNN-ReLU and RNN-tanh; used const instead of literals; added nightly tests for RNN, RNN ReLU & tanh, LSTM, and GRU; added a type assertion to force evaluation of the output NDArray
  * [v1.x] Backport of fix LSTM and GRU layers gradient calculations (apache#18203):
    For bidirectional LSTM with more than two layers, the input gradient calculation was incorrect: the y derivative (dy) tensor was overwritten by the calculated x derivative (dx) tensor before the right2left layer could use dy for its own gradient calculations. The fix uses additional space to avoid the overwrite.
    For GRU with more than two layers, the i2h_weight gradient for the middle layers (all except the first and last) was incorrect, because the output pointer was assigned to the input instead of calculating a new input pointer.
    Enabled tests for GRU and LSTM gradients, changed the loop iteration deduction, and added more test cases for fused rnn layers.
  Co-authored-by: Connor Goggins <[email protected]>
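The bidirectional-LSTM bug described above can be modeled in a few lines: the left2right pass wrote its dx into the dy buffer, clobbering the values the right2left pass still needed, and the fix keeps a private copy of dy in extra workspace. The gradient formulas here are hypothetical stand-ins; only the buffer-aliasing pattern mirrors the fix.

```python
def bidirectional_backward(dy):
    # Fix sketch: copy dy into extra workspace BEFORE the buffer is reused.
    dy_copy = list(dy)                    # the added workspace
    dx_l2r = [2.0 * v for v in dy]        # hypothetical l2r gradient
    dy[:] = dx_l2r                        # buffer overwritten, as before the fix
    dx_r2l = [3.0 * v for v in dy_copy]   # r2l still sees the original dy
    return dx_l2r, dx_r2l

dx_l2r, dx_r2l = bidirectional_backward([1.0, 2.0])
assert dx_r2l == [3.0, 6.0]  # computed from the original dy, not the overwritten one
```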
Commits on Jun 8, 2020
- de481d3
Commits on Jun 9, 2020
- 798a264: [v1.x] backport apache#18500 - [Bug Fixed] Fix batch norm when grad_req is `add` (apache#18518)
  * Fix batch norm when grad_req is `add`
  * remove softmax test
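A sketch of MXNet's `grad_req` semantics, which the fixed BatchNorm backward has to honor: `'write'` overwrites the gradient buffer, `'add'` accumulates into it, and `'null'` leaves it untouched. The helper is an illustrative Python analogue, not MXNet code.

```python
def write_or_accumulate(grad_buf, new_grad, grad_req):
    # 'write' replaces the buffer, 'add' accumulates, 'null' skips.
    if grad_req == "write":
        return list(new_grad)
    if grad_req == "add":
        return [g + n for g, n in zip(grad_buf, new_grad)]
    return grad_buf  # 'null'

assert write_or_accumulate([1.0, 1.0], [0.5, 0.25], "add") == [1.5, 1.25]
assert write_or_accumulate([1.0, 1.0], [0.5, 0.25], "write") == [0.5, 0.25]
```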
Commits on Jun 15, 2020
- 3b2c9ad: [v1.x] Cherry-pick apache#17776 apache#17681 (apache#18465)
  * Fix CD (apache#17776): fixes cd/mxnet_lib/dynamic/Jenkins_pipeline.groovy (regression from apache#17645); adds NATIVE_ADDITIONAL.md (regression from apache#16899) and updates the other $TYPE_ADDITIONAL.md files; fixes cd/python/docker (regression from apache#15990)
  * [CD] update pypi description, setup.py (apache#17681): use manylinux2014 and a unified dist link for nightly builds
  * reverting the .so path as per the MAKE flow
  Co-authored-by: Leonard Lausen <[email protected]>, Sheng Zha <[email protected]>
- 375b49f: Update Jetson installation guide (apache#18485) (apache#18557)
  * add config Makefile for Jetson
  * modify Jetson install guide
Commits on Jun 16, 2020
- f91b989: Increase staggered build timeout to 180 min (apache#18568)
  * Increase staggered build timeout to 180 min, since the sanity build has a 180 min timeout
  * Decrease timeout so everyone is happy
  Co-authored-by: Joe Evans <[email protected]>
Commits on Jun 18, 2020
- f563fa4
Commits on Jul 1, 2020
- 16144ff
Commits on Jul 2, 2020
- 366a7f8: Enhance license checker to cover multiple license headers and md files (apache#18634)
  Co-authored-by: Leonard Lausen <[email protected]>
Commits on Jul 9, 2020
- 024daa6: [v1.x] Backport of Fix BatchNorm backward synchronization (apache#18644) (apache#18654)
  * Add test for BatchNorm running variables synchronization
  * Fix BatchNorm backward synchronization (fixes issue apache#18610)
Commits on Jul 15, 2020
- 5cdefeb: Fix the monitor_callback invalid issue during calibration with variable input shapes (apache#18705)
Commits on Jul 23, 2020
- b81c1ce: [v1.x] Cherrypick apache#18677 apache#18713 (apache#18742)
  * Migrate from private to public jetson toolchain files (apache#18677)
  * Set CMAKE_CUDA_COMPILER in aarch64-linux-gnu-toolchain.cmake (apache#18713): CMAKE_CUDA_HOST_COMPILER is reset if CMAKE_CUDA_COMPILER is not set, as of cmake 3.17.3; see https://gitlab.kitware.com/cmake/cmake/-/issues/20826
  Co-authored-by: Leonard Lausen <[email protected]>
- 91d535a
Commits on Jul 24, 2020
-
Fix linalg_potri and linalg_potrf operators for large tensor. (apache…
…#18752) * Fix linalg_potri operator for large tensor. * Update other variables to support large tensors. * Add to contributors. * Fix whitespace. * Update ZeroTriangular to support large tensors. * Add large tensor unit tests for linalg_potrf and linalg_potri. * Fix crash when accessing already destructed static variables (apache#18768) (apache#18778) Co-authored-by: Joe Evans <[email protected]> Co-authored-by: Przemyslaw Tredak <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for e6de5ae - Browse repository at this point
Copy the full SHA e6de5aeView commit details -
- 85ff00d: Add Large Tensor Test for linalg_syrk (apache#18782)
  * add large tensor test for syrk, forward and backward
  * change to batch input
  * move syrk test into test-linalg
  Co-authored-by: Ubuntu <[email protected]>
Commits on Jul 27, 2020
- 566d9d3: [v1.x] add large matrix tests for linalg ops: det, inverse, trsm, trmm (apache#18744)
  * add linalg large matrix tests and batch-input linalg tests
  * reduce batch size to 1 to save time
  * move the matrix generator to utils, passing matrix size as an arg
  * refactor tests: call backward, update grad value, add shape check, fix sanity
  Co-authored-by: Ubuntu <[email protected]>
Commits on Jul 28, 2020
- d009345: [1.x][LT] Add forward, backward test for linalg.gemm2 (apache#18784)
  * added forward, backward test for gemm2
  * add backward check and correct gradient assert
  * move test inside linalg_ops
  * add shape checks
- 7bef9cb: Back port optimization to broadcast_axis to MXNet 1.x (apache#18773)
  * Improving performance of broadcast_axis on GPU (apache#18168): adds a separate int32_t kernel for GPU in the broadcast_axis/to/like operators, uses a structure instead of a temp workspace to pass stride and shape, replaces hardcoded int32_t with generic index_t, and combines the CPU and GPU kernels to leverage cached stride calculation and fast-access shape data in both
  * Improve performance of broadcast_axis on CPU (apache#17882): adds comments explaining the code optimizations, fixes the broadcast_axis and slice_axis kernels to int32, combines CPU and GPU implementation method signatures and cleans up the code, and adds the new broadcast_axis to np_matmul
  Co-authored-by: Rohit Kumar Srivastava <[email protected]>
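The "cached stride calculation" the commit above leverages is the standard trick behind broadcast_axis: precompute row-major strides once, then map each output coordinate to an input offset, treating size-1 (broadcast) axes as stride 0. A pure-Python sketch of the idea (not MXNet's kernel):

```python
def broadcast_axis(data, shape, out_shape):
    # Row-major strides, computed once per shape (the "cached" part).
    def strides(s):
        st = [1] * len(s)
        for i in range(len(s) - 2, -1, -1):
            st[i] = st[i + 1] * s[i + 1]
        return st

    in_st, out_st = strides(shape), strides(out_shape)
    out = []
    for flat in range(out_st[0] * out_shape[0]):
        src, rem = 0, flat
        for ax in range(len(out_shape)):
            coord, rem = divmod(rem, out_st[ax])
            if shape[ax] != 1:          # broadcast axes contribute stride 0
                src += coord * in_st[ax]
        out.append(data[src])
    return out

# Broadcasting a (2, 1) column [[1], [2]] along axis 1 to (2, 3):
assert broadcast_axis([1, 2], (2, 1), (2, 3)) == [1, 1, 1, 2, 2, 2]
```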
Commits on Jul 29, 2020
- 85eb528: Add syrk test shape check (apache#18812)
  * add shape check
  * add name to contributor.md
  Co-authored-by: Ubuntu <[email protected]>
- ca6bcf3: adding error message when attempting to use Large tensor with linalg_syevd (apache#18807)
  Co-authored-by: Rohit Kumar Srivastava <[email protected]>
Commits on Jul 30, 2020
- 84c9e0d: [v1.x][LT] Add forward & backward linalg.gemm test for large size (apache#18825)
  * add test for linalg.gemm
  * fix indents
Commits on Jul 31, 2020
- f4e62df: Add Large Dim Checks for linalg Operators (apache#18816)
  * add checks for gemm, gemm2, syrk, trmm, trsm, and gelqf
  * move tests from test_large_array.py to test_large_vector.py
  * fix whitespace issue
  Co-authored-by: Ubuntu <[email protected]>
- 1a31cea: Add unit tests for potri and potrf backward and check output shape in unit tests (apache#18803)
  Co-authored-by: Joe Evans <[email protected]>
Commits on Aug 3, 2020
- 73d3a7b: [v1.x Backport] Fix softmax, logsoftmax failed on empty ndarray (apache#18602) (apache#18708)
  * [v1.x] Backport of fix npx.softmax for 0-sized inputs (apache#18158)
  * Fix softmax, logsoftmax failed on empty ndarray (apache#18602): fix the failing empty-array (log_)softmax and modify the npx (log_)softmax test
  * Fix softmax, logsoftmax backward failed on empty ndarray (apache#18710)
  Co-authored-by: Yiyan66 <[email protected]>, Hao Jin <[email protected]>, Bart Gawrych <[email protected]>
- 5db8dee: [ONNX export] Fixing spatial export for batchnorm (apache#17711) (apache#18846)
  * fixing spatial export for batchnorm
  * fixing broken pylint
  * deprecating the spatial attribute in the exporter so the default behavior of spatial=1 is conveyed
  Co-authored-by: Vinitra Swamy <[email protected]>
[v1.x] Mkldnn header fix v1x for nightly binaries (apache#18797)
* Cherry-pick apache#18310 apache#18355 (apache#18608) * cherry-pick: Fix missing MKLDNN headers (apache#18310) * Include all mkldnn headers in CD builds (apache#18355) * Fix cmake mkldnn install target. Previously mkldnn headers are installed to CMAKE_INSTALL_INCLUDEDIR instead of CMAKE_INSTALL_INCLUDEDIR/mkldnn * Fix pypi_package.sh pip/setup.py for mkldnn builds * Set CMAKE_CUDA_COMPILER in aarch64-linux-gnu-toolchain.cmake (apache#18713) CMAKE_CUDA_HOST_COMPILER will be reset if CMAKE_CUDA_COMPILER is not set as of cmake 3.17.3 See https://gitlab.kitware.com/cmake/cmake/-/issues/20826 Co-authored-by: Leonard Lausen <[email protected]> * remove linux-gputoolchain Co-authored-by: MoisesHer <[email protected]> Co-authored-by: Leonard Lausen <[email protected]>
Commit: 2aa2702
Commits on Aug 4, 2020
- Commit: eae6171
Commits on Aug 8, 2020
- Commit: cc287a0
Commits on Aug 10, 2020
- Commit: 0b5b449
Commits on Aug 11, 2020
- Commit: 1711103
Commits on Aug 12, 2020
-
Fix CI in v1.x branch (apache#18907)
* Update mirror for getting binutils source. * Remove erroneous wget command and duplicate mkdir command. Co-authored-by: Joe Evans <[email protected]>
Commit: 9fbf3d3
Commits on Aug 13, 2020
- Commit: d2d6408
Commits on Aug 14, 2020
-
[v1.7.x] backport Invoke mkldnn and cudnn BatchNorm when axis != 1 to v1.7.x (apache#18676) (apache#18890)
* [Improvement] Invoke mkldnn and cudnn BatchNorm when axis != 1 (apache#18504) * fix batch norm when fix_gamma is True * support gradient accumulation for batch norm * mkldnn batchnorm support grad add * unittest for bn * fix bn arg * fix lint * fix mkldnn * fix mkldnn bn * fix grad when fixing gamma * fix naive gpu bn * fix lint * invoke mkldnn and cudnn batchnorm when axis != 1 * backport 18500 * change condition * fix * fix * add mkldnn_off for bn * remove mkldnn_off * recover save_000800.json * cast * remove and fix flaky test Co-authored-by: JackieWu <[email protected]> Co-authored-by: JackieWu <[email protected]>
Commit: d32ba4f
-
Backporting backward inference from 2.x apache#18348 and apache#18378 (apache#18895)
Signed-off-by: Serge Panev <[email protected]>
Commit: 6b568fd
Commits on Aug 17, 2020
-
Cherry-pick apache#18635 to v1.7.x (apache#18935) (apache#18945)
* Remove mention of nightly in pypi (apache#18635) * update bert dev.tsv link Co-authored-by: Sheng Zha <[email protected]> Co-authored-by: Carin Meier <[email protected]> Co-authored-by: Sheng Zha <[email protected]>
Commit: 5e0db7a
Commits on Aug 18, 2020
-
fix gelu to use erf based algorithm (apache#18827) (apache#18946)
Co-authored-by: Tao Lv <[email protected]>
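The commit above switches GELU from a tanh-based approximation to the exact erf form. Both variants written out in plain Python for reference (the constants are the standard published ones, not taken from this commit):

```python
import math

def gelu_erf(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # The widely used tanh approximation (Hendrycks & Gimpel constants)
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))
```

The two forms agree to roughly 1e-3 over typical activation ranges, but the erf form is the exact definition.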
Commit: 6ae469a
-
[CI][1.x] Cherrypick: Upgrade unix gpu toolchain (apache#18186) (apache#18785)
* Update unix gpu toolchain (apache#18186) * update nvidiadocker command & remove cuda compat * replace cu101 with cuda since compat is no longer to be used * skip flaky tests * get rid of ubuntu_build_cuda and point ubuntu_cu101 to base gpu instead of cuda compat * Revert "skip flaky tests" This reverts commit 1c720fa. * revert removal of ubuntu_build_cuda * add linux gpu g4 node to all steps using g3 in unix-gpu pipeline * remove docker compose files * add back the caffe test since caffe is deprecated for mx2.0 and not 1.x * drop nvidia-docker requirement since docker 19.0 supports it by default * remove compat from dockerfile * Cherry-pick apache#18635 to v1.7.x (apache#18935) * Remove mention of nightly in pypi (apache#18635) * update bert dev.tsv link Co-authored-by: Sheng Zha <[email protected]> * disable tvm in CI functions that rely on libcuda compat * tvm off for ubuntu_gpu_cmake build * drop tvm from all unix-gpu builds Co-authored-by: Carin Meier <[email protected]> Co-authored-by: Sheng Zha <[email protected]>
Commit: 9981e84
-
[1.x] Backporting apache#18779 to v1.x (apache#18894)
* initial commit * Support extra inputs for subgraph ops (apache#18779) Support additional inputs to custom subgraph ops that are not direct dependencies to ops in the subgraph. This will enable various use cases: custom control flow ops, custom ops that maintain a state that should be saved/loaded, etc. Highlights: * Added test that uses a graph pass (addInputPass) to add a new custom input to the subgraph op * Added new optional argument (clear) to hybridize & optimize_for APIs in Gluon Block to enable multiple optimizations * refactored lib_api.h JSON utilities * added new Graph data structure utilities to simplify custom graph passes * refactored custom op registration * enhanced custom subgraph op to support additional inputs to subgraph op that is not an input to ops in the subgraph * updated subgraph & graph pass READMEs * Added error messaging from external library * changed messages * changed to pointers and types * added cast * updated cast * fixed signed int * whitespace * fixd pass resource Co-authored-by: Ubuntu <[email protected]>
Commit: d1ac7c8
Commits on Aug 19, 2020
- Commit: b4da2dd
-
[1.x] Backporting TensorRT-Gluon Partition API (and TensorRT 7 support) (apache#18916)
* [1.x] Backporting TensorRT and Gluon changes Signed-off-by: Serge Panev <[email protected]> * Remove test from Jenkins Signed-off-by: Serge Panev <[email protected]> * Fix test Signed-off-by: Serge Panev <[email protected]>
Commit: 8dcc653
Commits on Aug 20, 2020
-
Backport: Change Partition API's options_map to std::unordered_map apache#18929 (apache#18964)
Signed-off-by: Serge Panev <[email protected]>
Commit: 9445a2d
Commits on Aug 24, 2020
-
Get rid of monkey patching in LossScaler overflow handling (apache#18959) (apache#18973)
Co-authored-by: Vladimir Cherepanov <[email protected]> Co-authored-by: Vladimir Cherepanov <[email protected]>
Commit: dfefe87
Commits on Aug 26, 2020
-
[1.x] Backport of Fix LeakyRelu behaviour on empty input (apache#18934) (apache#19009)
* Fix LeakyRelu behaviour on empty input * Remove duplicated declarations
Commit: bce4cc6
Commits on Sep 1, 2020
-
1.x: Stop packaging GPL libquadmath.so (apache#19055)
* Stop packaging GPL libquadmath.so (apache#19053) libquadmath.so is GPL and must not be distributed by Apache projects. Users will need to ensure that libquadmath.so is present on their systems if they use binary builds of MXNet. libquadmath.so has not yet undergone any ABI changes, thus all versions of libquadmath.so are ABI compatible and user just needs to install system version of libquadmath.so. libgfortran.so can be packaged thanks to GCC Runtime Library Exception. See https://www.apache.org/legal/resolved.html#category-x * Remove unmaintained pip packages * Workaround pypa/setuptools#2352
Commit: d144edd
Commits on Sep 3, 2020
-
Support for fp16 in SpM x DnsM on GPU (apache#18930) (apache#19074)
Backported apache#18930 * Support for fp16 in SpGeMM * adding test for GPU spmm Co-authored-by: Rohit Kumar Srivastava <[email protected]> Co-authored-by: Rohit Kumar Srivastava <[email protected]>
Commit: 1a398e5
-
[1.x] Backporting apache#19016 (apache#19069)
* initial commit * fixed c++17 downgrade * fixed stringstream * fixed cast * changed to use pointers for stringstream since not copyable * fixed includes * fixed makefile includes * skipped lint for malloc/free for passing across C ABI Co-authored-by: Ubuntu <[email protected]>
Commit: b5e9c99
-
fix block.export (apache#17970) (apache#19075)
This PR cherry-picks commit 5122d32 into the v1.x branch. This is to enable the export of models where dangling layers are optimized out during symbol export.
James Mracek committed Sep 3, 2020
Commit: 748eebd
Commits on Sep 7, 2020
- Commit: 8dbed96
Commits on Sep 8, 2020
- Commit: 5051fd9
Commits on Sep 9, 2020
-
empty list cannot be cleared issue fixed. (apache#14882)
* empty list cannot be cleared issue fixed. * Update multiproc_data.py Co-authored-by: Sheng Zha <[email protected]>
Commit: a9bd1d2
Commits on Sep 10, 2020
-
Add TRT verbose mode (apache#19100)
Signed-off-by: Serge Panev <[email protected]>
Commit: 2d077db
Commits on Sep 11, 2020
-
[v1.x] Update onnx support to work with onnx 1.7.0 with most CV models (apache#19017)
* fix pooling_convention warning when convert model to onnx (apache#18529) * fix pooling_convention warning * fix pooling_convention warning * fix lint Co-authored-by: JackieWu <[email protected]> * Prevent uninitialized variable error. * Initial work to get Dropout to work with onnx 1.7 * Remove trailing whitespace for pylint. * Fix tensor initialization for Dropout operator input. * Update Clip operator to support latest ONNX opset versions by moving min/max attributes to inputs. * Fix whitespace. * Add support for importing Dropout operator in ONNX opset version >= 12. * Add support for import ONNX opsets >= 11 to clip operator. * Add optional opset_version parameter that defaults to latest opset version supported by onnx. Pass this parameter to each graph layer when exporting. * Add optional parameter to create_model() that allows user to specify which onnx opset version they want to use when exporting, defaults to latest version supported by onnx. * Use opset_version argument to determine operator format. * Add a opset_version parameter to from_onnx() so at operator conversion time, we know what opset version to use. * For Clip and Dropout operators, use opset version from passed proto_obj, which reflects what opset version the onnx model uses. * Use same tolerances that are in master. * Change Pad operator to use inputs instead of attributes for newer opset versions. Check opset version instead of ONNX version for Pooling operator. * Add documentation opset_version parameter. * Add opset_version parameters to unit tests. * Add test script for testing inference with onnxruntime on CV models from gluon model zoo. * Add license and clean up imports. * Install onnxruntime in docker container for unit tests. * Add onnxruntime to test dependencies. * Install onnxruntime into CentOS docker image. * Disable testing squeezenet models for now. * Update onnx version. * Fix typo. * Use mx.image.imread instead of PIL module. * ONNX import: use Conv pad attribute for symmetrical padding (apache#18675) Signed-off-by: Serge Panev <[email protected]> * Install onnx in CentOS containers when installing python. * Update import and export of some ONNX ops to support newer opset versions - this gets all ONNX unit tests to pass with onnx 1.7. * Re-enable squeezenet model testings in onnxruntime. * Run the onnxruntime inference tests in the ONNX pipeline instead of normal unittests pipelines. * Add missed return value. * Refactor code based on review comment. * Since the onnx tests are only run on ubuntu_cpu images, we don't need to install onnx and onnxruntime in the CentOS containers. Co-authored-by: Liu, Hao <[email protected]> Co-authored-by: JackieWu <[email protected]> Co-authored-by: Joe Evans <[email protected]> Co-authored-by: Serge Panev <[email protected]>
Commit: b888d3c
Commits on Sep 13, 2020
-
Fix race condition in NaiveEngine::PushAsync (apache#19108) (apache#19122)
* Wait for async_fun to complete in NaiveEngine::PushAsync This fixes a race condition in which NaiveEngine::PushAsync was checking whether async_fun had completed by the end of NaiveEngine::PushAsync. If async_fun hadn't completed yet, NaiveEngine::PushAsync would set an internal error string and deallocate the callback, causing a segfault in async_fun once it attempted to call the callback. * Update naive_engine.cc
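The fix is to block until async_fun has actually signalled completion through its callback before inspecting state or freeing the callback. The shape of that fix, sketched with Python threading rather than the engine's C++ internals (names here are illustrative):

```python
import threading

def push_async(async_fun):
    """Run async_fun on another thread and block until its completion
    callback fires, so completion state is never inspected too early."""
    done = threading.Event()
    result = {}

    def on_complete(status):
        result["status"] = status
        done.set()

    worker = threading.Thread(target=async_fun, args=(on_complete,))
    worker.start()
    done.wait()  # the essence of the fix: wait for async_fun to finish
    worker.join()
    return result["status"]

def toy_async_fun(callback):
    callback("ok")

print(push_async(toy_async_fun))  # -> ok
```

Without the wait, the caller could observe "not yet complete", report an error, and tear down the callback while the worker was still about to invoke it.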
Commit: 8b56874
Commits on Sep 14, 2020
-
[1.x] Backport apache#19103 (apache#19117)
* initial commit * incremented version number Co-authored-by: Ubuntu <[email protected]>
Commit: 9383282
Commits on Sep 15, 2020
-
[1.x] Backport 'Update CUB and include it only for CUDA < 11 apache#18799' (apache#18975)
* Update CUB and only for CUDA < 11 apache#18799 and update Makefile Signed-off-by: Serge Panev <[email protected]> * Add preprocessor option to silence CUB C++14 warning Signed-off-by: Serge Panev <[email protected]>
Commit: e1bcb33
-
[1.x] Backport Fix for duplicate subgraph inputs/outputs (apache#16131) (apache#19112)
* Fix for duplicate subgraph inputs/outputs (apache#16131) * fix for duplicate inputs * fixed error * fixed whitespace * Remove duplicate outputs from subgraphs * changed subgraph to create map of outputs * added static_cast * changed map<int,v> to vector * sanity fix * sanity2 * updated backends with new connectSubgraphOutputs API * fixed map creation logic * added updates for reattach function * creating node only if it is not an input to subgraph * creating object based on var_name only * updating ConnectSubgraphOutputs for mkldnn_elemwisemul_post_quantize_property.h * add debug prints to debug error in CI * remove prints * added prints to debug in the CI * revert changes * reverted changes * deduplicaated inputs to subgraph * deduplicated subgraph inputs * simplified inputs * cleaned up * deduplicate outputs * cleand up * added deduplication to subgraph node outputs * fixed prev compare * fixed issue with inputs and added test * fixd whitespace, removed prints Co-authored-by: Sam Skalicky <[email protected]> Co-authored-by: Ubuntu <[email protected]> Co-authored-by: Ubuntu <[email protected]> Co-authored-by: Manu Seth <[email protected]> Co-authored-by: Ubuntu <[email protected]> * added flag to enable dedupe ondemand * fixed dedup logic * improved dedup logic * fixed sanity * propogated option * check option in custom subgraph prop * fixed options map * fixed missing * added dedup to subgraph_prop base class for testing * added test for dedup * added comments Co-authored-by: Sam Skalicky <[email protected]> Co-authored-by: Ubuntu <[email protected]> Co-authored-by: Ubuntu <[email protected]> Co-authored-by: Manu Seth <[email protected]> Co-authored-by: Ubuntu <[email protected]>
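The core of this backport is deduplicating a subgraph's input/output tensor lists while keeping an index map so existing consumers can still locate each original position. The idea in miniature (the real change is in MXNet's C++ subgraph property code; this Python sketch is illustrative):

```python
def dedup_preserving_order(tensors):
    """Drop repeated entries while keeping first-seen order, and return
    an index map from original positions to the deduplicated list."""
    unique, index_map = [], []
    seen = {}
    for t in tensors:
        if t not in seen:
            seen[t] = len(unique)
            unique.append(t)
        index_map.append(seen[t])
    return unique, index_map

unique, idx = dedup_preserving_order(["a", "b", "a", "c", "b"])
print(unique)  # -> ['a', 'b', 'c']
print(idx)     # -> [0, 1, 0, 2, 1]
```

Consumers that used to read position i of the old list now read `unique[idx[i]]`, so nothing downstream has to change its indexing scheme.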
Commit: 9dfac79
-
TensorRT: add int8 with calibration (apache#19011)
Signed-off-by: Serge Panev <[email protected]>
Commit: 606933f
Commits on Sep 16, 2020
-
[1.x] Backport of intgemm apache#17559 (apache#19099)
* cherry-pick intgemm from master, fix build * Fix test to conform to 1.x * Makefile supporting intgemm compilation * Stricter dependencies on git checkout of intgemm * Operators depend on mkldnn * Don't compile intgemm with gcc older than 5 * Fix intgemm test for windows on 1.x by not using pytest * Update intgemm to use template arguments for integer immediates * Try to fix clang3.6 * Ban gcc < 5 in cmake * Update intgemm with gcc 5.5 debug workaround
Commit: d2e6452
Commits on Sep 17, 2020
- Commit: 837c7e4
-
[1.x] Backport Add cmake flag USE_FATBIN_COMPRESSION, ON by default (apache#19123) (apache#19158)
* [1.x] Backport Add cmake flag USE_FATBIN_COMPRESSION, ON by default (apache#19123) * Trigger CI * Appending to existing CMAKE_CUDA_FLAGS in all cases
Commit: 3d9af6e
-
Fix the error of gradient of np.pad (apache#19044) (apache#19167)
* pad grad modified * Fix pad grad error * modify pad constant backward * Fix test error * Fix test error * Fix kAddTo supported * Add test for grad_req='add' Co-authored-by: Xingjian Shi <[email protected]> Co-authored-by: Wentao Xu <[email protected]>
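For constant-mode np.pad, the backward pass reduces to slicing the output gradient back to the input shape, since the padded border carried constants whose gradient is discarded; the fix also makes this respect grad_req='add' by accumulating instead of overwriting. A NumPy sketch of the slicing step (illustrative; the function name is ours, not MXNet's):

```python
import numpy as np

def pad_constant_backward(ograd, pad_width):
    """Backward of constant-mode padding: only the interior slice of the
    output gradient flows back to the (unpadded) input."""
    slices = tuple(slice(before, dim - after)
                   for (before, after), dim in zip(pad_width, ograd.shape))
    return ograd[slices]

ograd = np.ones((4, 5))  # gradient w.r.t. the padded output
grad = pad_constant_backward(ograd, [(1, 1), (2, 1)])
print(grad.shape)  # -> (2, 2)
```

Under grad_req='add' the caller would do `igrad += pad_constant_backward(...)` rather than assigning, which is the kAddTo case the test covers.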
Commit: 039fef9
-
[v1.x] Add new CI pipeline for building and testing with cuda 11.0. (apache#19149)
* Add new docker containers for Cuda 11.0 and libcudnn8. * Add new functions for running GPU builds and tests in new Cuda11 containers. * Add runtime functions for cuda 11.0 related builds/tests. * Add new pipeline for testing cuda 11.0 builds. * Run cuda11 pipeline when sanity completes. * Use base image that already has libcudnn8 installed from Nvidia. Remove calls to nvidia/cudnn install scripts. * Don't build CPP package for cuda11 build. * Use proper base docker image for testing (include cudnn8) and don't manually install cudnn8. * Re-enable CPP package build. * Add env variable LD_LIBRARY_PATH in the build container so cpp-package build works. Remove unneeded components of docker containers to reduce size and build time. * Add sm_80 and compute_80 to compiled cuda architectures. * Add back binutils install since we are building for more cuda architectures and will hit the ar limit. Co-authored-by: Joe Evans <[email protected]>
Commit: 620d058
-
[v1.x] Backport Unittest tolerance handling improvements (apache#18694). Also test seeding (apache#18762). (apache#19148)
* Add sm arch 80 to Makefile * Unittest tolerance handling improvements (apache#18694) * Add sm arch 80 to Makefile * Add TF32 to cuBLAS GEMMs Signed-off-by: Serge Panev <[email protected]> * Add CUDA version guards Signed-off-by: Serge Panev <[email protected]> * Remove useless TF32 for double and old CUDA version Signed-off-by: Serge Panev <[email protected]> * Factorize VERSION_ADJUSTED_TF32_MATH Signed-off-by: Serge Panev <[email protected]> * Add TF32 considerations to test_util.py:check_consistency() * Bypass test_gluon_gpu.py:test_large_models if gmem >32GB * Default tols in assert_almost_equal() now a function of dtype and ctx * Expand types listed by default_tols() * Fix pylint * All with_seed() tests to waitall in teardown * Elevate MXNET_TEST_SEED logging to WARNING * Revert test_gluon_gpu.py:test_rnn_layer to default tols * Fix test_gluon_model_zoo_gpu.py::test_inference and test_operator_gpu.py::test_np_linalg_{solve,tensorinv} * test_numpy_interoperability.py to not fix seed for rest of CI * Further fix to test_np_linalg_tensorinv * Fix test_gluon_data.py:test_dataloader_context when run on 1-GPU system. * Fix test_operator_gpu.py::test_embedding_with_type * Fix test_operator_gpu.py::{test_*convolution_large_c,test_np_linalg_tensorsolve} * Remove unneeded print() from test_numpy_interoperability.py * Unify tol handling of check_consistency() and assert_almost_equal(). Test tweaks. * Add tol handling of assert_almost_equal() with number args * Add tol handling of bool comparisons * Fix test_numpy_op.py::test_np_random_rayleigh * Fix test_operator_gpu.py::test_batchnorm_with_type * Fix test_gluon.py::test_sync_batchnorm in cpu selftest * Improve unittest failure reporting * Add to robustness of test_operator_gpu.py::test_embedding_with_type * Check_consistency() to use equal backward gradients for increased test robustness * Fix test_operator_gpu.py::test_{fully_connected,gemm}. Add default_numeric_eps(). * test_utils.py fix for numeric gradient calc * Reinstate rtol=1e-2 for test_operator.py::test_order * Remove auto-cast of check_consistency() input data to least precise dtype (not needed) * Fix test_operator.py::test_{reciprocol,cbrt,rcbrt}_op * Expand default float64 numeric_eps for test_operator_gpu.py::test_sofmin * Fix segfault-on-error of @Retry decorator. Add test isolation. * assert_almost_equal() to handle a,b scalars * Fix test_operator_gpu.py::test_gluon_{mvn,mvn_v1} race * Fix test_operator_gpu.py::test_flatten_slice_after_conv via scale * Remove test_utils.py:almost_equal_ignore_nan() * Fix sample vs. pop variance issue with test_numpy_op.py::test_npx_batch_norm * Expose test_utils.py:effective_dtype() and use to fix test_operator_gpu.py::test_np_linalg_svd * Fix true_divide int_array / int_scalar -> float_array to honor np_default_dtype * Try test_elemwise_binary_ops serial to avoid pytest worker crash * Fix (log_)softmax backward on empty ndarray * Temporarily log all CI seeds to troubleshoot seed non-determinism * Revert "Temporarily log all CI seeds to troubleshoot seed non-determinism" This reverts commit f60eff2. * Temp log all CI seeds to troubleshoot unwanted seed determinism * Revert "Add sm arch 80 to Makefile" This reverts commit f9306ce. * Same fix of sample vs. pop variance issue, now with test_operator_gpu.py::test_batchnorm * Revert "Temp log all CI seeds to troubleshoot unwanted seed determinism" This reverts commit ff328ef. * Marking test_sparse_dot_grad with garbage_expected after teardown error * Fix flakiness of test_gluon_probability{_v1,_v2}.py::test_gluon_kl{_v1,} * Temp skip of test_aggregate_duplication on gpu * Add seeding to test_{numpy,}_contrib_gluon_data_vision.py. Make created files unique. * Add ndarray module isolation to help debug test_bbox_augmenters worker crash * Marking test_sparse_square_sum serial after pytest worker crash * Fix flakiness of test_gluon_probability{_v1,_v2}.py::test_half_cauchy{_v1,} Co-authored-by: Serge Panev <[email protected]> Co-authored-by: Bart Gawrych <[email protected]> * Fix test_gluon_data.py:test_dataloader_context when run on 1-GPU system. * Remove pytest decorators introduced in error * Fix test_forward.py:test_consistency * Fix test_numpy_op.py tests * Improve test seeding in test_numpy_interoperability.py (apache#18762) * Fix test_numpy_op.py:test_np_random_{beta,chisquare} * Reduce problem sizes with test_optimizer.py:test_multilamb * Skip test_gluon_gpu.py:test_fused_{lstm,gpu}_layer, fix test_rnn_cells, for fp16 contexts * Trigger CI Co-authored-by: Serge Panev <[email protected]> Co-authored-by: Bart Gawrych <[email protected]>
Commit: ce0a518
Commits on Sep 18, 2020
-
[v1.x][Submodule] Upgrade to oneDNN v1.6.3 (apache#19153) (apache#19161)
* upgrade to oneDNN v1.6 release branch * oneDNN v1.6 * fix cpp test * build oneDNN with c++11 * Revert "build oneDNN with c++11" This reverts commit 5365d83. * oneDNN v1.6.3 Co-authored-by: Tao Lv <[email protected]>
Commit: 49ba44a
-
[v1.x] Backport Improve environment variable handling in unittests (apache#18424) (apache#19173)
* Improve environment variable handling in unittests (apache#18424) * Add missing python functools import * Correct teardown import
Commit: 5079c35
Commits on Sep 19, 2020
-
[1.x][FEATURE] CUDA graphs support (apache#19142)
* Initial cherry-pick * Store NodeAttrs in OpExecutor * Do not allow stateful operations in CUDA graphs and provide mechanism for marking ops as safe * Guard against using ops with synchronization * Cleaning * Properly guard graphs * Limit graphs to CUDA 10.2+ * Fix the compilation when graphs are not available * Guarding the libcuda.so usage behind RTC compilation flag * Document the env variables * Add test * Fix the test * Use with_environment
Commit: 0fce381
-
Revert "Fix memory leaks in Gluon (apache#18328) (apache#18359)" (apache#19181)
This reverts commit b523527.
Commit: 0496690
-
[1.x] Enable CUDA Graphs for TRT (apache#19184)
Signed-off-by: Serge Panev <[email protected]>
Commit: a35d568
Commits on Sep 22, 2020
- Commit: fe7cf99
-
[v1.8.x] ElementWiseSum fix for oneDNN (apache#18777) (apache#19200)
* Fix ElementwiseSum for DNNL * Fix sanity and replace push_back with emplace_back * Change order of the data format conditions * Add NOLINT to avoid readability error * Add test for oneDNN ElemwiseSum Co-authored-by: Bart Gawrych <[email protected]> Co-authored-by: Bart Gawrych <[email protected]>
Commit: 96f6454
-
Commit: c2df97f
Commits on Sep 23, 2020
-
[WEBSITE] v1.8 website patch (apache#19212)
* Add missing license header for md files (apache#18541) (apache#19189) Co-authored-by: ciyong <[email protected]> * Fixed Install page history broken (apache#18182) * fix install option block history broke * when history goes back, avoid button default css blue outline * use appropriate parameter name * format scss change * Update website version select drop down (apache#18188) * update version select drop down * align caret * revert scrollable content, add delayed hover effect * bugfix * fix new design doesn't work on mobile # Conflicts: # docs/static_site/src/_includes/get_started/get_started.html * Update website version select drop down (apache#18188) * update version select drop down * align caret * revert scrollable content, add delayed hover effect * bugfix * fix new design doesn't work on mobile # Conflicts: # docs/static_site/src/_includes/get_started/get_started.html * Fix gluon link missing (apache#18243) * fix gluon link missing * empty commit to trigger checks * empty commit to trigger checks * fix when clicking version dropdown it jumps to top of the page (apache#18238) * Website global search feature (apache#18288) * init global search ui * add hover effect to icon and refactor js * add search bar ui styles * fix search UI's effect on navbar height * add fade in/out effect to search ui and navbar * update search trigger to click and add x button for close * add version select for search * fix version typo * update dropdown * fix hitsperpage reset after change version * fix nav trigger not show * update search border css class name * make dropdown style consistent * global search mobile&tablet UI * adjust mobile search result width * extract global search related styles to a seperate scss * restore formatting to existing code * format & coding style * fix caret height bug * add mobile compatible UI * add license header to js files and update dropdown width * put docsearch css before main to overrides * update search result panel height * dynamically generate version dropdown * use more accurate selector over search result * use vh for height * add comments to scss * move versions to Jekyll global variable * remove redundant version key * make global search default version the same as website version Co-authored-by: Yang Shi <[email protected]> * replace google CDN with JQuery's own CDN (apache#18369) Co-authored-by: Yang Shi <[email protected]> * Add Developer Guide Docs to MXNet Website (apache#18474) * init dev guide * move dev guide above FAQ * update format and images * hoist git docs and fix styles * use relative urls * remove useless code block * use consistent url and file name * update heading * add apache license header * init dev guide * move dev guide above FAQ * update format and images * hoist git docs and fix styles * use relative urls * remove useless code block * use consistent url and file name * update heading * add apache license header * update doc - git clone recursive * reviewing the dev guide - proof reading and text edits Co-authored-by: Yang Shi <[email protected]> Co-authored-by: Talia Chopra <[email protected]> * fix contribute page anchor position shifted (apache#18571) Co-authored-by: Yang Shi <[email protected]> * Clipboard refactor (apache#18605) * refactor clipboard * make lang getter more extensible * trigger ci * User Feedback Widget (apache#18639) * user feedback widget implementation * add user feedback widget to python docs site * update margin * add apache license * one more license * turn off feedback widget on python site * update copy * format * add event value field * turn on widget on Python site # Conflicts: # docs/static_site/src/_includes/head.html # docs/static_site/src/assets/main.scss * Fix python micro-site table of content bugs (apache#18664) * update footer style * add compiled css of footer styles changes * add same style for footer2 * more fix to the toc * Fix all anchor shifts on website (apache#18674) * use regex that is supported by all browsers (apache#18811) * 1.7 compatible fix * add jquery fix * Consolidate installation instructions on website and add disclaimer for non-ASF ressources (apache#18487) * Update website with disclaimer for non-ASF ressources * Integrate Windows instructions to build_from_source.md * Remove master version from selector * Update Download links * Update get_started/download.md per Release Download Page policy # Conflicts: # contrib/clojure-package/README.md # docs/python_docs/python/tutorials/deploy/inference/image_classification_jetson.md # docs/static_site/src/_includes/get_started/get_started.html # docs/static_site/src/_includes/get_started/linux/clojure/gpu.md # docs/static_site/src/_includes/get_started/linux/java/gpu.md # docs/static_site/src/_includes/get_started/linux/julia/build-from-source.md # docs/static_site/src/_includes/get_started/linux/perl/perl.md # docs/static_site/src/_includes/get_started/linux/python/cpu/build-from-source.md # docs/static_site/src/_includes/get_started/linux/python/cpu/docker.md # docs/static_site/src/_includes/get_started/linux/python/cpu/pip.md # docs/static_site/src/_includes/get_started/linux/python/gpu/build-from-source.md # docs/static_site/src/_includes/get_started/linux/python/gpu/docker.md # docs/static_site/src/_includes/get_started/linux/python/gpu/pip.md # docs/static_site/src/_includes/get_started/linux/r/gpu.md # docs/static_site/src/_includes/get_started/linux/scala/cpu.md # docs/static_site/src/_includes/get_started/linux/scala/gpu.md # docs/static_site/src/_includes/get_started/macos # docs/static_site/src/_includes/get_started/macos/clojure/cpu.md # docs/static_site/src/_includes/get_started/macos/julia/build-from-source.md # docs/static_site/src/_includes/get_started/macos/perl/perl.md # docs/static_site/src/_includes/get_started/macos/python/cpu/build-from-source.md # docs/static_site/src/_includes/get_started/macos/python/cpu/docker.md # docs/static_site/src/_includes/get_started/macos/python/cpu/pip.md # docs/static_site/src/_includes/get_started/macos/python/gpu/build-from-source.md # docs/static_site/src/_includes/get_started/macos/python/gpu/pip_docker.md # docs/static_site/src/_includes/get_started/macos/r/cpu.md # docs/static_site/src/_includes/get_started/macos/scala/cpu.md # docs/static_site/src/_includes/get_started/windows # docs/static_site/src/_includes/get_started/windows/perl/perl.md # docs/static_site/src/_includes/get_started/windows/python/cpu/build-from-source.md # docs/static_site/src/_includes/get_started/windows/python/cpu/docker.md # docs/static_site/src/_includes/get_started/windows/python/cpu/pip.md # docs/static_site/src/_includes/get_started/windows/python/gpu/pip.md # docs/static_site/src/_includes/get_started/windows/r/cpu.md # docs/static_site/src/_includes/get_started/windows/r/gpu.md # docs/static_site/src/pages/get_started/build_from_source.md # docs/static_site/src/pages/get_started/download.md # docs/static_site/src/pages/get_started/osx_setup.md # docs/static_site/src/pages/get_started/ubuntu_setup.md # docs/static_site/src/pages/get_started/windows_setup.md * fix broken installation widget - remove empty entries (apache#18661) * update static files # Conflicts: # docs/python_docs/themes/mx-theme/mxtheme/static/sphinx_materialdesign_theme.css # docs/python_docs/themes/mx-theme/mxtheme/static/sphinx_materialdesign_theme.css.map # docs/python_docs/themes/mx-theme/mxtheme/static/sphinx_materialdesign_theme.js # docs/python_docs/themes/mx-theme/mxtheme/static/sphinx_materialdesign_theme.js.map * update header dropdown default version * fix failed pipeline * cherry pick 1.7 content from master * update version number in image classification tutorial * minor version fix * fix bullet point format bug * Fixed python website double scroller and improve UX (apache#18845) * make python site header scroll aware and avoid double scroller * add compiled assets * adjust python site second header height * add new line * set focus to main content
on DOM load # Conflicts: # docs/python_docs/themes/mx-theme/mxtheme/static/sphinx_materialdesign_theme.css # docs/python_docs/themes/mx-theme/mxtheme/static/sphinx_materialdesign_theme.css.map # docs/python_docs/themes/mx-theme/mxtheme/static/sphinx_materialdesign_theme.js # docs/python_docs/themes/mx-theme/mxtheme/static/sphinx_materialdesign_theme.js.map # docs/python_docs/themes/mx-theme/src/scss/_root.scss * add jekyll base url to enable relative path * fix python micro site header link path * update python site css Co-authored-by: Sheng Zha <[email protected]> Co-authored-by: ciyong <[email protected]> Co-authored-by: Yang Shi <[email protected]> Co-authored-by: Talia Chopra <[email protected]> Co-authored-by: Leonard Lausen <[email protected]>
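The "use regex that is supported by all browsers" item (apache#18811) refers to a class of fixes where lookbehind assertions, which some browsers (notably Safari at the time) did not implement, are replaced with capture groups. The commit message does not show the actual pattern, so the anchor-extraction use case below is an assumption chosen to match the site's anchor-related fixes; it is a sketch of the technique, not the PR's code.

```javascript
// Hypothetical example: extract the fragment ("anchor") from a URL.
// A lookbehind version like /(?<=#).+$/ would throw a SyntaxError at
// parse time in browsers without lookbehind support, breaking the page.
// Using a capture group instead works everywhere:
function getAnchor(url) {
  const match = url.match(/#(.+)$/); // group 1 holds the text after "#"
  return match ? match[1] : null;    // null when the URL has no fragment
}

console.log(getAnchor("https://mxnet.apache.org/get_started#build")); // → "build"
console.log(getAnchor("https://mxnet.apache.org/get_started"));       // → null
```

Note the design point: a lookbehind in a script fails for unsupporting browsers even if the regex is never executed, because the whole script fails to parse, which is why "supported by all browsers" matters for site-wide JS.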
Full SHA: 975aa6e
Commits on Sep 24, 2020
-
[v1.x] Nightly Large Tensor test cherrypicks (apache#19194) (apache#19215)
* fixing batch_norm and layer_norm for large tensors (apache#17805)
Co-authored-by: Rohit Kumar Srivastava <[email protected]>
* Fix nightly large_vector test caused by incorrect with_seed path (apache#18178)
* add back the missing environment function
Co-authored-by: Rohit Kumar Srivastava <[email protected]>
Full SHA: 7c9046a
Commits on Sep 26, 2020
-
delete executor before reallocating its memory (apache#19222)
Co-authored-by: Rohit Kumar Srivastava <[email protected]>
Full SHA: 07d7e13

-
added key for samskalicky (apache#19225)
Co-authored-by: Ubuntu <[email protected]>
Full SHA: 51cc0af
Commits on Sep 30, 2020
-
Full SHA: 7961555