Sync upstream #39

wweic · 2019-08-09T16:40:52Z

Thanks for contributing to TVM! Please refer to guideline https://docs.tvm.ai/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from Reviewers.

…e#3533) * [INFA][IR] Build and Evolve Low-level IR. Remove dep from HalideIR. * Update include/tvm/node/ir_functor.h Co-Authored-By: Jared Roesch <[email protected]> * Update include/tvm/node/ir_functor.h Co-Authored-By: Jared Roesch <[email protected]>

* Enable set_input_zero_copy in GraphRuntime * Fix LoadParams * Fix * lint * Fix remote context issue * Fix * Remove LOG * Remove unused variables * Add tests * works * More test scenarios * make it simpler * Remove unnecessary changes * Address comments * More comments * Address comments * Fix build

* [docs] Add a tutorial for the pass manager * address comment * address more comments * retrigger ci * address steven's comments * address comments * retrigger ci * Update docs/dev/relay_pass_infra.rst Co-Authored-By: Steven S. Lyubomirsky <[email protected]> * Update docs/dev/relay_pass_infra.rst Co-Authored-By: Steven S. Lyubomirsky <[email protected]> * Update docs/dev/relay_pass_infra.rst Co-Authored-By: Steven S. Lyubomirsky <[email protected]> * Update docs/dev/relay_pass_infra.rst Co-Authored-By: Steven S. Lyubomirsky <[email protected]> * Update docs/dev/relay_pass_infra.rst Co-Authored-By: Steven S. Lyubomirsky <[email protected]> * Update docs/dev/relay_pass_infra.rst Co-Authored-By: Logan Weber <[email protected]> * Update docs/dev/relay_pass_infra.rst Co-Authored-By: Logan Weber <[email protected]>

…apache#3546) * Support additional architectures beyond x86_64 in ubuntu_install_java While attempting to get a development environment going for TVM on my AArch64 desktop I ran into some hardcoding of relevant architectures.

…matrices (apache#3707) * add build gcn tutorial * add transpose operator for square sparse matrices * remove extra files * change loop tag * comply with lint * comply with lint -- line too long * comply with lint * lint check * lint check * lint check * apply marisa and thierry's reviews

* [Relay] Rewrite pass. This pass transforms an expression into another expression. It has several use cases:
  * Replace an expr with another expr, if the other expr has faster performance.
  * For ASICs, we might want to modify the inputs to adapt to the HW support.
  * Alter op layout can work in conjunction with this pass.

  The supporting use case is the Intel i8 x i8 conv. Intel HW supports u8 x i8 conv in HW. Using this pass, we can replace an i8 x i8 conv with a sequence of operators where one of the operators is now u8 x i8 conv. This will also help automatic quantization performance.
* Better API name.
* Removing the conv2d legalization for x86. Will send a separate PR.
* Test name changes.
* Registering one function to register FTVMLegalize.
* Better comments.

* add build gcn tutorial * add dgl to docker file * add dgl to docker file * Apply suggestions from code review Co-Authored-By: 雾雨魔理沙 <[email protected]> * add dgl to docker file * rerun checks * Revert "add build gcn tutorial" This reverts commit dbe8b5f. * resolve git issue * resolve git issue * resolve git issue * apply marisa's comment

* [Relay] [Quantization] WIP - Common files for the quantization work.
* [Relay] [Quantization] WIP - Prototyping requantize op.
* Requantize operator implementation. Requantize converts one quantized tensor representation to another quantized representation. The PR has the following implementation features:
  - Requantize operator defined in qnn namespace - relay.qnn.requantize
  - Lowering of the requantize to existing Relay operators
  - Integer fixed point implementation of requantize
    - Two rounding modes - FE_UPWARDS (round towards infinity) and FE_AWAY_FROM_ZERO (std::round behavior)
  - Floating point implementation as well, that can act as reference or can be used for devices when FP32 computation is not used.
  - Unit test cases

  Relevant Issue - apache#2351

  Credit to TFLite and GemmLowp to provide reference implementations.
* Typo and lint fixes.
* Doc fix.
* Uncommenting the lint script (fixing mistake).
* Modifying the unit tests.
* Moving C++ files into src/relay/qnn
* Moving python files to python/tvm/relay/qnn. Some minor fixes.
* Moving the attrs.h inside the include directory.
* Pushing files that I forgot earlier. Changing util location.
* Incorporating comments. API change. Lint fixes.
* Modifying the GetFixedPointMultiplierShift API as per comments.
* Forgot the dialect change.
* Changing rewrite to qnn_lower.
* Renaming Quantize to Qnn for clarity.
* Remove use_int_domain.
* Incorporating review comments.
* Adding API doc for QNN dialect.
* Move the qnn_lower pass to transform namespace.
* Moving from expr to module. Adding namespace in C++.
* Minor sentence rewrites. Added qnn namespace.
* Added the API doc.
* Changing default out_dtype to int8. Adding a test with in/out_dtype as uint8.
* Style fixes. Better error messages.
* Adding documentation.
* More documentation fixes.
* Adding out dtype check for requantize.
* Adding corner case for FP32 to fixed point conversion.
* Adding extra line.
* Documentation fix.
* Adding static inline.
* Incorporating jackwish comment. Removed idtype from requantize lowering.
* Removing Quantize/Dequantize code. Restricting Requantize to (u)int8/int32.
* Style fixes.
* Fix the docs.
* Move to Legalize API.
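
For readers unfamiliar with the "integer fixed point implementation" mentioned in the requantize commit above, here is a minimal pure-Python sketch of the general idea. It is not the TVM code: the function names `fixed_point_multiplier` and `requantize`, the Q31 encoding, and the default int8 clamp range are assumptions made for illustration, and only the round-half-away-from-zero mode (the std::round-like behavior) is shown.

```python
import math


def fixed_point_multiplier(real_multiplier):
    """Approximate a positive real multiplier as q * 2**(exponent - 31).

    Hypothetical helper for illustration; not the TVM API.
    """
    if real_multiplier <= 0:
        raise ValueError("multiplier must be positive")
    # frexp returns (m, e) with real_multiplier == m * 2**e and 0.5 <= m < 1.
    significand, exponent = math.frexp(real_multiplier)
    q = round(significand * (1 << 31))
    if q == (1 << 31):
        # Rounding pushed the significand up to 1.0; renormalise.
        q >>= 1
        exponent += 1
    return q, exponent


def requantize(value, input_scale, input_zero_point,
               output_scale, output_zero_point, qmin=-128, qmax=127):
    """Map one quantized integer to another scale/zero point using integers only."""
    multiplier, exponent = fixed_point_multiplier(input_scale / output_scale)
    scaled = (value - input_zero_point) * multiplier
    shift = 31 - exponent
    if shift > 0:
        # Round half away from zero: add half an output unit to the magnitude,
        # shift right, then restore the sign.
        magnitude = (abs(scaled) + (1 << (shift - 1))) >> shift
        rounded = magnitude if scaled >= 0 else -magnitude
    else:
        rounded = scaled << -shift
    # Re-centre on the output zero point and clamp to the output range.
    return max(qmin, min(qmax, rounded + output_zero_point))
```

For example, `requantize(10, 0.5, 2, 0.25, 1)` rescales `(10 - 2)` by `0.5 / 0.25` and adds the output zero point. The scale ratio is only converted to an integer multiplier and shift once, so the per-element work stays in integer arithmetic.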

…generating (apache#5962) * Code migration Start (neo-ai#1) * Init commit: Code migration Start * Add loop_state.cc/h * Add ComputeDAG basic test * Split transform_step out & Update more UTs (neo-ai#3) * Split transform_step out * Update GetProducers & GetConsumers * Update UTs * Add UT for CacheReadWrite & Some bug fix * Add search_task, measure and serialization (neo-ai#4) * Add FollowSplit & FollowFusedSplit tests * Update dag.InferBound & its UT * Add search_task, measure and serialization * Update Serialization UT * Add MetaTileRewritePolicy (neo-ai#5) * Add feature * Add cost_model, meta_tile_rewrite_policy * Add MetaTileRewritePolicy basic UT * Basic Python API for State (neo-ai#6) * Add Basic Python API for State * Add UTs for State * Add Python API: Measure & Task (neo-ai#7) * Update the return value of state operation * Add task * Copy measure.py & utils.py * Fix LocalBuilder * Fix LocalRunner * Add ansor.auto_schedule() API; First AutoSchedule working version(neo-ai#8) * Add basic Python support for ansor.auto_schedule * Update AutoSchedule API * Bug fix for get the attach point of a fused iter * Update UT after infer bug fix * Bug fix & Add python serialization API (neo-ai#10) * Delete C++ UT hack since Python is ready * Add ndarray.non_empty * Update Serialization python API * Improve code style, python wrapper and test cases (neo-ai#11) * Update c++ code style and unit test * Update python State wrapper and test cases * fix unit tests * Add RPCRunner & OpenCL/CUDA test (neo-ai#12) * Add RPCRunner & OpenCL search test * Add CUDA search test * Add RPCRunner test * rebase to upstream/master * Add Ansor basic tutorial (neo-ai#13) * Add basic tutorial * migrate feature extraction (neo-ai#14) * Add XGBModel & RPCRunnerWarpper (neo-ai#15) * Add XGBModel & RPCRunnerWarpper * Revert "Add Parallel Granularity Mutation" * Migrate workload_registry.py (neo-ai#16) * add workload registry * update * update * add task scheduler (neo-ai#17) * Add conv2d cuda tutorial 
with workload registry (neo-ai#18) * add tune_test.py (the old tune_wkl.py) (neo-ai#19) * add tune_test.py (the old tune_wkl.py) * update * fix measure * fix for gpu * Code refine for tune_test.py & Add a pre load callback (neo-ai#20) * Bug fix for tutorials * Add PreLoadMeasuredStates * Add search_callback support for task tuner * Code refine for tune_test.py * Update * Update * Update * Update * Bug fix * Add python custom sketch rule (neo-ai#21) * Add custom sketch rule * Bug fix * Ansor Relay Integration (without layout rewrite) (neo-ai#22) * relay integration * Add tune_op_subgraph.py & Some code clean for tune_network.py (neo-ai#23) * Add single op tune scripts * Add tune subgraph support * Merge all op & all subgraph to one file * Rename file * add explicit_unroll_max_extent (neo-ai#25) * Add Index simplification & API update (neo-ai#26) * Add vectorized cooperative_fetching test * Update math simplify for vectorized CF * File rename * Update tune_network * API update * Update PreLoadMeasuredStates & Some bug fix (neo-ai#27) * Add a threading wrapper to fix the test bug * Set default TVM_USE_AUTO_SCHEDULER to false * Update PreLoadMeasuredStates callback * Add tensorize step for loop_state (neo-ai#31) * Add tensorize step * State python api update (neo-ai#33) * Start to update api * Add compute_dag to state * API update * kernel layout rewrite (neo-ai#28) * kernel layout rewrite * remove some hacks * add defuse_ops pass and move kernel_layout_rewrite pass after fuse_ops pass * set TVM_RELAY_DISABLE_BUILD_CACHE for task extraction and prepare_layout_rewrite * [cache flush] port cache flush to ansor (neo-ai#32) * Improve relay integration (neo-ai#34) * tmp checkpoint * Improve relay integration * Improve relay integration * Fix xgb error & Simplify dispatcher (neo-ai#35) * Rename "MetaTileRewritePolicy" to "SketchPolicy". (neo-ai#36) * Rename "MetaTileRewritePolicy" to "SketchPolicy". 
* Add a new class for auto_unroll_max_step, storage_offset in StageNode * fix tune_op_subgraph.py * rebase * Migrate all node::make to noderef's construct function (neo-ai#37) * Start to move xxxnode::make to noderef() * Update * Update * Finish transform_step * Finish comute dag & auto schedule * Update * Update * Update * Update * Update * Code refine * Code refine * Code refine * Update * Update * Some lint fix & Recover the double constructor of tvm::PrimExpr (neo-ai#39) * lint fix * clang-format-fix * pylint fix * Update * Recover the double constructor of tvm::PrimExpr * Fix pylint * pylint fix * pylint fix * Add MutateComputeLocation and MutateParallel in evolutionary search (neo-ai#40) * Add MutateComputeLocation and MutateParallel in evolutionary search * fix lint * Improve loop state python API (stage_tensors -> stage_ops) (neo-ai#41) * improve loop state python API (stage_tensors -> stage_ops) * fix * ComputeDAG bug fix & Add Custom TensorCore Matmul Example (neo-ai#42) * Bug Fix * Sample example of Custom TensorCore Matmul * Rever Commits, Start to build minimum Ansor system * Code clean for minimum Ansor system * Bug fix & Delete AccessAnalyzer * Delete attachmap & Code clean * Doc update Update statenode::stages from vector to Array * Headfile update & Python doc update * clang-format fix * pylint fix * Update * Doc update * Update * Bug fix after code merge to the new master * clang-format fix * Update * Update * Update std::vector to Array; Update verbosity setting; Some commemts addressed * std::vector->Array & std::string->String * Add init_state to ComputeDAG * Update * Update some unordered_map to Map * clang-format fix * Comments addressed Delete ReplayAndInferBound Delete ReplaySteps & InferBoundCommon * Lint fix * Update * Update * Update * Update * Update * Update * Update * Update * Update * Rename ansor namespace to auto_schedule * Update * Rename ThreadPool to ParallelFor * Add parallel_for * Remove ThreadPool * Update 
python/tvm/auto_schedule/auto_schedule.py * trigger CI Co-authored-by: Lianmin Zheng <[email protected]> Co-authored-by: Minmin Sun (孙敏敏) <[email protected]> Co-authored-by: Zhao Wu <[email protected]>

hlu1 and others added 30 commits August 9, 2019 09:20

posix_memalign appears in API 17, not 16 (apache#3532)

3fd9a94

[DEP] Remove HalideIR from submodule (apache#3535)

e5939c0

[Relay][Quantization] Fix add_rewrite and UnifyDTypeScale (apache#3534)

aeea3a4

* [Relay][Quantization] Fix issue introduced in apache#3135 * Recover StopFusion * Fix fmultiref * Fix lint

[ARITH][IR] Introduce FloorDiv/Mod (apache#3479)

4251551

* [ARITH][IR] Introduce FloorDiv/Mod * Address review comments * address review comments, fix div sub rule
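
The commit above only names the new nodes, so as background: floor division rounds the quotient toward negative infinity, whereas C-style division truncates toward zero, and the two differ whenever the operands have opposite signs. A small Python sketch of the difference follows; the helper names are hypothetical and this is not TVM code.

```python
def trunc_div(a, b):
    """C-style division: the quotient is rounded toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def trunc_mod(a, b):
    """Remainder paired with trunc_div; takes the sign of the dividend."""
    return a - b * trunc_div(a, b)


def floor_div(a, b):
    """Floor division: the quotient is rounded toward negative infinity."""
    return a // b


def floor_mod(a, b):
    """Remainder paired with floor_div; takes the sign of the divisor."""
    return a % b
```

With `a = -7` and `b = 2`, truncation gives quotient `-3` and remainder `-1`, while flooring gives quotient `-4` and remainder `1`. In both cases `a == b * quotient + remainder` holds.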

[ARITH][BOUND] Fix bound inference to avoid allocating too much (apac…

d569fc6

…he#3526) * [TVM] Fix bound inference to avoid allocating too much * [ARITH][BOUND] Pass analyzer to PropBoundToInputs

[Relay][VM] Port VM, VM compiler, and Object into python (apache#3391)

e4cd280

* tmp * Port vm and object to python * clean up * update vm build module * update * x * tweak * cleanup * update * fix rebase * Rename to VMCompiler * fix

[FRONTEND][TENSORFLOW] Some bug fixes for tensorflow NCHW data_format (…

1f964e6

…apache#3514)

fix (apache#3550)

dc67d12

fix js test load module example (apache#3556)

cf7c835

Fix build error (apache#3552)

930fc22

* Fix build error * comments

fix pynq 32-bit address pointers (apache#3558)

dd81d1e

[Relay][VM]Fix debug statement (apache#3565)

cdc9990

* [Relay][VM]Fix debug statement * Change debug statement

tightening bounding box for IntSet fused in PassUpDomain (apache#3073)

fe98c4f

Apply suggestions from code review Co-Authored-By: Wei Chen <[email protected]>

[Community] Zhi Chen -> Committer (apache#3572)

9685190

Let's welcome Zhi as a new Apache TVM Committer!

Disable MicroTVM on i386 CI (apache#3569)

f6cf6c9

Emit DWARF debug information (apache#3420)

5a5b3d9

[ARITH] Simplify let (apache#3568)

71bc35b

[Relay] parser/pretty printer roundtripping (apache#3536)

0f8b3d4

fix topi c++ conv2d_nchw lambda expr issue (apache#3570)

2b3dba2

avoiding cast None to int errors (apache#3578)

18a286c

Mention minimum version of python features one should stick to (apach…

9acd3a5

…e#3588)

Add printer for Layout/BijectiveLayout (apache#3582)

a0783f4

[RPC] Better handle tempdir if subprocess killed. (apache#3574)

7a7c5fe

[AutoTVM]Improve graph tuner for multiple subgraphs (apache#3490)

b061bd6

* Improve boundary nodes in graph tuner * Limit output node number * Fix test * Improve warning. * Fix test

[TOPI][RELAY] Add op Size (apache#3094)

034524c

[Relay] add some check for the ad algorithm (apache#3585)

4a5f179

* do * fix test

tqchen and others added 22 commits August 9, 2019 09:38

[CI] Update GPU docker (apache#3709)

6d8a5f4

Export tvm::relay::OpRegistry::OpRegistry (apache#3711)

22c47e1

[Bugfix] Fix the issue that function pass modifies original module (a…

8a0bd0f

…pache#3712) * fix * fix interpreter

safe to remove thread related headers? (apache#3713)

7b18ab2

[relay][frontend] clean up tf frontend (apache#3710)

85b375b

* clean up tf frontend * fix get_relay_op

Update dmlc-core to the latest commit (apache#3716)

e36286c

This includes changes to build TVM runtime for Hexagon.

Fix (2/2) [TOPI] conv2d schedule code (apache#3648) (apache#3717)

0ddbe67

* Fix the tile_rx and tile_ry issue. Note that this patch depends on pull request apache#9 in tvm-distro.

fix name (apache#3719)

5f7ccd3

[Frontend][MXNet] Fix mxnet converter for hybridblock and add div_sqr…

07717e5

…t_dim (apache#3701) * Fix mxnet converter for hybrid block * tweak * fix rebase * fix * add test

[Relay/TOPI][Op] Add variance and layer norm op (apache#3700)

d9863c0

* Add LayerNorm op * update * fix * Add mean_std and mean_variance * add std and update doc * add license * x * lint * x * fix * fix doc

Take zero extent loops as NoOp and remove it and add unittest for the…

0a1ab32

… same (apache#3724)

[VTA][Dockerfile] Chisel dependencies for TSIM CI (apache#3721)

abc0fd8

Remove sccache from Rust install (apache#3728)

a4b29fa

[DOCKER] Fix missing apt https transport support (apache#3735)

1c91b37

* [DOCKER] Fix missing apt https transport support * [DOCKER] Drop superfluous explicit sudos

[CI] Update docker image ci_cpu,i386 to include verilator (apache#3738)

935eb1c

[VTA] [Chisel] Bug fix for VME Shell (apache#3737)

f5098d5

* fix * fixes

Fix typo in ir_pass.h (apache#3741)

4c5c134

Update dmlc_tvm_commit

2338d42

wweic mentioned this pull request Aug 14, 2019

Fix Windows build for Neo DLR #1

Merged

Revert "Fix Windows build for Neo DLR"

5f9b07e

This reverts commit 7d96118.

wweic merged commit 5f9b07e into neo-ai:dev Aug 16, 2019
