- On-going version
- 0.8
- 0.7
- 0.6
- Relay in Production
- Relay Virtual Machine
- Training
- Quantization
- Accelerator and Microcontroller Support
- Rust Support
- Operator Support
- Frontend and User Interface
- Runtime and Backend Support
- Language and Architecture
- Symbolic shape enhancement
- Language and Architecture
- Arithmetic Analysis
- Runtime and Backend Support
- Frontend and User Interface
- AutoTVM
- Performance Improvements
- Documentation
- Build and Test
- Bug Fixes
- Known Issues
- Deprecations
- 0.5
- 0.4
- 0.3
- 0.2
- 0.1
- Initial version
This file records the changes to the TVM library in reverse chronological order.
Refer to the Roadmap issue for the complete list of features planned for the on-going version. If you check in something that is not reflected in the Roadmap issue, please reply to that issue so it can be added.
Apache TVM v0.8 brings several major exciting experimental features, including:
- PaddlePaddle frontend
- TVMScript: round-trippable python-based syntax for TIR
- TorchScript integration
- TensorIR scheduling language
- TensorRT and CUTLASS integration via BYOC
- Int4 TensorCore support in AutoTVM
- MicroTVM Project API and Zephyr, Arduino support
- AOT executor
- Robust Windows support
- Affine analysis infra: iter-affine-map
- Improved Vulkan backend
- CUDA graph support in TVM runtime
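As a rough illustration of the TensorIR scheduling language and TVMScript items above, the sketch below converts a TE compute into a schedulable TIR function and applies a couple of the new schedule primitives. It assumes a v0.8-level build; the workload, block name and tile factor are illustrative only, not part of the release notes.

```python
import tvm
from tvm import te

# Describe a small matmul with TE, then convert it to a schedulable TIR PrimFunc.
A = te.placeholder((128, 128), name="A")
B = te.placeholder((128, 128), name="B")
k = te.reduce_axis((0, 128), name="k")
C = te.compute((128, 128), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
func = te.create_prim_func([A, B, C])

# Drive the S-TIR schedule primitives on the resulting IRModule.
sch = tvm.tir.Schedule(func)
block = sch.get_block("C")
i, j, k = sch.get_loops(block)
io, ii = sch.split(i, factors=[None, 32])
sch.reorder(io, j, ii, k)

# sch.mod now holds the transformed function (printable/parsable as TVMScript).
lib = tvm.build(sch.mod, target="llvm")
```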
Besides these features, the community has been working together to refactor and evolve the existing infrastructure, including but not limited to:
- Relay compilation engine
- Relay pattern language
- CI and build process
- Refactoring documentation and tutorials
- Stabilizing AutoScheduler
- Stabilizing TVMC command line driver interface
- Stabilizing target system
- Frontend coverage, quantization, dynamic shape, training
Full changelog: https://gist.github.com/junrushao1994/c669905dbc41edc2e691316df49d8562.
The community has adopted a formal RFC process. Below is a list of the formal RFCs accepted by the community since then:
- [RFC-0005] Meta schedule (AutoTIR)
- [RFC-0006] Automatic mixed-precision pass and support
- [RFC-0007] Parametrized unit tests
- [RFC-0008] MicroTVM Project API
- [RFC-0009] Unified static memory planner
- [RFC-0010] Target-registered compiler flow customisation
- [RFC-0011] Arm® Ethos-U integration
- [RFC-0014] Pipeline executor
- [RFC-0015] Use CMSIS-NN with TVM
- [RFC-0019] Add PaddlePaddle frontend
- [RFC-0020] Extend metadata in project option
- [RFC-0022] TIR non-scalar constants
- [RFC-0023] Adding annotation field to
tir.allocate
nodes - [RFC-0025] PyTorchTVM
- [RFC-0027] Formalize TVM documentation organization
- [RFC-0028] Command line composition from internal registry
- [RFC-0029] Migrating target attributes to IRModule
- [RFC-0030] Command line configuration files
- [RFC-0031] C Device API
- [RFC-0036] TVMScript namespace
- [RFC-0041] Update TVMScript block syntax
- TVMScript parser and printer #7630 #9115 #9286
- Scheduleable TIR (S-TIR) infrastructure, analysis and lowering passes #7553 #7765 #7847 #8114 #8121 #7873 #7923 #7962 #7848 #8044 #7806
- S-TIR schedule primitives: `compute-inline`, `reverse-compute-inline`, `fuse`, `split`, `rfactor`, `storage-align`, `vectorize`, `unroll`, `bind`, `reorder`, `cache-read`, `cache-write`, `compute-at`, `reverse-compute-at`, `decompose-reduction` #8170 #8467 #8544 #8693 #8716 #8767 #8863 #8943 #9041
- While loop in TIR #7425 #9004
- Metaprogramming in S-TIR via `specialize` #8354
- Support Return value in TIR #7084 #7932
- Storage scope support in `PointerType` #8017 #8366 #8463
- Creation of S-TIR via TE compute #7987
- PopenPoolExecutor replaces the Python native multiprocessing library to provide better multiprocessing support and to enable auto-tuning in Jupyter notebooks for AutoTVM and AutoScheduler #6959 #8492 #8913 #8820 #8851
- AutoScheduler improvement and stabilization: task scheduler, layout rewrite, early stopping, dispatching #6945 #6750 #6987 #7156 #8862 #8995 #7571 #7376 #7377 #7344 #7185
- AutoScheduler support for sparse workloads #7313 #7635 #8065
- AutoScheduler support for Vulkan, ROCm, Mali #7626 #7038 #7132
- AutoTVM support for int4 TensorCore #7831 #8402
- Meta Schedule core infrastructure, builder runner and database #8615 #8623 #8642 #8817 #9079 #9132 #9154 #9053 #9059 #9044 #9111 #9061 #9153
- Operators for Int-8 vision transformer on GPU #7814
- Optimizing NMS and ROI-related kernel on GPU #7257 #7172 #7136 #7796 #7463 #6516 #7440 #7666 #8174
- Support and optimize sparse operators #8605 #7477 #7435 #6889 #6580 #8437
- Sort-related operators and optimization #9184 #7669 #8672 #7611 #7195 #7056 #6978
- Support for einsum operator #6370
- Matmul, dense operators and their optimization #8921 #8527 #8234 #8250 #6616 #8229 #8401 #7404 #8669
- Convolution and pooling operators and their optimization #8620 #8936 #8584 #7075 #7142 #7515 #6999 #6899 #6840 #6137 #6802 #6445 #6711 #6714 #8167 #8222 #8275 #8276 #8422 #8430 #6687 #7928 #8897
- Scatter and gather operators and their optimization #8479 #7600 #7044 #7464 #7233 #6533 #6856 #6854 #7927 #8105
- Prefix scan, cumsum and cumprod #7722 #7303 #7314 #7334 #7123 #6868
- Dynamic shape and shape functions #7414 #6979 #6912 #6898 #6373 #8068 #7490 #7487
- Miscellaneous improvement. Operators including: reshape, resize, pad, PRNG, transpose, where, softmax, concat, nll_loss, space_to_batch_nd, batch_to_space_nd, slice_like; Libraries including thrust, cuDNN, cuBLAS, MIOpen; Improving schedules for generic reduction and softmax. #8592 #7375 #7287 #7184 #7131 #7086 #7083 #8030 #6851 #6477 #8346 #6759 #8028 #8056 #8369 #7468 #7458 #7194 #8138 #8543
- Pattern language and mixed-mode visitor: matching more IR constructs, fuzzy matching; converting more passes to non-recursive. #8843 #7754 #7355 #7332 #7282 #7151 #7120 #6958 #7507 #8325 #8774 #7817 #7374 #6695 #6704
- Improving or adding passes including ExtractOperators, SimplifyExpr, DynamicToStatic, DefuseOps, ConvertLayout, FoldConstant. Added a set of utilities that allows a model to be run efficiently on TensorCores #9253 #9245 #8996 #7827 #9034 #7807 #8755 #7731 #7368 #7603 #7656 #7423 #7354 #6946 #6748 #6720 #6776 #7835 #7895 #8205
- TECompiler and refactoring of compilation workflow #9103 #8974 #8886 #8802 #8501 #8526 #8486 #8597 #7518 #7552 #8914 #9130
- Quantization and automatic-mixed precision #8883 #8810 #8644 #7613 #8069 #8341 #8126 #8460
- Parser, printer and diagnostic #7347 #6274 #6692 #8352 #8000
- Pipeline Executor #8702 #9108
- CUDA graph integration in graph executor #7616
- Add `set_output_zero_copy` in graph executor #8497
- VM: memory allocation improvement, shape function improvement and misc #7746 #7451 #7413 #7210 #8040 #6938 #8661 #7676 #8285
- AOT compilation and execution #8697 #7785 #8014 #8023 #8096 #8075
- Project API infrastructure: #8380 #8963 #8708 #8019
- MicroTVM, Zephyr, Arduino RVM, AutoTVM support #9320 #8941 #7804 #7786 #7449 #7891 #7915 #8055 #8037 #8386 #8519 #8748 8154 #8945 #8624 #8701 #7723 #8715 #7225 #6964 #7813 #7528
- The pure C runtime (CRT) #7398 #7333 #7095 #7225
- Model library format #8270 #8072 #7938
- Tighter bounds and more simplification on cast #6771 #7045
- Introducing iterator (quasi-) affine map detection #6667 #7752 #7759
- Inverse of iterator affine map #8384 #8427
- Subspace division in iterator affine map #7760
- PaddlePaddle initial support #8645 #9124 #9126 #9295 #9370 #9236 #9283
- ONNX support, including better handling of control flow, coverage of more operators, better dynamic shape support, more tests. #9265 #9178 #9146 #8894 #8966 #8967 #7818 #9000 #9001 #9066 #9028 #9002 #8985 #9019 #9017 #8972 #7802 #7800 #7781 #8919 #9054 #8906 #8933 #8959 #8907 #7771 #8923 #8924 #7755 #7720 #8773 #8872 #7655 #8741 #7633 #8781 #8866 #8867 #7522 #7519 #7489 #7438 #7429 #7364 #7300 #7259 #7243 #7237 #7208 #7189 #7115 #7109 #7089 #7036 #7031 #6839 #6351 #7842 #7844 #6646 #6647 #6681 #6700 #7883 #6726 #6730 #7899 #7900 #7906 #7934 #7956 #8007 #8011 #8084 #8099 #8189 #8191 #8304 #8321 #8337 #8356 #8385 #8502 #8426 #8440 #8456 #8475 #7391 #7394 #8621 #8322 #8323 #8435 #8436 #8455 #7353 #7215
- TensorFlow and TFLite, including more operators, better TensorArray support and quantization #9404 #9256 #8689 #7789 #7736 #8763 #8647 #8648 #8558 #8780 #8538 #7659 #7639 #7531 #7520 #7502 #7496 #7473 #7452 #7442 #7441 #7400 #7320 #7293 #7267 #7159 #7148 #7114 #7113 #7093 #7074 #7048 #7030 #6998 #6984 #6970 #6949 #6933 #6918 #6901 #6885 #6849 #5767 #6589 #6670 #6674 #6675 #7866 #6685 #7885 #6729 #7901 #6774 #6783 #6799 #7951 #8024 #8051 #8060 #8074 #8142 #8179 #8251 #8277 #8335 #8364 #8375 #8431 #8454 #6818 #8483 #9099 #9165
- PyTorch: more operators including activations, inplace operators, RNNs, NMS #9371 #9204 #9185 #9135 #9133 #9015 #8839 #8718 #8699 #8692 #7712 #8753 #7694 #8583 #7675 #7646 #7606 #7592 #7569 #7544 #7549 #7535 #7517 #7465 #7397 #7371 #7348 #7346 #7325 #7231 #7174 #7154 #7137 #7134 #7133 #7128 #7088 #7023 #6900 #6602 #7845 #6659 #6740 #6782 #6784 #7958 #8192 #8397 #8398 #8403 #8447 #6829
- MXNet support. More operators and NLP model coverage in GluonNLP #7568 #7409 #7209 #7191 #7062 #6561 #6699
- Misc: CoreML, Keras, DarkNet, etc. #7667 #6676 #6651 #6963 #7949 #7035 #7446 #8562 #8599
- LLVM backend: recover LLVM support on windows; support target feature strings in function attributes; atomic support in NVPTX, ROCm; LLVM compatibility to LLVM 12+ #9305 #9223 #9138 #8860 #8958 #6763 #6698 #6717 #6738 #8293 #6907 #7051
- ROCm 3.9 bitcode files search #6865
- Vulkan and SPIR-V refactoring and major improvement in codegen and runtime. A critical bug fix in SPIR-V codegen allows the Vulkan backend to produce correct outputs on more hardware and drivers. Added support for querying device-specific hardware parameters and capabilities, dynamic shapes, irregular ops such as sorting and NMS, UBO, fp16, and vectorization. We can now run complicated models like MaskRCNN on Vulkan end to end. #8904 #7833 #7717 #7681 #8746 #8813 #7609 #8882 #7607 #7591 #7574 #7572 #7833 #6662 #7969 #8013 #8048 #8098 #8102 #8107 #8127 #8151 #8196 #8320 #8588 #8332 #8333 #8348 #8528
- Metal language version upgrade (`MTLLanguageVersion2_3`), better codegen support, int64 support, various bug fixes #7830 #7819 #7714 #7118 #7116 #7105 #7980 #8054 #8175 #8202 #8206 #8313
- OpenCL, VTA, Verilator: refactored code generator, better error messages, various bug fixes #7834 #7777 #7761 #7100 #6125 #6126 #6191 #7834 #8256 #8257 #8731 #8756 #8973
- CUDA: enable `__launch_bounds__`, dynamic shared memory, TensorCore, BF16, half2, NVCC version upgrade #9341 #8678 #7561 #7273 #7146 #7147 #7099 #7065 #7033 #7014 #7907 #7964 #9087 #8135 #8137 #8457 #8466 #8571
- ARM: CMSIS-NN, Ethos-N #8653 #7628 #8951 #7506 #7443 #7858 #6982 #8795 #8806 #8833 #9147 #9159 #9160 #9162 #9163 #9167 #9209 #9386 #9387
- Hexagon: build, compilation, model launcher, more target options and better runtime #7784 #6718 #8821 #8822 #9033 #8823 #8859 #8865 #8915 #8954 #9024 #9025 #8960 #8986 #9010 #9011 #9189 #9220 #9355 #9356
- WASM: Update support for latest emcc, add ffi test. #6751
- TensorRT initial integration, stabilization, int8 calibration, dynamism support #6395 #7702 #7595 #7581 #7412 #7372 #9047 #8073 #8808 #6905 #7967 #8005 #8172 #8461 #8506 #8607 #7205 #7026 #7016 #7011 #6955 #6872 #7253 #6805 #9324
- Arm Compute Library (ACL) integration #7649 #7206 #6532 #7121 #6724 #8149 #7251 #9396
- Verilator integration #7406 #7351 #7286 #8094
- VitisAI integration #6343 #7350
- BYOC infrastructure enhancement: improving control flow, AnnotateTarget, custom codegen #6641 #6655 #6697 #6786 #7977 #8464
- MacOS support #8396
- AutoScheduler support #7070
- Support cross compiler options #7922
- Python scripting #7823 #7698
- More flexible input specification #7366 #7788
- More options, `--disable-pass` and `--config` #7816 #8253
- Allow passing optional arguments to importers #7674
- Model library format (MLF) support #8086 #8331
- More backend and library support: metal, ACL, Vulkan, OpenCL, ROCm, Vitis AI #8282 #7508 #8359 #6831 #8896 #7577
- Support for the new target system #7651 #7654 #6788 #7304 #6855
- Rust bindings installable via Cargo #7503 #6678 #8631 #8665
- Initial support for diagnostic interface #6656
- Fixes for using Python APIs from Rust #7085
- Improve NDArray, GraphRt, Relay, IRModule, Array, Attrs bindings #6563 #6741 #7138 #8353 #7082
- Improve error handling, error messages and fix memory leaks #8289 #6815 #8714 #8725
- Enhanced CPP-RPC implementation: allow user supplied work dir, support of CPP-RPC server for Apple, support adb-shell style CPP-RPC #7670 #8224 #8223 #7766 #7013
- Use PopenWorker to handle RPC system: #7889 #7757 #7961
- Fold target host into target #7462 #7791 #7534 #8835
- Target-based intrinsic lowering and legalization #7936 #7809
- Add target tags for all existing CUDA GPU models #7410
- Linear Congruential Random Engine #8642
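The target-system items above (device tags, JSON-like target configuration, and folding the target host into the target) can be sketched roughly as follows from Python; the tag name and attribute values here are assumptions for illustration, not part of the release:

```python
import tvm

# From a registered device tag (tag name assumed for illustration).
gpu = tvm.target.Target("nvidia/geforce-rtx-2080-ti")

# From a JSON-like configuration dict, with the host compiler folded into the target.
cpu = tvm.target.Target({"kind": "llvm", "mcpu": "skylake-avx512"}, host="llvm")

print(gpu.kind.name)       # "cuda"
print(cpu.attrs["mcpu"])   # "skylake-avx512"
print(cpu.host.kind.name)  # "llvm"
```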
v0.7 brings many major features. The community worked together to refactor the internal code base to bring a unified IR code structure with a unified IRModule, type system and pass infrastructure. We have also brought many exciting new features, some highlights include:
- Initial automatic scheduling support
- Initial command line driver interface
- WebGPU and webassembly support
- Better first-class Rust support in the codebase
- Initial Hexagon support
- Bring your own codegen (BYOC) support
The community also continues to bring high quality improvements to the existing modules including, but not limited to: better frontend coverage, performance, quantization, microTVM and dynamic shape support.
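A minimal sketch of what the unified IRModule and pass infrastructure look like from the Python API; the toy network, target and pass selection below are illustrative assumptions rather than part of the release notes:

```python
import tvm
from tvm import relay

# Build a tiny Relay function and wrap it in the unified IRModule.
x = relay.var("x", shape=(1, 64), dtype="float32")
w = relay.var("w", shape=(32, 64), dtype="float32")
y = relay.nn.relu(relay.nn.dense(x, w))
mod = tvm.IRModule.from_expr(relay.Function([x, w], y))

# Passes are composed and applied on the IRModule through the pass infrastructure.
seq = tvm.transform.Sequential(
    [relay.transform.InferType(), relay.transform.FoldConstant()]
)
with tvm.transform.PassContext(opt_level=3):
    mod = seq(mod)
    lib = relay.build(mod, target="llvm")  # compile for the local CPU
```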
- Phase 0: Ansor minimum system for auto schedule generating #5962
- Phase 1: Access Analyzer #6103
- Phase 1: Add `follow_split` and `follow_fused_split` steps #6142
- Phase 1: Add `pragma`/`storage_align`/`rfactor` steps #6141
- Phase 1: Add RPC Runner #6077
- Phase 1: Add `annotation`/`compute_at`/`compute_root`/`compute_inline` steps #6073
- Phase 1: Add `cache_read`/`cache_write` steps #6107
- Phase 1: Rename namespace from `auto_schedule` to `auto_scheduler` #6059
- Phase 1: The base class for cost models #6187
- Phase 1: feature extraction for cost models #6190
- Phase 1: XGBoost Cost Model #6270
- Phase 2: Basic GPU Sketch Search Policy #6269
- Phase 2: Evolutionary Search #6310
- Phase 2: Update heavy operations with `parallel_for` #6348
- Parallel the InitPopulation (#6512)
- Tutorial: Using the template-free auto-scheduler on CPU (#6488)
- External codegen support in Relay (#4482), (#4544)
- Bring Your Own Codegen Guide -- Part 1 #4602
- Bring Your Own Codegen Guide -- Part 2 #4718
- Relay annotation and partitioning for external compilers #4570
- JSON Runtime with DNNL End-to-End Flow #5919
- Handle one symbol for each runtime #5989
- Run accelerator specific optimizations #6068
- Arm Compute Library integration #5915
- Retire the example json runtime #6177
- `json_node.h` should include `data_type.h` #6224
- Improve installation tutorial #6170
- Add support for dense (fully connected) layer #6254
- Introduce the Ethos-N BYOC integration #6222
- Enable remote device via environment variables #6279
- Improved pooling support #6248
- Add support for quantized convolution #6335
- CoreML codegen #5634
- Add `strided_set` operation (#4303)
- Add support for conv3d (#4400), pool3d (#4478), 3d upsampling ops (#4584)
- Add group convolution for VTA (#4421)
- Add 1d deconvolution op (#4476)
- Allow batch matmul to be fused into injective ops (#4537)
- Add native depthtospace and spacetodepth operators (#4566)
- Add CUDNN conv3d support (#4418)
- Dilation2D operator support #5033
- Isfinite operator #4981
- Unravel Index operator #5082
- Add thrust support for nms #5116
- Resize3d, Upsample3d op support #5633
- Add operator Correlation #5628
- `affine_grid` and `grid_sample` #5657
- Sparse to dense operator #5447
- `Conv3d_transpose` op support added #5737
- add op `crop_and_resize` #4417
- Add bitwise ops #4815
- Sparse to dense operator #5447
- support dynamic NMS (Non Maximum Suppression), symbolic begin, end, and strides for strided_slice #4312
- `Conv3d_transpose` op support added #5737
- ReverseSequence operator #5495
- Conv1D #4639
- 1D Pooling #4663
- Channel wise quantization - Quantize & Requantize #4629
- Support QNN ops. #5066
- Adding support for QNN subtract op #5153
- TFLite QNN Tutorial #5595
- Tutorial: Deploy Quantized Model on CUDA #4667
- Support asymmetric per-layer quantized operators #6109
- Add convertlayout pass in Relay (#4335, #4600)
- Added Merge Composite pass #4771
- Call graph for relay #4922
- Add inline pass #4927
- Target annotation for external codegen #4933
- GradientCell Relay Pass #5039
- Add MergeCompilerRegions pass #5134
- Non-recursive Graph Vistor and Rewriter (#4886)
- [Blocksparse] Pipeline for lowering dense model to sparse-dense (#5377)
- Relay op strategy #4644
- Static Tensor Array (#5103)
- Memory planner (part 1) #5144
- ONNX codegen #5052
- Add Parser 2.0 #5932, part 2 #6162
- Basic block normal form #6152
- Convert Layout pass. #4664
- Pattern Language, Matcher, Rewriter, and Function Paritioner #5231
- Add ADTObject POD container type (#4346)
- TFLite RPC runtime (#4439)
- Standardized graph runtime export (#4532)
- MISRA-C compliant TVM runtime #3934
- Add String container #4628
- Introduce Virtual Memory Allocator to CRT (#5124)
- Initial implementation of Hexagon runtime support (#5252)
- FastRPC interface for Hexagon runtime (#5353)
- CoreML Runtime (#5283)
- AutoTVM + uTVM for Cortex-M7 (#5417)
- Windows Support for cpp_rpc (#4857)
- Implement TVMDSOOp(TensorFlow custom op) for TVM runtime (#4459)
- WebGPU support #5545
- TVM WebAssembly JS Runtime #5506
- Hexagon driver for offloading kernels to simulator #5492
- Introduce runtime::Array #5585
- Allow non-nullable ObjectRef, introduce Optional. (#5314)
- Introduce static slots for common objects. (#5423)
- Introduce RValue reference (move) support to TypedPackedFunc (#5271)
- Introduce MetadataModule to separate code compilation/interpretation and weight initialization #5770
- Support module based interface runtime #5753
- Add TVM application extension with WASM runtime #5892
- Provide a guide for users who have difficulty registering SEqualReduce (#5300)
- Revive the Rust + SGX refactor #4976
- Improve Rust bindings: Map, Array, String, various IR nodes #6339
- Rust Refactor Stage 4: Rewrite Rust graph runtime to use new APIs #5830
- Second stage of Rust Refactor #5527
- tvm crate stage 3 of Rust refactor #5769
- Add first stage of updating and rewriting Rust bindings. #5526
- Introduce StructuralHash for the Unified IR. #5160
- Introduce StructuralEqual Infra for the unified IR. #5154
- Introduce ExprDeepEqual, Remove IRDeepCompare #5206
- [TIR] Introduce BufferLoad/Store (#5205)
- Improved massive build times caused by tir.floormod and tir.floordiv. Fixed Topi testcase. #5666
- Buffer logger assert removed #6147
- Enhance VerifyGPUCode #6194
- HoistIfThenElse added #6066
- Hybrid Script Support for TIR #6227
- Migrate Low-level Passes to Pass Manager #5198
- HoistIfThenElse added #6066
- Hybrid Script Support for TIR #6227
- Block scope hoisting added #6238
- reverse-mode autodiff without any optimization #5121
- Tensor Expression Debug Display (TEDD) #4651
- Optimize and eliminate the Jacobian tensor for te.autodiff #6078
- TVMC - A command line driver for TVM (Part 1) #6112
- TVMC - Linting error on onnx command line driver frontend #6536
- TVMC - Command line driver 'compile' (part 2/4) #6302
- TVMC - Introduce 'tune' subcommand (part 3/4) #6537
- TVMC - Introduce 'run' subcommand (part 4/4) #6578
- TVMC - Getting started tutorial for TVMC #6597
- Cleanup legacy verilog code (#4576)
- uTVM support for ARM STM32F746XX boards (#4274)
- Add --runtime=c, remove `micro_dev` target, enable LLVM backend #6145
- Linear system and equation solver (#5171)
- Inequalities solver #5618
- Improve IntervalSet's floormod (#5367)
- Remove legacy const pattern functions (#5387)
- Handle likely in IRMutatorWithAnalyzer #5665
- ExtendedEuclidean merge impl to int_operator #5625
- Rewrite simplify fix for Vectorized Cooperative Fetching #5924
- Adding ROCM schedules for TOPI (#4507)
- NHWC conv2d schedule templates for ARM (#3859)
- Use VM compile to extract autotvm tasks #4328
- Download fallback schedule file if it does not exist #4671
- Ignore error when removing tmpdir #4781
- Fix a bug in generating the search space #4779
- Minor bug fixes in AutoTVM for QNN graphs #4797
- Fix autotvm customized template #5034
- Add opt out operator for `has_multiple_inputs` for graph tuner #5000
- Customize SI prefix in logging (#5411)
- Update XGBoost verbosity option #5649
- Support range in index based tuners #4870
- Enable random fill and CPU cache flush for AutoTVM and Ansor (#6391)
- Auto-scheduler tutorial for GPU and necessary refactor/fix (#6512)
- [BYOC] Bind constant tuples in graph partitioner (#5476)
- [BYOC] Add support for composite functions in BYOC (#5261)
- [BYOC] Register pattern tables from external codegens (#5262)
- [BYOC] Enhance partitioning and external codegen (#5310)
- [BYOC] Refine AnnotateTarget and MergeCompilerRegion Passes (#5277)
- [BYOC] Use Non-Recursive Visitor/Mutator (#5410)
- [BYOC] Refine DNNL Codegen (#5288)
- [BYOC] Add example of Composite + Annotate for DNNL fused op (#5272)
- [BYOC] Prevent duplicate outputs in subgraph Tuple (#5320)
- [BYOC] Introduce further operator support (#6355)
- [BYOC] Support input nodes with multiple entries (#6368)
- [BYOC] Add maximum support for float32 (#6506)
- Intrinsic dispatching with OCML instead of LLVM for ROCm (#4499)
- Make target codegen take IRModule and PrimFunc. #5107
- Enhance CUDA codegen for SelectNode #4983
- Vectorization for intrinsics #5101
- [LLVM] Do not use `x86_vcvtph2ps_256` intrinsic with LLVM 11+ (#5267)
- [LLVM] Use llvm::ElementCount with LLVM 11+ when creating vectors (#5265)
- [LLVM] Use llvm::FunctionCallee in IRBuilder::CreateCall with LLVM 11+ (#5338)
- [LLVM] Include Support/Host.h for declaration of getDefaultTargetTriple (#5268)
- [LLVM] Replace calls to Type::getVectorNumElements (#5398)
- [LLVM] Use ArrayRef in calls to CreateShuffleVector (#5399)
- [LLVM] Use llvm::Align with LLVM 11+ to avoid warnings (#5264)
- [CodeGen] Cleanup generated code (#5424)
- Rename `target_id` => `target_kind` #6199
- 64-bit RPi4b target #6211
- Creating Target from JSON-like Configuration #6218
- Add python binding to new JSON target construction #6315
- Use target class in all codegens #6347
- Initial support for Hexagon codegen #6261
- Add --runtime=c, remove `micro_dev` target, enable LLVM backend #6145
- Add tvm::support::hexdump() debug utility #6154
- Adding AMD codegen unit tests (#4509)
- Support cuda tensorcore subbyte int data type in auto tensorcore #4546
- Handle empty LLVMModule in GetFunction #5146
- Support int4/int8 conv2d tensor core with HWNC layout #6121
- Add shape function for `zero`, `zeros_like`, `ones`, `ones_like` (#4448), `tile` (#4441)
- Support symbolic newshape for Reshape #5429
- Support symbolic TopK, Ones, Zeros and Full #5459
- Add `shape_of` instruction #5855
- symbolic `max_output_size` #5844
- Dynamic TopK Op #6008
- Dynamic `broadcast_to`, `zeros`, `ones` #6007
- Add dynamic reshape grad #6080
- Keep fixed dim when unifying dynamic shape #5795
- OneHot operation #6209
- Add Dynamic Resize Op #6198
- Dynamic full operator #6260
- Dynamic upsampling relay op #6273
- Dynamic Tile Op #5983
- TFLite parser support for `transpose_conv` (#4440), `unpack` (#4447)
- LLDB pretty printers for relay (#4453)
- ONNX to Relay converter op support: expand op (#4483)
- ONNX `auto_pad` in conv and convtranspose (#4563)
- TF to Relay converter op support (#4504) (#4551) (#4484)
- Remove unnecessary cast of constants in ONNX converter (#4573)
- Add support for tf.Keras networks in Relay Keras frontend #4630
- Add conv3d #4604
- Fix incorrect calculations in tf SLICE #4518
- Dynamically calculate `input_stats` of any `fake_quant` range #4789
- LSTM Support #4825
- Add `MIRROR_PAD` operator #4822
- use qnn helper function in softmax #4840
- Add Resize op converter #4838
- Add support for `TFLite_Detection_PostProcess` #4543
- Fix tests for tflite unary elemwise operations #4913
- GaussianDropout/Noise parsing support #4928
- Add parser support for 'square' operator #4915
- `make_loss` operator support #4930
- Add parser support for `l2_normalization` #4966
- ReadVariableOp operator support #4952
- Check graph inputs match expected #4992
- support multiply outputs #4980
- TFLite: Using real image for QNN testing. #4816
- TFLite: `FLOOR_MOD` & `FLOOR_DIV` support #4971
- PyTorch: Upsampling op support and enable registering a user defined op conversion map #4961
- PyTorch: fix unordered dictionary problem for python version under 3.6 #4982
- Operator support NonZero #5073
- Upsampling op support and enable registering a user defined op conversion map #4961
- Check graph inputs match expected #4992
- Add support for quantized models via QNN #4977
- Add initial control flow support #4964
- Remove FP32 piggy back and use QNN add/mul/concatenate #5061
- Add missing upcast to uint8 `avg_pool` conversion #5089
- Add initial 3D op support and test on Resnet 3D #5075
- Fix conv2d conversion for group conv (group > 1 but != in channels) #5132
- Add support for `max_pool1d` #5142
- Add support for split #5174
- `FLOOR_MOD` & `FLOOR_DIV` support #4971
- Activation functions support #4978
- Round op parsing support added #5022
- DepthToSpace and SpaceToDepth support #5041
- `TOP_K` op parser support #5051
- ReadVariableOp operator support #4952
- Support multiply outputs #4980
- `reduce_any` op parsing support #4926
- TensorFlow Parser Control Flow Enhancement #5020
- TensorFlow Frontend support with shared params #5042
- Support for AddV2 in Relay Tensorflow frontend converter. #5046
- conv3d frontend operator support #5080
- `max_pool3d` and Averagepool3d operator support #5085
- Support for Atan/Atan2 in Relay Tensorflow frontend converter. #5104
- Use leaky by default for LeakyReLU #5192
- Conv3D ONNX support and `conv3D_ncdhw` x86 schedules #4949
- Activations for pytorch #5194
- Dropouts And InstanceNorm support added #5203
- [Frontend] Asymmetric padding of convolution support (#4803)
- [ONNX]Pool3d & upsample3d op support (#5135)
- Add TopK to ONNX Frontend (#5441)
- Add RoiAlign to Onnx frontend (#5454)
- [PYTORCH]AvgPool3d, MaxPool3d and Squeeze op support (#5220)
- [PYTORCH]celu, gelu, selu activations (#5263)
- [Pytorch]layernorm bug fix and testcase updated (#5257)
- [PYTORCH]LayerNorm support added (#5249)
- [PYTORCH]GroupNorm op support added (#5358)
- [PYTORCH]Logical & Bitwise operator support (#5341)
- [PYTORCH]Tensor creation ops support (#5347)
- [PYTORCH]cosh,sinh,log2,log10,log1p op support (#5395)
- [PYTORCH]Rsub, Embedded, OneHot ops support (#5434)
- [PYTORCH]Abs, Arange, Softplus ops (#5295)
- [PYTORCH]isNan, isinf, isfinite, ceil, clamp, round ops (#5316)
- [PYTORCH]Activations for pytorch (#5194)
- [PYTORCH]Repeat, Reciprocal & Reshape Op support (#5280)
- [PYTORCH]`Reduce_ops` support added (#5308)
- [PYTORCH]Take, Topk op support (#5332)
- [PYTORCH]Dropouts And InstanceNorm support added (#5203)
- [PYTORCH]Unary Ops frontend support. (#5378)
- [Torch] Support Python list, more realistic recurrent networks (#5306)
- [PYTORCH]where, addcdiv, addcmul op support (#5383)
- [Torch] Add support for split (#5174)
- [Torch] Fix up graph input handling (#5204)
- [TFLITE]Logical not op support (#5475)
- [TFLITE]Hard Swish & MobilnetV3 model testing (#5239)
- [TFLITE]Gather, StridedSlice op support added (#4788)
- [TFLITE] Match TFLite shape for SSD custom op (#5473)
- Factor out import of common tflite.Operator in tflite frontend. (#5355)
- [TFLite] support for FILL and `SPLIT_V` operators (#5330)
- [TFLite] `L2_POOL_2D` operator (#5452)
- [TFLite] Add config option to specify FlatBuffers location (#5425)
- [TFLITE]Logical not op support (#5475)
- [TENSORFLOW]reduce ops updated (#5180)
- [TENSORFLOW] Fix `gather_nd` indices (#5279)
- [TensorFlow]Improve TensorFlow Static Shape Tensor Array (#5243)
- [KERAS]Minimum & AlphaDropout op support (#5380)
- [KERAS]Embedding layer (#5444)
- [KERAS]`Max_pool3d` and Averagepool3d operator support (#5085)
- [CAFFE2]add Mul and ConvTranspose operator (#5302)
- [MXNET]DepthToSpace & SpaceToDepth Operator (#5408)
- [MXNET]broadcast and logical op support (#5461)
- [MXNET] Use leaky by default for LeakyReLU (#5192)
- [MXNET] support elemwise logic ops (#5361)
- [Frontend|MXNet] SwapAxis operator support (#5246)
- [RELAY] Move frontend utils (#5345)
- [Pytorch] Fix translation of transpose when axis argument is as a list (#5451)
- LpPool Support added #5696
- Skip ADD inside Gemm op when vector is zero #5697
- ReduceL1, ReduceL2, ReduceSumSquare, ReduceLogSum ops added #5721
- MaxRoiPool, Mod & Xor op support added #5729
- Skip multiply with 1.0f constant for GEMM import #5800
- StatefulPartitionedCall/PartitionedCall Ops support added #5617
- Don't add cast for batch norm when type isn't changing #5731
- Conv3d Transpose OP added #5775
- expand bug fix #5576
- Support `max_pool2d_with_indices` #5549
- Add prim::device op #5584
- ImplicitTensorToNum support added #5603
- Matmul fix for `batch_matmul` #5604
- ReflectionPad2d op #5624
- Padding op support #5638
- Minor bug fixes #5683
- `floor_divide` support for squeezenet #5702
- ReplicationPad support added #5708
- aten::norm support added #5776
- broadcast and logical op support #5461
- MaxPool3d and AvgPool3d Ops support added #5614
- Softmin, trunc op support added #5715
- conv3d and `conv3d_transpose` added #5814
- Model importer to be compatible with tflite 2.1.0 #5497
- Nit: Function names made consistent #5515
- Select op support for tflite frontend #5486
- `GATHER_ND` #5508
- Quantize & Dequantize op #5394
- Fully connected op conversion made in sync with TFLite #5510
- `ADD_N` operator #5474
- onnx, mxnet, pytorch mathops added #5561
- abs, round, reciprocal, sign, softsign, `hard_sigmoid` ops support #5587
- Gather nd bug fix for one dim support in tensorflow #5588
- Add parser support for shape and range #5329
- Darknet support batch size for yolo #5688
- Improve Control Flow and TensorArray #5699
- MXNet: Softmin, trunc op support added #5715
- MXNet: conv3d and `conv3d_transpose` added #5814
- MXNet: Add parser for `contrib.box_decode` #5967
- Onnx: ReduceL1, ReduceL2, ReduceSumSquare, ReduceLogSum ops added #5721
- Onnx: MaxRoiPool, Mod & Xor op support added #5729
- Onnx: Skip multiply with 1.0f constant for GEMM import #5800
- Onnx: Fix an issue with #5755 and add Batch norm unit tests. #5845
- TensorFlow: StatefulPartitionedCall/PartitionedCall Ops support added #5617
- TensorFlow: Don’t add cast for batch norm when type isn’t changing #5731
- TensorFlow: Conv3d Transpose OP added #5775
- Add parser support for shape and range #5329
- Darknet support batch size for yolo #5688
- Improve Control Flow and TensorArray #5699
- Improve TF Parser to keep output nodes for `saved_model` #5794
- Add parser support for `relu6`, `leaky_relu`, `relu_n1_to_1`, `log_softmax` #4805
- Fix TF Dynamic input shape #5825
- Support a few contrib ops in mxnet #5819
- Improve TF Parser to keep output nodes for `saved_model` #5794
- Add parser support for `relu6`, `leaky_relu`, `relu_n1_to_1`, `log_softmax` #4805
- Check all unsupported ops before raising an exception #5929
- Add Pytorch advanced indexing #6318
- Support `index_select` #6295
- Fix cast to long #6301
- Fix dtype handling for modules with integer parameters #6311
- pytorch frontend support conv1d #6203
- Add cast to double, fix flatten conversion #6357
- Fix aten::max and aten::min conversion #6372
- Match pytorch 1.6 googlenet pretrained model (#6201) #6212
- Add unbiased variance op and corresponding support in pytorch frontend #6232
- Implemented PADV2 Operator for TFLite and added support for constant values in PAD. #6167
- Implemented `ONE_HOT` Operator for TFLite. #6223
- Implemented `EXPAND_DIMS` Operator for TFLite. #6243
- Implemented `REVERSE_V2` Operator for TFLite. #6304
- Implemented `MATRIX_SET_DIAG` Operator for Relay/TOPI and TFLite Frontend. #6303
- RESHAPE with dynamic shape arg in TFLite frontend #6208
- Constant input attr added to fully connected operation in TFLite frontend #6228
- Gather operation with indices as tensor expr in TFLite frontend #6168
- Added support for tflite quantized maximum and minimum #6018
- Unary ops support added in frontend #6196
- Introduce caffe frontend for tvm #6206
- Keras softmax and prelu fix under NHWC #6278
- add support for MXNET numpy operators #6054
- Refine tensorflow frontend 1.x & 2.x compatibility #6240
- Reduceops support added to frontend #6252
- Update precision in the ONNX `strided_slice`, update precision of ToScalar #6272
- NHWC import support. #4899
- Refine tensorflow frontend 1.x & 2.x compatibility #6240
- Fix node indices attribute error for tensorflow 2.3 #6288
- Support NMSv4 #6085
- Support for PyTorch Non-Maximum Suppression #6314
- ReplicationPad support added #5708
- MXNet pre-quantized BERT #6039
- Keep parameter names from PyTorch #5887
- Refine LSTMBlockCell to support dynamic rnn #5963
- Add function attributes to IR hash (#4479)
- Relay passes lookup overhead optimization (#4594)
- Add `half_pixel` option to Resize op #4610
- Skip example json runtime test when config is not set #4614
- Test `tensor_array` in vm #4608
- Improve `memory_allocation` pass to support multiple i/o dynamic kernels #4595
- Add unit test for `tensor_array_split` #4619
- Add parser support for unary elemwise ops #4634
- Add parser support for SLICE #4502
- Added pool autopadding and simplified converters. #4672
- Fix meaning of `conv2d_transpose` `output_padding` parameter #4318
- Use packed func macro for external codegen #4710
- Fix `_parse_param` bug #4711
- Add constant input support for elemwise ops #4666
- Add parser support for squared difference #4652
- Add type check to dense #4724
- Invoke tvm::build from relay `compile_engine` and interpreter #4723
- Broadcast condition, x, and y for Where op #4774
- Add parser support for relational ops #4695
- Remove duplicated BindParamByName function in VM compiler #4793
- Use SimplifyInference for L2 Normalization. #4795
- Expose vm OptimizeModule to Python #4800
- Add parser support for logical operators #4642
- Conv2D padding representation #4787
- Add support for quantized LOGISTIC #4696
- Fix VM compiler for while loop with free vars #4889
- Fix bug in re-processing call node in MergeComposite pass #4879
- Expose FunctionGetAttr to Python #4905
- Add a PyTorch to Relay Parser #4497
- Support data types for CSourceModuleCodegen args and output #4934
- Clean up and refactor PyTorch frontend #4944
- Relay pass to use fast exp/tanh #4873
- BatchNorm support with run-time mean and variance calculation #4990
- Reduce plevel of conv2d winograd implementation on cuda #4987
- Add operation tan to TVM #4938
- Outline and inline lifted functions for external codegen #4996
- Remove primitive attribute from composite function #5014
- Refactor Relay Python to use new FFI #5077
- Fix relay node registration after refactor #5083
- `Codegen_c.h` should include relay.function #5093
- Move expr.Function to function.py #5087
- Propagate constant to subgraphs #5094
- Adjust strategy plevel to achieve expected performance by default #5118
- Added a AnnotatedRegion utility class #5030
- Support TupleGetItem in body of pattern #5106
- Partition graph codestyle fixes #5202
- Re-wrote the Graph Partitioner to support multiple outputs #5143
- Fixes to MergeCompilerRegions #5195
- Refactor build module to take IRModule #4988
- Separate analysis and transform passes #5035
- Relay Node::make to constructor #5128
- relay::StructuralHash to tvm::StructuralHash #5166
- Conditions updated to cover better user scenarios #5043
- Replace UseDefaultCompiler with GetAttr #5088
- Return empty CSourceModule when no `lowered_funcs` exists in Relay mod #4847
- Clean up for memory pass to enable heterogeneous execution support. (#5324)
- Remove re-exports of tvm.transform (#5337)
- [Refactor] Add memoized expr translator for use by backend codegen (#5325)
- Legalize - Use Non-recursive Rewriter. (#5296)
- Add additional check before re-using the cached match #5552
- Remove kCompiler attr from external functions #5615
- Pattern Language MergeComposite #5656
- Support Tuple Output in C/DNNL Codegen #5701
- Infer types in MergeComposite #5766
- Convert PatternGrouper to do pre-order, non-recursive analysis #5653
- Remove constants from partitioned functions #5663
- Add a check for null function attributes #5674
- Add ConstantPattern #5689
- Conditionally Embedding Constants in Partitioned Functions #5693
- Simplify Pattern API Implementations #5703
- Add ShapePattern and DataTypePattern #5760
- Remove unnecessary print #5642
- Improve Shape Func handling for Tuple inputs #5467
- Relay updated with String #5578
- Fix the creation of tuple of tuples in PartitionGraph #5616
- Preserve type information in Merge Composite #5640
- Move `compiler_begin`/`end_op` to local static objects #5622
- Fix `dataflow_pattern`.rewrite() hang if Match in IR #5680
- Fix segfault in pretty print when ObjectRef is null #5681
- Move `fallback_device` to config #5690
- Replace `build_config` with PassContext #5698
- Clear compile engine after task extraction #5724
- Add `storage_order` ignore in pooling layer. #5781
- Tweak cublas/cudnn priority level #5820
- Skip Unknown Function Symbols #5888
- Allow every runtime module to handle constants #5885
- handle Tuple/TupleGetItem in first order gradient #5946
- Add resnet-3d & Update network definitions for NHWC layout #5945
- Use TargetNode::attrs for Target serialization #5993
- each option of target str should only contain one ‘=’ #5988
- Rename `target_id` => `target_kind` #6199
- 64-bit RPi4b target #6211
- Add resnet-3d & Update network definitions for NHWC layout #5945
- Small bug fix for Conv1D imports. #5995
- Move `invoke_tvm_op` and `shape_func` to vm dialect #5958
- GRU Layer Support #6020
- Add pass for getting calibration data from a relay module #5997
- Merge two consecutive reshape ops #6052
- Add operation `scatter_add` to relay, based on scatter implementation. #6030
- i64 indices #5235
- Port `eliminate_common_subexpr` to non-recursive form #6134
- Fix interpreter for dynamic shape input of `ndarray_size` #6086
- Allow to config allocator type and refactor vm code structure #6105
- Handle `ndarray_size` in FoldConstant #6156
- when converting constant nodes with types of int64 or float64 #6159
- Add ReshapeTensor instruction in the VM to replace the reshape op #6089
- Support combine multiple dense op just into dense #6062
- Add unbiased variance op and corresponding support in pytorch frontend #6232
- Specify additional layouts in convert layout pass #5422
- Safe check added for Merge Composite Call Node #5562
- Non recursive partitioning #5493
- Support combine multiple dense op just into dense #6062
- Make the max number of fused ops configurable #6327
- Implementation of the dynamic pad operator #6284
- change device annotation from post DFS to recursive #6124
- Make check stricter: disallow inserting function with free vars into module #6313
- Make check stricter by using Feature. Fixed multiple bugs #6326
- Resize support for NCHW-convertible layouts #6293
- Make AutoDiff thread through global function #6336
- Create Interpreter for each constant subgraph #6195
- Add Dynamic reshape to a dynamic namespace and add DynamicToStatic Pass #5826
- Expose relay BindParamsByName to Python #4751
- Implement pass manager tracing API #4782
- Move Ops in relay.op.contrib #4942
- Conditions updated to cover better user scenarios #4951
- [External codegen] Add test cases for fused ops with manual annotation (#4741)
- Multiple output support, reshape, split ops added #6296
- Allow empty tensor for `reshape`, `tile` and `strided_slice` #4618
- Fix meaning of `conv2d_transpose` `output_padding` parameter #4708
- Remove cpp upsampling and resize op #4769
- upsample operator 'NCHWinic' format support. #4791
- Injective schedule improvement #4786
- Enable vectorization on fp16 type #4867
- Support for Int8 schedules - CUDA/x86 #5031
- New PR to re-add tan to TVM #5025
- Register topi schedule for Relay `fast_exp` and `fast_tanh` #5131
- Move Dilation2d from nn to image namespace #5110
- Use Thrust sort for argsort and topk #5097
- Conv2d and Dense ops support on Tensor Core #5099
- Setting workload correctly for Depthwise Spatial conv ARM. #5182
- Adding a few missing math intrin #5011
- Missing vectorize for depthwise conv2d. #5196
- [TOPI] Using x86 schedules for ARM conv2d (#5334)
- [TOPI-ARM] Do not alter layout if layout is NHWC (#5350)
- [TOPI] Setting workload correctly for Depthwise Spatial conv ARM. (#5182)
- [OP] Add `fast_erf` implementation (#5241)
- [Topi] Tensorcore support for Conv3D (#5284)
- [intrin] a few more math functions (#5468)
- [Intrinsic] Add log1p, ldexp, atan2, hypot, nextafter, copysign (#5312)
- [topi] Add operation relay.nn.dilate() which calls topi.nn.dilate() (#5331)
- [Topi x86] Missing vectorize for depthwise conv2d. (#5196)
- [TOPI x86] Adding `unroll_kw` config option for depthwise conv2d. (#5197)
- [Topi] Breakdown topi.cc into smaller files (#5253)
- ReduceLogSumExp Operator support #5453
- Math ops added #5502
- Enable blocking format in x86 conv2d and fold scale axis #5357
- Add operation gather to relay. #5716
- Add `storage_order` ignore in pooling layer. #5781
- Fix bifrost spatial packing conv2d auto tune #5684
- Fix reshape usage in ARM schedule #5732
- Block sparse dense on cuda #5746
- Improve CUDA softmax scheduling #5600
- block sparse dense on cuda #5746
- pass-by-value -> pass-by-const-reference #5783
- Using MKL blas for quantized dense #6115
- topi -> tvm/topi #6186
- Use auto-tuner to improve `conv2d_gemm` performance #6117
- Improve CUDA `conv2d_transpose_nchw` #4762
- Add CUDA conv2d for NHWC layout #4737
- `conv3d_ndhwc` schedule #4775
- Fast exponent #4790
- Add Scatter to Topi/Relay/ONNX via hybrid script #5619
- Split MKL from BLAS. #6182
- Change the meaning of `conv3d_transpose` `output_padding` to match `conv{1,2}d_transpose` #6065
- Gather op support added #6013
- Cythonize NDArray.copyto (#4549)
- Unified Object System runtime refactor (#4578, #4581, #4603)
- VM profiler: sort VM stats by time (#4601)
- Update RPC runtime to allow remote module as arg (#4462)
- Refactorying system lib and dso lib into library module (#4481)
- Improve TSIM virtual memory mapping (#4545)
- make adt tag signed #4605
- Improve TVMBackendPackedCFunc to allow return val #4637
- EdgeTPU runtime for Coral Boards #4698
- Fix memory leak when using openMP #4811
- Fix memory leakage of TVMByteArray #4856
- Fix `TVM_DLL_EXPORT_TYPED_FUNC` to work on Windows #4955
- Fix memory leak when using openMP #4811
- Export GraphRuntime in `tvm_runtime.dll` #5002
- MISRA-C compliant TVM runtime #3934
- Update the `type_keys` to reflect the code-org #5074
- Fix AttrEqual for Array and StrMap, double #5054
- Export GraphRuntime in `tvm_runtime.dll` #5002
- Fix unused-value warning #5140
- crt error handling #5147
- Bundle deployment with static linking #5158
- Implemented kDLCPUPinned (cudaMallocHost) #4985
- Explicitly cast min/max operands #5090
- `ref_counter` -> `ref_counter_` #5184
- Expose runtime::String to Python (#5212)
- [FFI] Refactor runtime.String to subclass str (#5426)
- [RUNTIME] Auto conversion from str to runtime::String in PackedFUnc (#5251)
- [RUNTIME] Improved Packed FFI for optional. (#5478)
- [Hexagon] Add `hexagon_posix.cc` to TVM/RT sources in the right place (#5346)
- [FFI] Refactor runtime.String to subclass str (#5426)
- Fix workspace #5503
- Store nullptr PackedFunc as nullptr for better error propagation #5540
- Improve PackedFunc robustness #5517
- Seg fault in WorkspacePool's destructor (#5632) #5636
- Resolve constexpr issue in debug mode. #5651
- Add `compile_shared` option to linux compile utility fn #5751
- Call sync in CopyFromRemote and CopyToRemote #5512
- Fix the multihop cpu case #5522
- Improve RPCServer AsyncIO support. #5544
- Modularize the RPC infra #5484
- Add `compile_shared` option to linux compile utility fn #5751
- Overload string operators #5806
- Only initialize required module #5926
- If a param is not in the input, we should still consume its data #5990
- init TVMPackedFunc’s name #6044
- Enable auto conversion `String->DLDataType` #6214
- Support random fill #5913
- Use new to avoid exit-time de-allocation order #6292
- Add `parallel_for` support to run a loop in parallel #6275
- Solve ARM BIG.LITTLE heterogeneous multicores #4747
- [RUNTIME] Quick fix PackedFunc String passing (#5266)
- Introduce runtime::String::CanConvertFrom #5718
- Restore the StrMap behavior in JSON/SHash/SEqual #5719
- Support overriding RPCWatchdog termination behavior on Android and other platforms #6216
- Set `NDArray::Container.shape_` in NDArray::FromDLPack (#5301)
- Enable x86 cpu cache flush #5914
- Conv2D type checking for kernel per-channel scales. #4732
- Add missing nullptr check #4773
- Doc fix on convolution and dequantize #4799
- Conv2D with dilation support. #4796
- Making `scale`/`zero_points` as expr instead of attrs. #4611
- Make calibration faster and more memory usage friendly #4589
- Doc fix on convolution and dequantize #4799
- Conv2D with dilation support. #4796
- Optimize lowering for requantize and FixedPointMultiply. #4798
- More doc fix on quantize and convolution #4874
- Add support for per channel weight scale in dense op #4880
- Add support for quantized models via QNN #4977 #5013
- Support 4D padding. #5036
- [Requantize] Cleanup and Optimize Lowering (#5286)
- [Topi, ARM] Disable Winograd for quantized tensors. (#5363)
- Adding support for TFLite QnnSubtract operator. (#5230)
- Remove developer facing api from frontend exports. (#5375)
- Add Quantize/Dequantize Partitioning #5940
- Add support for quantized models via QNN #5016
- Quantize operation expanded to take const argument #6127
- FP32 and Quantized Object Detection Model #5479
- Support CallNode inputs in qnn.concatenate #5360
- QNN support for TFLite 2.1.0 quantized models #5848
- Tighten split's extent #4931
- Set split node's range to minimum of ext and split factor or split np… #5044
- Support mixing normal and cross-thread reduction (#5193)
- Inline -> `te/schedule/operation_inline.h` (#5386)
- Create loops according to storage scope and thread hierarchies (#5190)
- Fix import in dump pass ir (#5327)
- Scalar support for te.extern #6079
- IR readability enhancement (#4501)
- Introduce tir::PrimFunc #5070
- Introduce PrimFuncPass. #5139
- [TIR] Enhance Substitute, python bindings for Substitute/PostOrderVisit (#5400)
- [TIR] Remove ProducerConsumer and `AllocateNode::new_expr` (#5333)
- [TRANSFORM] Enable CopyOnWrite for TIR passes. (#5309)
- [REFACTOR] Migrate LowerTVMBuiltin, InferFragment, LowerThreadAllreduce, ThreadSync to Pass Manager (#5213)
- [REFACTOR] Remove te::Tensor dependencies from TIR passes. (#5372)
- [TIR] Refactor MakePackedAPI to target dependent stage. (#5326)
- [REFACTOR] tvm.hybrid -> te.hybrid (#5223)
- [REFACTOR] Migrate most of low-level build to use the Pass Manager. (#5225)
- [REFACTOR] Migrate low-level passes in tvm.lower to the Pass Manager (#5364)
- [TIR] Migrate VTA TIR passes to the new pass manager. (#5397)
- [REFACTOR] Migrate all low-level passes to the Pass Manager. (#5233)
- [REFACTOR] Introduce ExprDeepEqual, Remove IRDeepCompare (#5206)
- [REFACTOR] RewriteForTensorCore -> te/schedule (#5379)
- [REFACTOR] Remove `ir_pass` in favor of analysis/transform. (#5415)
- text format printer considering future parsing use #5483
- Remove buffer params from pass config. #5652
- std::string -> String Migration in TIR nodes #5596
- Remove `CallNode.call_type` in favor of attribute. #5937
- Remove legacy HoistIfThenElse #5944
- Improve Let/LetStmt support. #5949
- Refine side effect analysis. #5954
- `Provide`->`ProducerStore`, `Realize`->`ProducerRealize`. #5750
- Migrate the tvm/tir/expr.h to constructor #5773
- Migrate tir/stmt.h to use constructor. #5778
- Cleanup unused classes #5789
- Add tir prefix to type keys #5802
- Enhance VerifyGPUCode #6194
- Enforce buffer pointer var type to be consistent with dtype. #6317
- Create a StringImm reference type #4806
- Add init member to ReduceNode #6138
- Add dump and print for debugging (NFC) #5207
- Streamline Function Attr interface. #5045
- `alpha_equal` to `structural_equal` #5161
- Remove AttrsEqual and AttrsHash related code #5169
- [NODE] General serialization of leaf objects into bytes. (#5299)
- [POC] Initial stab at `std::string->String` upgrade (#5438)
- [TIR] Make `lower_warp_memory` support `extent(threadIdx.x) < warp_size` (#5307)
- [PASS] dtype rewrite for indexing variables (#5092)
- [PYTHON] Enhance `with_attr` API, cleanup MakeAPILegacy in testcases (#5335)
- [PYTHON] Make IntImm more like an integer (#5232)
- [IR] Move to runtime::String (#5276)
- [IR] kExternalSymbol -> kGlobalSymbol (#5211)
- [IR] Remove PrimExpr from String (#5311)
- IRModule is updated with String #5523
- IR is updated with String #5547
- Streamline ir/op Registry #5609
- Migrate IRModule ObjectRef to not-null #5654
- Migrate BuildConfig to PassContext. #5668
- relay.op.Op -> tvm.ir.Op #5705
- Separate ArgTypeCode from DLDataTypeCode #5730
- Remove legacy `compute_expr.h` #5738
- Call::Halide => ProducerLoad, DSL/TIR decouple. #5743
- `Provide`->`ProducerStore`, `Realize`->`ProducerRealize`. #5750
- Migrate the tvm/tir/expr.h to constructor #5773
- Migrate tir/stmt.h to use constructor. #5778
- Migrate all Object construction to constructor. #5784
- Cleanup unused classes #5789
- Finish `std::string->String` updates #5793
- Add tir prefix to type keys #5802
- Change Call.name to Call.op(RelayExpr) #5863
- Range/IntSet API style consistency. #5953
- Separate ArgTypeCode from DLDataTypeCode #5730
- Migrate all Object construction to constructor. #5784
- Finish `std::string->String` updates #5793
- Unify StrMapNode and MapNode #5687
- Int8 GEMM performance enhancement using Cublas (#4550)
- Speedup TSIM with multi-threading (#4491)
- Support cudnn softmax (#5214)
- Add cuDNN grouped convolution support (#5319)
- Winograd support for Conv3D (#5186)
- Improve `get_valid_count` and nms performance for CUDA (#5339)
- Optimizations of `global_ave_pool` for NHWC layout (#5450)
- Optimization of Conv2d Winograd algorithm on Tensor #5485
- Some performance improvement to VM #5901
- Optimize x86 `conv3d_ndhwc` using data packing approach. #4866
- Improve NHWC depthwise convolution for AArch64 #6095
- Improve quantized convolution performance for armv8 architectures #5754
- Adding benchmark log format doc (#4366)
- Add Ninja build system to installation docs (#4554)
- Doc/comment fixes (#4452, #4463, #4469, #4493, #4397, #4580, #4585, #4591)
- Fix doc after moving to unified IR #4835
- Introduction to module serialization #4564
- ConvertLayout - Call RemoveUnusedFunctions. #4834
- Fix bugs that override `n_trials` #4842
- Update the vm doc #4868
- Refine the example description of `max/min/sum/tag_scope` #4974
- Fix vta tutorial #4809
- Introduce how to add hardware backend to FAQ #4898
- Update API docs to reflect the status after the refactor. #4907
- Fix sphinx warnings #4917
- Fix Sphinx Warnings (RST indent, cross-ref, and image scale) #4920
- Fix Sphinx Warning: the target found for cross-reference #4925
- Sphinx -- Introduce alias detection. #4954
- Fix Warnings from #4942 #4959
- Fix sphinx precheck #4967
- Move `git_howto` to rst, add Stage documents to te #5055
- Add doc for Relay op strategy #5078
- Update relay docs #5112
- Include a tarball of docs, add a security faq #5119
- Cleanup docs before rebuild #5127
- Minimize necessary doc change #5129
- Various sphinx related fix. #5168
- Point docs to the ASF site. #5178
- Use https link #5183
- Reduce artifacts generated by sphinx gallery #5208
- Refine the example description of `max/min/sum/tag_scope` #4974
- Description updated for pooling attributes #5091
- [DOCS] Migrate some markdowns to rst, fix sphinx3 warnings (#5416)
- [DOCS] Misc docs improvements (#5222)
- [DOCS] Bring relay docs to the top-level flat view (#5343)
- [DOCS] Reduce artifacts generated by sphinx gallery (#5208)
- [DOCS] Use https link (#5183)
- [DOCSTRING]missing function parameters updated (#5228)
- [DOCS] Migrate HLS documents from md to rst (#5419)
- [Tutorial, QNN] Add tutorial for loading quantized PyTorch model (#5321)
- [Docs] VTA install doc migration from md to rst (#5442)
- [Docs] compiler version in docs (#5281)
- Remove legacy `compute_expr.h` #5738
- `TVM_REGISTER_API` -> `TVM_REGISTER_GLOBAL` #4768
- Add bfloat16 typeflag support (#4525)
- MSVC / Windows fixes (#4455, #4569)
- Fix Makefile for `howto_deploy` (#4457)
- Fix GCC 4.8 compact (#4461)
- Fix search path to build `libtvm_topi.so` (#4467)
- Fix for `conv2d_transpose` CUDA compilation (#4472)
- Fix for LLVM 10.0 codegen (#4480, #4515)
- Fix alter op layout when calling global var (#4454)
- Fix `float2half_rn` support for cuda compute capabilities < 53 (#4489)
- Fix compile errors for OpenCL backends (#4492)
- Fix serialization precision loss (#4503)
- Fix hybrid script to support array of tensors (#4494)
- Fix annotation for multiply op (#4458)
- Fix Dockerfile for linter CI (#4506)
- Fix TF resize for dynamic size models (#4510)
- Fix `bias_add` gradient (#4516)
- Fix tanH unit test function call (#4517)
- Fix extra reshape parameter for ONNX (#4524)
- Fix crash caused by empty TOPI config (#4520)
- Fix ONNX shape op type to use int64 (#4528)
- Fix crash in TSIM virtual memory driver (#4527)
- Replace deprecated python library in setup script (#4533)
- Fix NMS `max_output_size` loop (#4541)
- Fix style in IR mutator and IR visitor (#4561)
- Fix compiler warning (#4559)
- Fix to get end to end inference on Chisel VTA (#4574)
- Fix LLVM build by adding missing intrinsics headers (#4575)
- Fix context creation in quantization (#4582)
- Fix NDArray SaveDLTensor signature (#4586)
- Fix dense pack schedule for x86 (#4539)
- Fix for broadcast tensor of scalar type (#4577)
- Datatype refactor (#4513, #4560)
- Add const qualifiers for NDArray container (#4590)
- Fix TF <= 1.12 compatibility (#4593)
- Fix for graph debug runtime (#4598)
- Disable copy constructor for external codegen (#4597)
- Make ADT tag signed (#4605)
- Added declare of aluBits for TensorAlu #4624
- Get around limitation of g++-4.8 #4626
- Bugfix StmtMutator IfThenElse #4609
- Remove unnecessary rdynamic #4613
- Resolve constexpr related link error in debug mode #4641
- Asymmetric padding #4511
- Reduce data size of asymmetric padding testcase #4658
- Fix Base64OutStream portability issue #4668
- Fix `topi.nn.global_pool` layout="NHWC" #4656
- Also package core.rly #4679
- fskip of EliminateCommonSubexpr cannot always return false #4620
- Fix Python syntax error in `start_rpc_server_to_tracker.py` #4682
- os.path --> osp to match the import #4681
- GitHub actions/checkout@v1 --> v2 #4680
- Fix Python syntax error AGAIN in `start_rpc_server_to_tracker.py` #4685
- Use ==/!= to compare str, bytes, and int literals #4686
- Rename `start_rpc_server_to_tracker.py` to `start_rpc_server_to_tracker.sh` #4689
- GitHub Action lint Python code for syntax errors #4688
- Generate blob use LLVM directly #4657
- Reduce input size to fix oom #4653
- Fix RemoveUnusedFunctions pass #4700
- Link the math library by default #4713
- Update mainline version to 0.7.dev0 #4720
- Add SizeVar representing non-neg valued variable in a tensor shape #4684
- Fix the compile problem of `cpp_rpc` #4725
- JSON upgrader to upgrade serialized json. #4730
- Fallback schedule for Int8 depthwise. #4733
- Fix dense x86 schedule #4728
- Fix demo dockerfile build failed #4744
- Improve CUDA vectorizer #4736
- Add .asf.yaml for github info #4761
- Fix padding in pooling op #4738
- Remove `run_infer_type` duplicates #4766
- pooling.cc improvements #4767
- Export `builtin_fp16` on Windows #4731
- Fix Tensorflow conv3d pad bug, add non-cubic data and kernel tests #4772
- Bump prebuilt-image version in demo dockerfile #4770
- Update `tune_simple_template.py` #4778
- Explicitly link to cublasLt if it exists #4776
- Fix hasattr by extracting Python error type from Windows error message #4780
- Replace os.path.exists with try...except...else #4784
- Make sure to visit the arguments of inlined functions #4783
- Parse additional exception strings #4785
- Fix #4670: add bias for fc layer #4801
- Change color channel from BGR to RGB for darknet preprocessing #4794
- Fix -Wextra #4804
- Fix vta tutorial #4809
- Minor bug fixes in AutoTVM for QNN graphs #4797
- Fixed subprocess creation under windows #4820
- Improve tol to resolve flaky case #4836
- Fixed process termination routine in windows #4844
- `test_cuddn` flaky #4846
- Mxnet parser for Qnn dialect #4714
- Enhance `cc.cross_compiler` #4817
- Fixed crash caused by reversing bitwise operations #4852
- Reverse some changes made for `intel_graphics/conv2d.py` in PR #4849 #4853
- const auto p -> const auto& p #4861
- Fix onnx import bugs #4750
- Explicit llvm::StringRef to std::string conversion #4859
- Update the runtime PackedFunc for module #4871
- Improve antlr import error message #4888
- Fix `alpha_equal` bug for attribute check #4897
- Fix issues in cuda codegen #4876
- Fixed: Bitwise ops on floats causing wrong code generation and crashes. #4892
- Fix `tvm.target.generic_func` runtime detection #4910
- `topi/tests/python/test_topi_sort.py::test_argsort` #4891
- Use opencv resize method for preprocessing of image in darknet #4883
- Fix build breaks with StringRef changes #4923
- Remove unnecessary spliting in the cached chunk #4935
- Fixing an Infinite Loop case in UnmatchedChecker. #4881
- Remove SGX toolchain installation from CI Dockerfile #4948
- Fix tedd tutorial after strategy change #4947
- Allow customize MKLDNN library location #4814
- Added CopyFromBytes and CopyToBytes convenience methods to NDArray. Fixed typos. #4970
- Fix gcn tutorial failure #4994
- Fix stride default value None in torch.nn.functional.avg_pool #4984
- Fix ROCm strategy for winograd conv selection #5001
- Fix `get_valid_count` flaky test for cuda #4901
- Change Scala Linter scalafmt => scalastyle #4998
- Kill from tvm import te #5007
- Chisel fixes and de10nano support #4986
- Fix gpu not found when running TVM docker #4975
- Fixes for pylint==2.4.4 #4849
- Fix unordered dictionary problem for python version under 3.6 #4982
- Fix gcn tutorial failure #4994
- Fix stride default value None in `torch.nn.functional.avg_pool` #4984
- Fix ROCm strategy for winograd conv selection #5001
- Early checking added and new test cases added for schedule fuse #5010
- Fixed div by zero core dump. Fixed rounding intrinsics on int crash #5026
- Test case modified for int type #5012
- Bug Fix for ARM CPUs. Lower strict assumption. #5063
- Triage the testcases to fit the new namespaces #5071
- Add colors to `compute_at` edges and thread/block indices. #5111
- Temporary fix to the stack overflow issue in autotvm task extraction #5019
- Fix compilation of If-Elses #5040
- Fix CompilerAttrs #5109
- Fix the existing test cases before refactoring. #5122
- Fixed bug where shifting by out-of-bounds value results in no compute code being emitted. #5115
- Fix for issue #4831. The `data_min_idx` and `data_max_idx` were flipped. #5136
- Duplicate likely nodes added when loop axis split unevenly #5084
- Fix incorrect name of calibration mode #5150
- Remove contrib spatial pack schedule of depthwise convolution #5148
- Fix annotate pass static variable #5023
- Fixed ConvTranspose2D parsing #5157
- Nullptr check #5176
- rocm: fix miopen convolutions #5179
- rocm: fix `dense_rocblas` in strategy, topi #5191
- Fix CRT static test bug (#5293)
- Fix perf regression of tir refactor (#5258)
- Bugfix in tensorflow `space_to_batch_nd` (#5175)
- Compilation warnings fixed for 32bit and 64bit compilation (#5349)
- Fix hang in MergeCompilerRegions (#5227)
- Fixes to MergeCompilerRegions (#5195)
- Fix generation of LLVM intrinsics (#5282)
- Fix setting up hints for getaddrinfo (#2872)
- Add ConstantNode to IsAtomic (#5457)
- Fix String SEqual (#5275)
- Fix fuse over functions that are handled by external codegen (#5365)
- Fix memory leak when accessing NDArray (#5413)
- Remove the duplicate PrintIR pass in Relay (#5403)
- Fix `lower_warp_memory` (#5247)
- Fix `lower_warp_memory` when there are >1 warp buffers (#5368)
- Fix intel conv2d auto tune (#5200)
- Fix FuseBatchNorm output cast error if `need_cast` is True #4894
- Fix an assertion exposed by loop vectorizer #4916
- Fix error message #4945
- Fix for recursive let #5757
- Fix Calibration Pass to Support Modules with Multiple Functions #5768
- Fix what looks like bizarre copy-paste issue #6010
- Fix bug in `transpose_shape_func` #6180
- Fix bugs in CUDA codegen (#5209)
- Don’t remove() TemporaryFile in del. (#5414)
- Fix `test_ir_type`. (#5390)
- Fix multiple identical inputs bug (#5389)
- Add cuda target check to dense tensorcore schedule. (#5376)
- T2 test fixups (#5391)
- Fix miopen padding (#5433)
- Misc fixes for ROCm (#5431)
- Fix copy constructor (#5237)
- Corrected TVM autotuning on GPU (#5432)
- Fix vector load (#5226)
- Minor bugfix in `message_passing.cc` (#5254)
- Fix a bug when vectorized load&store was involved for… (#5428)
- Fix to skip node not in graph. (#5238)
- Fix #5388 [VULKAN] vkBuffer released before memory copy command se… (#5418)
- Fix a minor error in `device_annotation` (#5291)
- Fix scalar’s ndim is 0 (#5344)
- Fix the runtime raise error #5586
- Fixed bug in attribute parsing for pool layers. #5582
- AutoTVM incorrect measurement #5511
- fix a min/max simplify bug #5761
- Rename `tvm_dso_op` to `libtvm_dso_op` #5714
- Fix generating types like float44 and float88 #5722
- Avoid downloading when `TOPHUB_LOCATION` is NONE #5720
- codegen llvm: move nvptx-specific intrinsic handling into `codegen_nvptx` #5726
- ROCm warp shuffles and reductions #5727
- fix small bug about `dense_grad` #5695
- Clarify downstream consistency of TVMArgTypeCode #5742
- Fix gelu in PyTorch frontend, tighten numerical checks #5763
- Make batch matrix multiplication on GPU tunable #5752
- update vulkan build rule #5777
- aten::norm support added #5776
- Edit onnx parser to infer values in post order #5755
- Support symbolic inputs of Fill #5762
- support `aten::type_as` in the pytorch frontend #5787
- Temporary disable fp16 `type_as` test for PyTorch Frontend #5799
- Add config switch for nn.dense layer type. #5801
- Move cpu-only frontend tests to a CPU stage #5807
- Pin hand landmark network to version 0.7.4. #5813
- Limit number of threads in all jobs #5815
- Error msg update #5818
- fix relay.build to not change the module argument in place #5822
- Fix InferType when module contains Prelude #5797
- Add a combine `batch_matmul` pass #5791
- RepeatVector, Conv3DTranspose op support added #5833
- Fix converting serialized quantized models #5839
- ffi (Object): make class dict visible in instances #5843
- Additional canonicalization added for AddNode #5846
- Suppress the warning messages when compile engine selects impls #5821
- fix #5849 #5851
- Introduce POD-C Compliant tvm::Map #5740
- Add bfloat16 #5601
- Add Python Classes for all Attrs #5853
- Fix map assign issue in CI test #5854
- Introduce Target Id Registry #5838
- Update `has_dtype/has_shape` to pattern lang doc #5847
- Add `nn.batch_flatten` as quantizable. #5805
- Fail early before running invalid dynamic graphs #5856
- Improve type handling in PyTorch frontend #5834
- HotFix the python intrin rule #5895
- add a few gradients #5899
- Add Binary Intrinsic ops to TIR Ops in C++ #5900
- Allow implicit conversion in TVM FFI to tvm::Bool #5907
- PyTorch frontend: fix handling of duplicate use of a model weight #5897
- Don’t multiply by constant 1 uselessly in dense #5911
- Support any index matching for TupleGetItem #5909
- Add MicroTVM tutorial using the STM32F746 discovery board #5655
- Fix serialization of inf float value #5912
- Fix CPU Thread Binding for Multiple Sockets #5918
- CUDA device API & VerifyGPUCode pass update #5898
- Update install.rst #5858
- Two small fixes to AMDCPU codegen for LLVM 10+ and ROCm 3.5+ #5920
- Add LegalizeInvalidAttach to legalize the `compute_at` location after split or fuse #591
- Don’t rewrite expressions used outside of the pattern #5930
- Add TupleGetItem to CSE #5931
- Various update for CoreML codegen #5934
- Update date in the NOTICE #5943
- Raise right error in tensorflow split op #5951
- Add rm xla attributes in tf docs #5950
- Fix OpenCL `get_valid_counts` errors due to intrinsic `atomic_add` #5857
- Amendments for gradients #5941
- Fix the meaning of `conv{1,2}d_transpose` `output_padding` parameter. #5758
- Make first order gradient graphs more efficient #5959
- Raise an exception when extern function does not return Stmt #5964
- Improve docker/bash.sh to handle git worktrees #5970
- Install DNNL (OneDNN) to CI Environment #5936
- Add Dynamic reshape to a dynamic namespace and add DynamicToStatic Pass #5826
- Add meshgrid op in Relay, TOPI, Pytorch frontend #5961
- Print right number of parentheses for LoadNode #5965
- Migrate data structure of TargetNode #5960
- Remove redundant function CreateBufferVecPtr #5982
- Fix string argument mismatch in GraphRuntimeCodegen #5933
- VectorType::get with two parameters is deprecated in LLVM 11+ #5984
- Fix Compilation Error in CRT #5713
- Fix runtime::String backward compatibility in JSON #5725
- Allow RPCWrappedFunc to rewrite runtime::String as std::string #5796
- Fix reshape #5739
- Fix building with LLVM-10 on macOS #5859
- Add cuda 11 to `contrib.nvcc.find_libdevice_path()` #5902
- Fix sequential cpp test #5745
- Infer types in MergeComposite #5766
- Fix recursive let for well formed check #5780
- Recover global state after `test_util.py` #5824
- Fix bug in rpc ring buffer shrink #5516
- Fix remote device sync #5538
- Fix bug in rpc ring buffer shrink (#5516) #5537
- RPC Server error fix on Pynq FPGA #5607
- Fix FloorMod Simplifier #5509
- Fix Python debugger segfaults with TVM built with LLVM #5685
- Fix Compilation Error in CRT #5713
- Fix runtime::String backward compatibility in JSON #5725
- Allow RPCWrappedFunc to rewrite runtime::String as std::string #5796
- Fix reshape #5739
- Make "none" DataType explicit #5491
- Change "scalar" and "stack" in IDL from "inrout" to "in" #5487
- Link necessary libraries when building runtime for Android #5496
- Fixes for wasm32 target #5489
- Reset target and wait for runtime initialization on connect. #5499
- Bump tophub rocm version #5504
- Improve commentary for RingBuffer #5518
- Add unit tests for ONNX PRelu and fix importer to pass them. #5521
- LRN only supports 4D tensors, remove it from `alter_op_layout` #5520
- Fix an issue with ONNX Upsample #5530
- Cache PrimExpr instead of raw pointers in bound analyzer #5533
- fix a few bugs with shape inference and types in the ONNX importer #5534
- Add Onnx Pad v11 #5539
- Changes to `cpp_rpc` to make it work on Android (+ Hexagon offloading) #5535
- Fix to reduce RAM size during loading model #5507
- Fix MakeLoopNest for warp memory #5382
- Load platform specific lib for tvmdsoop instead of the hard-coded tvm_dso_op.so #5542
- Add tests for running micro on native arm hardware #5546
- Apparently, ONNX Conv with no 'pads' defaults to zero padding #5548
- clang-format the h,cc,m files. #5557
- Fix conv2d alter op for arm cpu #5532
- Fix topi test for non tensorcore CI. #5563
- Add clang-format and nodejs to ci-lint #5567
- Enable clang-format. #5572
- Allow `ubuntu_install_darknet.sh` to work in both 18.04 and 16.04 #5574
- Add a quantized conv2 unit test for the tflite front-end #5558
- Fix JSON graph dumping. #5591
- Warp level reduction support for CUDA #5498
- One more fix for concurrency count #5589
- Improve robustness of the docs build #5583
- Phase out WebGL #5570
- Fix vulkansdk in the ci-gpu and upgrade to 1.2.135 #5566
- Update ci-cpu to bionic #5554
- Overestimate binary size for microTVM compiled binaries. #5590
- Fix bug and re-enable RPC execution test #5436
- Add ostream formatters for TargetPtr/TargetVal. #5592
- Fix cross thread reduction #5551
- Fix TVMArray layout on device #5599
- Add debug mode to tempdir() #5581
- Represent alignment information in LLVM IR #5598
- Fix codegen for warp shuffle intrinsics #5606
- Fix Topological Order calculation for DFPattern Language #5612
- Global MaxPool3d and AvgPool3d support #5098
- Fix build error of iOS RPC #5621
- isn't a CallNode sometimes #5623
- Introduce config to PassContext. #5631
- CMAKE fix #5630
- Label Pattern Partitions #5627
- Extend AttrPattern to support CallNode and FunctionNode attributes #5637
- Increase bss section size. #5660
- Add buffer name when creating tensor bindings #5670
- µtvm debug improvements #5648
- enable `amd_apu` device on vulkan target #5659
- Support TupleWrapper as direct ancestor of control flow ops #5639
- add tvm.micro pydoc to sphinx #5661
- Add a regression testcase for #5674 #5677
- Fix C++ RPC build problem on Linux #5671
- Add a check Callback to the Pattern Paritioner #5646
- Call previous excepthook in `tvm_excepthook`. #5675
- Fix the shift column for `scale_shift_nchw` and `scale_shift_nhwc` in C topi #5679
- Support more dtypes for TVMDSOOp #5694
- In `memory_plan`, check if value is not None, instead of just checking value as boolean. #5700
- Fix flaky `test_topi_pooling.py:test_adaptive_pool` #5736
- Fix the values for `test_fmod` since it fails way too often otherwise #5723
- fix small bug about `dense_grad` #5695
- Fix sequential cpp test #5745
- Add Scatter to Topi/Relay/ONNX via hybrid script #5619
- Clean WASM environment before build #5759
- Fix gelu in PyTorch frontend, tighten numerical checks #5763
- fix #5686: remove an overstrict assert in MakeAllreduce (#5686) #5785
- Improve Pattern Language Docs #5676
- Add missing expr visitor for any #6082
- Remove the tvm web from version update #6122
- Clear relay cache after every build & Clear warning message cache after autotvm task extraction #6131
- avoid unexpected throw in AttrInitEntry #6128
- Verify that tensor reshape is valid. #6215
- Use LocalRunner by default in the tutorial tune_relay_cuda.py #6001
- Undefined names: import os for line 324 & import re for line 308 #6003
- GitHub Actions upgrade to actions/setup-python@v2 #6002
- Only pass pythonpath for ci images #6005
- Auto-convert shuffle with single index to “extract element” #6006
- Cache object refs in loop partitioner instead of object pointers #6004
- Fix `test_arith_solve_linear_inequality.py::test_multi_equal` #6014
- MXNet frontend support for AMP cast op #5976
- Demo showing how to run a pruned model. #5975
- Move compiler related registry items to `vta/build_module.py` #6012
- Pin keras version #6032
- Fix in `arm_cpu/conv2d_alter_op` for NHWC quantized #6027
- Add creation of Hexagon device in RPC client #6035
- Terminate basic block after “ret” instruction #6036
- µTVM CRT modifications for on-device RPC server #5921
- Create TBAA information based on the unrelying buffer type #6046
- Add support for tflite `arg_min` and `arg_max` #5992
- Fix `fully_connected` converter when batch size is not 1 #6038
- Fix a primitive check error #5991
- Refactor to expose MakeOp functions to C++ #6047
- Fix `conv2_gemm` after target structure update #6037
- Remove use of designated initializers from `hexagon_module.cc` #6055
- Build crttest and cpptest separately. #6057
- Fix pytorch frontend prim::Constant issue #6051
- update frontend tutorials to new model based runtime interface #6063
- Remove unnecessary std::cout #6072
- Fix error message in Buffer::vstore, NFC #6056
- Fix FSIM Compile Error. #6070
- Improve vector simplification for float operands #6043
- Fix LocalBuilder on macOS with python 3.8. #6083
- Add missing test for fast erf #6058
- Fixed point multiplication improvements for AArch64 #5980
- Fix code generation bugs for C/CUDA & Improve VerifyGPUCode pass #6041
- Delete declaration of unused `op_node` #6102
- Load configs even if it has no entity #6100
- Update SGX example Cargo.toml #6067
- Add default value for option `USE_DNNL_CODEGEN` in the cmake #6099
- Update installation doc with minor improvements #6104
- lint: add opencl .cl file type #6092
- Clean up conversions between TVM and Rust functions #6114
- Improve reduction schedule on arm CPUs #6110
- Register Shape Func for Some Operators to Handle Dynamic Shapes #5955
- Fix variable name conflict with OpenCL keyword #6048
- Some rust cleanups #6116
- Option to specify alternate directory to output build to #6016
- Add `get_num_inputs` to GraphRuntime #6118
- TFLite quantized conv test #6084
- Fix autotvm on the `conv2d_nchw_winograd.mali` operator #6130
- add attr option mfloat-abi for arm32 #6123
- Fix CUDA Library Tuning #6132
- Add missing RPC sources after refactor #6113
- Correct `runtime.load_module` #6161
- Improve error messages in graph tuner, graph runtime, and module loader. #6148
- Fix some shape mismatches between TF and Relay #6166
- Improve doc string #6176
- Fix incorrect function signature in header #6172
- Fix alignment of note #6181
- Implemented PADV2 Operator for TFLite and added support for constant values in PAD. #6167
- Unary ops support added in frontend #6196
- Change the meaning of `conv3d_transpose` `output_padding` to match `conv{1,2}d_transpose` #6065
- Fix compile warnings. #6204
- Fix -mfloat-abi=soft compilation for ARM with OpenCL target #6150
- Match pytorch 1.6 googlenet pretrained model (#6201) #6212
- Mod operator, bug fix #6160
- RESHAPE with dynamic shape arg in TFLite frontend #6208
- Fix compilation error with cuda 11 #6213
- Fix `port_end` wrong default value 9199 to 9099 for keeping same with source code #6220
- Std op without specified dimensions support #6226
- fix crt building and running error #6231
- Implemented `ONE_HOT` Operator for TFLite. #6223
- Avoid unexpected throw in AttrInitEntry #6128
- Added casting to hybrid script doc and fixed pass infra doc #6174
- Fix compile warnings. #6204
- Fix -mfloat-abi=soft compilation for ARM with OpenCL target #6150
- Mod operator, bug fix #6160
- Fix compilation error with cuda 11 #6213
- Fix `port_end` wrong default value 9199 to 9099 for keeping same with source code #6220
- Std op without specified dimensions support #6226
- Verify that tensor reshape is valid. #6215
- Fix crt building and running error #6231
- Fix `conv2d_transpose` output padding #6236
- Fix cuda half math function is undefined: hpow, htanh #6225
- Fix division range estimation error in simplifier #6244
- Fix newer GCC compiler warnings. #6257
- Support `_contrib_SyncBatchNorm` #6245
- Fix reduction #6250
- Add apt repository for clang-11 and llvm-11 #6256
- Update tutorial to new TARGET as `micro_dev` is no more #6262
- Fix clang-format #6264
- Trivial fix, up the rodata section for the discovery board to 512 bytes. #6259
- Fix cuda half math function is undefined: hpow, htanh #6253
- Add dilation in x86 NCHWc depthwise conv support #6267
- Decrease test times by introducing testing model #6235
- Add support for parsing the any dimension. #6277
- Improve error messages for memory verifier and gpu memory verifier #6281
- Reflect Compile-Time CMake Options into libtvm.so #6280
- Add cmake options into libinfo #6286
- Update slice to infer attributes when not graph inputs #6276
- Use rpc.LocalSession for simple tests #6294
- Fix random fail #6312
- Fix resize test #6298
- Fix cython FFI compact with np.int64 #6321
- Fix relay vm optimize #6322
- Changed TVMCTVMContext to TVMContext #6306
- Make able to compile with MSVC #6341
- ROCm changed name of library and removed the old one in ROCm 3.7 release. #6345
- Compatible for ROCm before 3.7 #6359
- Use clear name that is separate from ASF brand for cache #6360
- Fix `Dockerfile.demo_android` #6361
- Fix sparse dense schedule on cuda #5803
- Fix strategy for sparse dense cuda #5782
- Fix x86 conv2d template when tuning with unpacked layout #5938
- Fix the filter width parameter in `depthwise_conv2d` #6081
- Fix reshape usage in ARM schedule #5732
- Missing header #4865
- Fix `conv2d_transpose` output padding #6236
- Simplify reduce expression in te.gradient #6611
- `tvm.module` -> `tvm.runtime.module`
- `tvm.module.load` -> `tvm.runtime.load_module`
- `tvm.module.enabled` -> `tvm.runtime.enabled`
- `tvm.module.system_lib` -> `tvm.runtime.system_lib`
- `tvm.relay.Module` -> `tvm.IRModule`
- `tvm.create_schedule` -> `tvm.te.create_schedule`
- `tvm.placeholder` -> `tvm.te.placeholder`
- `tvm.compute` -> `tvm.te.compute` (a brief before/after sketch of these renames follows the deprecation list below)
- Deprecate NNVM (#4535, #4562, #4565, #4571)
- Deprecate FreeStmt #5890
- Remove legacy `compute_expr.h` #5738
- Deprecate OpenGL #5711, #5712
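The following is a minimal, hedged before/after sketch of the renames listed above (not taken from any PR); it assumes a TVM version where the `tvm.te` and `tvm.runtime` namespaces already exist, with the pre-rename spellings kept only as comments:

```python
# Hedged sketch: one small TE workload written against the new namespaces,
# with the deprecated spellings shown in comments.
import tvm
from tvm import te

n = te.var("n")
A = te.placeholder((n,), name="A")              # was: tvm.placeholder(...)
B = te.compute((n,), lambda i: A[i] + 1.0)      # was: tvm.compute(...)
s = te.create_schedule(B.op)                    # was: tvm.create_schedule(...)

mod = tvm.build(s, [A, B], target="llvm")
mod.export_library("add_one.so")
loaded = tvm.runtime.load_module("add_one.so")  # was: tvm.module.load(...)
```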
Relay is a functional, differentiable programming language designed to be an expressive intermediate representation for machine learning systems. Relay supports algebraic data types, closures, control flow, and recursion, allowing it to directly represent more complex models than computation-graph-based IRs (e.g., NNVM) can. As of TVM v0.6, Relay is stable and ready for production.
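As a brief, hedged illustration (not tied to any PR listed below), a Relay function with control flow can be constructed and printed from Python:

```python
# Minimal sketch: build a Relay function using an if-expression and print
# its round-trippable text form.
import tvm
from tvm import relay

x = relay.var("x", shape=(), dtype="float32")
cond = relay.greater(x, relay.const(0.0, "float32"))
body = relay.If(cond, x, relay.negative(x))   # abs(x) expressed with control flow
func = relay.Function([x], body)
print(tvm.IRModule.from_expr(func))
```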
- Algebraic Data Types (ADT) support (#2442, #2575). ADT provides an expressive, efficient, and safe way to realize recursive computation (e.g., RNN). Refer to https://tvm.apache.org/docs/langref/relay_adt.html for more information.
- Pass manager for Relay (#2546, #3226, #3234, #3191)
- Most frameworks have been supported in Relay, including ONNX, Keras, Tensorflow, Caffe2, CoreML, NNVMv1, MXNet (#2246).
- Explicitly manifest memory and tensor allocations in Relay. (#3560)
The Relay Virtual Machine (Relay VM) is the new generation of runtime that strikes a balance between performance and flexibility when deploying and executing Relay programs. Previously, the graph runtime was able to exploit the fully static nature of the input graphs to perform aggressive optimizations such as fully static allocation and optimal memory reuse. When we introduce models that make use of control flow, recursion, dynamic shapes, and dynamic allocation, we must change how execution works.
Relay VM is now usable and achieves decent performance for a variety of models and targets.
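A minimal, hedged sketch of compiling and running a Relay program with the VM is shown below; the `relay.vm.compile` / `VirtualMachine` spellings follow more recent releases and may differ slightly in older ones:

```python
# Hedged sketch: compile a tiny Relay module for the VM executor and run it.
import numpy as np
import tvm
from tvm import relay
from tvm.runtime.vm import VirtualMachine

x = relay.var("x", shape=(1, 8), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x], relay.nn.relu(x)))

exe = relay.vm.compile(mod, target="llvm")     # VM executable instead of a graph
machine = VirtualMachine(exe, tvm.cpu())
data = tvm.nd.array(np.random.randn(1, 8).astype("float32"))
print(machine.invoke("main", data).numpy())
```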
- Design (#2810 #2915) and a first version of implementation (#2889),
- Add VM runtime for Relay and compiler support (#3120, #3121, #2889, #3139)
- Relay VM (pattern matching #3470, port to python #3391, serialization #3647)
- Relay VM Profiler (#3727)
- Support execution on devices for Relay VM (#3678)
- [Relay][VM] Add more passes to VMCompiler (#4058)
- [relay][vm] Separate VM runtime with executable (#4100)
- Port VM, VM compiler, and Object into Python (#3391)
- VM: Add AllocTensor instruction and better instruction printer (#3306)
- [Relay][VM][Interpreter] Enable first-class constructors in VM and interpreter via eta expansion. (#4218)
- [Relay][VM] Clean up the VM and VM profiler code (#4391)
Relay is designed to natively support first-order and higher-order differentiation. The automatic differentiation infrastructure is now usable, and a number of operators with gradient support are available in the v0.6 release.
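A hedged sketch of requesting a gradient from the AD infrastructure (the `mode` keyword shown follows the current `relay.transform.gradient` API):

```python
# Minimal sketch: reverse-mode AD of f(x) = sum(x * x) over a Relay function.
import tvm
from tvm import relay

x = relay.var("x", shape=(3,), dtype="float32")
func = relay.Function([x], relay.sum(x * x))

mod = relay.transform.InferType()(tvm.IRModule.from_expr(func))
grad_func = relay.transform.gradient(mod["main"], mode="higher_order")
print(grad_func)   # returns the original value together with d(output)/dx
```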
- Higher order reverse mode automatic differentiation that work with control flow (#2496)
- Higher order continuation passing style (#3456, #3485 )
- Relay gradient registration (clip #3509, `max_pool2d` and `avg_pool2d` #3601)
- Relay AD algorithm (#3585)
- Relay Training - allow gradient to return a tuple (#3600), numerical gradient check (#3630)
- Improve AD for concatenate (#3729)
- [Relay][Training] Add missing gradient check to gradient pass (#4169)
- As a part of Relay's automatic differentiation system, we are adding primal gradients for Relay operators. Please refer to #2562 for tracking the progress.
- Gradient for Conv2d (#3636)
- Add gradient operators (#3857, #3894, #3901, #3915)
- Add gradient for log-softmax (#4069)
- [Relay][Training] Add gradient for Crossentropy (#3925)
- [Relay][Training] Add and fix gradients (#4126)
Low-bit inference is becoming more and more popular as it benefits both performance and storage usage. TVM now supports two types of quantization: 1. Automatic quantization takes a floating-point precision model, performs per-layer calibration, and generates a low-bit model. 2. TVM also imports pre-quantized models from TensorFlow and MXNet; a new dialect, QNN, is introduced to handle further lowering to normal operators.
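A hedged sketch of the automatic quantization flow on a `relay.testing` workload (the qconfig values below are illustrative, not tuned):

```python
# Minimal sketch: quantize a float32 testing workload with global-scale calibration.
import tvm
from tvm import relay
from tvm.relay import testing

mod, params = testing.mobilenet.get_workload(batch_size=1)
with relay.quantize.qconfig(calibrate_mode="global_scale", global_scale=8.0):
    qmod = relay.quantize.quantize(mod, params=params)
print(qmod["main"])   # low-bit module ready for the usual relay.build flow
```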
- Automatic Quantization
- Low-bit automatic quantization supported. (#2116). The workflow includes annotation, calibration and transformation.
- Refactor quantization codebase and fix model accuracy. (#3543)
- KL-divergence-based per-layer calibration. (#3538)
- Add option to select which convolution layers are quantized. (#3173)
- [Relay][Quantize] Integrate data-aware calibration into quantization. (#4295)
- Pre-quantized model support (QNN operators and legalize pass).
- Add a legalize pass to Relay (#3672)
- Qnn Concatenate, quantize, dequantize and requantize operators (#3819, #3730, #3745, #3531)
- QNNtoRelay & QNNLegalize Pass utility (#3838, #3782)
- Requantize: Optimize lowering for some corner cases. (#3864)
- New quantized operator support: conv2d, add, dense (#3580, #3736, #3896, #3910)
- Do type checking for the input and kernel in the qnn conv2d (#3904)
- Legalize and AlterOpLayout for Intel int8. (#3961)
- Renaming tests to follow the Relay nomenclature. (#3975)
- Fix padding changes due to #3739 (#3989)
- Memorizing quantize node mapping to avoid duplicated simulated quantization (#3233)
- Infrastructure to support pre-quantized models (QNN) (#3971).
- [Relay][AlterOp] NHWC to NCHWc support for Pool, concatenate, sum. (#4059)
- [TOPI][x86] Cascade lake support. (#4123)
- [TOPI][x86] Legalize - Support int8xint8 convolution to use VNNI inst (#4196)
- Qnn dequantize with min max using Mxnet flavor to support Mxnet prequantized models. (#3945)
- Improve the lowering of Qnn Dense (#4213)
- Adding support for dequantizing from int32 to float32. (#4130)
- [QNN] Refactor fixed point multiplication in requantize (#4073)
- [Relay][Quantize] Use fixed point mulplications (#4160)
- Add support for quantized multiply to Relay (#4141)
- Use legalize to handle NHWC layout for `arm_cpu` (#3754)
- [QNN][Legalize] Specialize for Platforms w/o fast Int8 support (#4307)
- [QNN] Use Int16 upcast in Fallback Conv2D. (#4329)
- Retain input kernel scales in QNN dialect (#4292)
- [QNN] Lowering for Depthwise Convolution. (#4351)
- [QNN][TFLite] Parsing QNN Add op. Adding MobilenetV2. (#4142)
- [QNN][TFLite] Parsing TFLite quantized models. (#3900)
- Added tflite frontend support for quantized mean. (#4339)
- [Relay][Legalize] Legalize `conv2d_transpose` for NHWC (#4399)
TSIM is introduced to improve software and hardware integration and simulation accuracy. It integrates the hardware development process into the software stack. TSIM enables VTA to provide more accurate performance feedback, i.e., clock cycles, compared to the traditional functional model of a hardware accelerator. Moreover, a Chisel implementation for VTA is available and runs on top of TSIM.
There has been a proliferation of resource-constrained and embedded devices that do not have operating systems or a mature software stack. MicroTVM is intended to support TVM on such bare-metal devices.
- [TSIM] Enabling Cycle-Accurate Hardware Simulation for VTA (#3010, #3206, #3242)
- Chisel implementation for VTA and runs on top of TSIM (#3258, #3347)
- MicroTVM (#3227)
- Relay Compilation + AutoTVM compatible operator libraries for VTA (#3135)
- ChangeBatch pass for batched VTA compilation (#3656, #3660)
- VTA fast simulator statistics (#3481)
- TSIM improvements and fixes (#3505)
- Chisel VTA enhancements and fixes (32bit support #3558, alu instruction generation #3592, coherence support #3593, separate types #3605, tensor issue/commit #3637, uop load request #3643, uop dma requests #3654)
- VTA Runtime refactor for non-shared memory FPGAs (#3590)
- VTA HLS codebase refactor for Ultra96 (#3496)
- VTA support for batched inference (#3661)
- VTA bitstream compilation for Intel FPGA (#3494)
- TSIM: Introduce Virtual Memory for TSIM Driver (#3686)
- Parallel TSIM hardware compilation with macOS and debug support (#3797)
- Chisel: scale dram base address in hardware instead of runtime (#3772)
- Chisel: run all unittests by default (#3766)
- Chisel: improved Data Gen, Added ALU Test (#3743)
- Chisel dependencies for TSIM CI (#3721)
- Chisel: Added Module Unit Test Infrastructure (#3698)
- Add ISA BitPat generation (#3891)
- de10-nano driver (#3394)
- Extending Vision model coverage compilation for VTA (#3740)
- Conv2d transpose (deconvolution) operator support (#3777)
- Support TLPP in function simulator. (#3555)
- [VTA][Chisel] TSIM VTA Source Refactor (#4163)
- [VTA][TSIM] Serial GEMM Application Added (#4082)
Rust language support in TVM includes two parts: 1. The frontend wraps the current C API and exposes a Rust programming model. 2. The backend serves as an alternative to the C++ runtime. It provides a standalone WASM module and security support, e.g., SGX.
- Rust frontend (#2292).
- Unify types between bindings and pure Rust impl (#2616)
- Rust: load syslib modules at compile time (#3274)
- Rustify PackedFunc & Friends (#2969)
- Rust DSO module (#2976)
- A special operator `annotation.stop_fusion` to prevent it being fused with previous expressions (#2624).
- `batch_matmul` supported (#2561).
- `reverse_reshape` supported (#2503).
- Faster-RCNN proposal operator for CUDA (#2420).
- Vision operator for YOLO `yolo_reorg` (#1941).
- `slice` operator for MXNet (#2662).
- `arange` supported (#2621).
- Vision operator `roi_align` (#2618).
- `where` operator for MXNet (#2647).
- Deformable conv2d (#2908)
- Faster-RCNN Proposal OP (#2725)
- ROI Pool operator (#2811)
- Gluoncv SSD support on CPU (#2353)
- shape, reverse, and sign op (#2749, #2800, #2775)
- tile and repeat op (#2720)
- logical operators (#2743, #2453)
- stack op (#2729)
- NCHWc upsampling (#2806)
- clip and wrap mode support in take (#2858)
- AlterLayout support for `intel_graphics` conv2d, depthwise conv2d (#2729, #2806)
- Add foldr1 operator (#2928)
- Add rsqrt operator (#2949)
- Add clip and wrap mode support in take (#2858)
- `Gather_nd` exposed to relay (#2945)
- `bitserial_conv2d` move to autotvm template and updates (#2819)
- Port x86 NCHWc to AutoTVM for Task Extraction (#2664)
- Implement relay `nn.bias_add` compute in C++ (#3027)
- Rename output tensors for better readability (#3006)
- int8 dense on CUDA & Dense op quantization (#2877)
- Bitserial dense operators for CPU (#3051)
- Enhance upsample operator to adapt onnx opset v9 (#2968)
- Add adaptive pooling operator (#3085)
- Add all operator (#3124)
- Add cblas `batch_matmul` (#3210)
- Add packing for int8 1x1 convolution and support the int8 group convolution on X86 (#2991)
- Add op size (#3094)
- x86 TOPI (`roi_align` #3475, `conv2d_transpose` #3491)
- Intel INT8 (dilation in conv2d #3510, type checking #3516)
- Reinterpretation of tensor elements (#3599)
- Sparse-Dense for block-sparse multiplication (#3566)
- Winograd matrix computation (#3553)
- CUDA schedule for `pool_grad` (#3622), `group_conv2d` (#3663)
- Bitserial operations conv2d, dense and bitpack (#3844)
- Improve numeric gradient check (#3856)
- Resize rework (#3788)
- Improve `conv2d_transpose` CUDA schedule template (#3796)
- SpaceToDepth and MirrorPad Operators (#3718)
- Add variance and layer norm op (#3700)
- Add `sparse_transpose` for Square CSR matrices (#3707)
- TOPI: Memoize winograd matrix (#3687)
- New TOPI operators: `erf`, `logical_and`, `logical_or`, `logical_not`, `isnan` (#3702, #3929, #3979)
- Improve `ceil_divide` in tile/split (#3842)
- [Relay][Frontend][TF] Add tensor array ops (#3798, #4309)
- [TF][Op] Op where (#4045)
- [TOPI]Add op argwhere (#3994)
- [Relay] `crossentropy_with_logits` and its gradient (#4075)
- [Relay][Op] Enhance Upsample Operator to support float scales (#4206)
- [Relay][Op] Add instance norm op (#4004)
- Frontend darknet (#2773)
- Support tf.gather (#2935)
- Support tf.where (#2936)
- Adding ADD operator to tflite frontend for compiling the MobileNetV2 (#2919)
- Support SpaceToBatchND/BatchToSpaceND in Tensorflow frontend (#2943)
- Simplify TF `get_output_names` (#3025)
- TF Tile Round Sign Pow Exp Reverse (#2960)
- Gluoncv SSD support on the GPU (#2784)
- Allow an op as loop var in Tensorflow (#3056)
- Add `FULLY_CONNECTED` op into tflite frontend (#3019)
- Add MXNet converter for RNN layer ops (#3125)
- Add log op in tf frontend (#3111)
- Add SoftPlus Sqrt in Tensorflow frontend (#3187)
- Add onnx elemwise greater/less (#3186)
- Add PlaceholderWithDefault (limited) implementation in TensorFlow (#3184)
- Support `tf.math.reduce_prod` (#3166)
- Better shape inference in TensorFlow Frontend (#3176)
- Get list of unsupported ONNX operators (#2995)
- Implement ONNX MaxPool-v8 and MaxPool-v10 (#3114)
- Convert TFLite NCHW to NHWC (#3141)
- Add Crop op converter (#3241)
- TFLite frontend operator support: PAD, RESIZE, MUL, Reduce (min, max, mean, prod), LOGISTIC, elemwise operators (Sub, Divide, Power, Max, Min) (#3310, #3370, #3304, #3421, #3313, #3357)
- Tensorflow frontend operator support: Abs, FloorDiv, GatherND, LeftShift, LogSoftmax, Max, Min, Mod, RightShift, ZerosLike, TruncateMod, Neg, ClipByValue, ResizeNearestNeighbor (#3270, #3211, #3393)
- TFLite: Add `fused_activation_function` for ADD, SUB, MUL, DIV (#3372)
- Support bidirectional RNN layer for MXNet (#3397)
- TFLite operator support (pack #3521, split #3520 )
- Keras operator support (permute, softmax #3618)
- TF operator support (BatchMatMul #3634)
- TFLite frontend operator support: tile, transpose (#3814, #3705)
- ONNX frontend operator support: PReLU for NNVM, Not, Sign, Equal (#3813, #3836, #3760)
- Keras frontend operator support: Dot (#3668)
- Add more cases to Keras `_convert_reshape` (#3846)
- TensorFlow frontend operator support: OneHot, log1p, cos, sin (#3781, #3614)
- Support BatchMatMul with input dimensions larger than 3 for TensorFlow (#3732)
- ONNX new operator support: And, Tile, Erf (#3878, #3941, #3988)
- MXNet new operator support: pad, conv1d, deconv1d (#3739)
- TFLite new operator support: `batch_to_space_nd`, `space_to_batch_nd`, tanh, greater, relu (#3850, #3996, #3963, #4022)
- TFLite: Support depthwise convolution multiplier greater than 1 (#3922)
- Keras: Fix ReLU in Keras Converter missed the case (#3917)
- Keras: frontend upsample and 1 channel conv2d fixes (#3937)
- Tensorflow: Convert scalar Const into tvm.relay.const (#3885)
- TensorFlow: Add support for SquaredDifference (#3930)
- [relay][frontend] clean up tf frontend (#3710)
- [Relay][Topi][TensorFlow][ONNX][Lang] Add support for Any op (#4205)
- [Relay][Frontend][ONNX] Add support for op Where (#4184)
- [Relay][TopHub] Add switch to disable TopHub download (#4015)
- Add parser support for CAST tflite operator (#4096)
- Add parser support for `zeros_like` tflite operator (#4042)
- Add parser support for SUM tflite operator (#4182)
- Add support for tf.assert (as no-op) and `tf.no_op` to TF Relay frontend. (#4172)
- [Relay][Frontend][ONNX] New Operators and Opsets to Support BERT (#4197)
- [Relay][Params] Add APIs for storing and retrieving parameters from individual functions. (#4194)
- Add `build_create_shared_func` to tvm/contrib/cc.py (#3840)
- Tensorflow saved model for NNVM (#2493) and Relay (#2586).
- Introduced `HybridModule` (#2477) so that normal TVM schedule can be compiled to hybrid target, run and dumped to Hybrid Script.
- [Relay][Frontend][Tensorflow] add operator `add_n` (#4181)
- [Relay][Frontend][Tensorflow] StopGradient (#4238)
- [Relay][Frontend][ONNX] Add support for broadcasting to Where and MatMul (#4267)
- [TFLite] Support PRelu (#4298)
- [Frontend][MxNet] support mxnet cond op (#4311)
- Add support for `quant.mul` operator in tflite frontend (#4283)
- [Relay][Frontend][ONNX] operator support: DepthToSpace, SpaceToDepth (#4271)
- [Relay][Frontend][Tensorflow] Add `conv2d_transpose`. (#4300)
- [Frontend] Add TensorFlow FloorMod (#4308)
- Make external library extend TVM's NDArray more easily (#2613).
- Improvements for NNPACK integration, includes ci test, winograd (#2846, #2868, #2856, #2721)
- Improvements for OpenCL runtime (#2741, #2737)
- GraphRuntime: Enable sharing parameters of a model among multiple threads (#3384)
- Android runtime argsort support (#3472)
- GraphRuntime enhancements (`set_input_zero_copy` #3416)
- A new minimal runtime implementation (~12kb .text on ARMv7/x86) for TVM.
- Add AVX512VNNI support for TVM (#3388)
- Enable miopen Group Convolution (#3987)
- Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)
- [RUNTIME] Separate runtime related contrib into runtime/contrib (#4207)
- [topi] add ARM v8.2 udot (uint8) support (#3978)
- [codegen] Add multiple operands and function support when using fp16 compilation (#4056)
- [TOPI] Added support for Mali Bifrost target (#4047)
- [topi] enable fp16 sort for arm (#4084)
- Add OpenOCD Low-Level Device (RISC-V Support) (#3756)
- Add wave 32 bc for AMD ROCm backend (#3984)
- [RUNTIME] Support C++ RPC (#4281)
- [TOPI][OP] Support Faster-RCNN Proposal OP on CPU (#4297)
- [TVM][RUNTIME] A minimum example to generate external library wrappers for DSOModule (#4280)
- Support custom datatypes (#2900)
- Add the acc16 intrinsic support (#3081)
- Handle float16 constants & fix BatchNorm (#3260)
- Structural hash - incorporate the var type into its hash (#3267)
- Relay C++ Build Module (#3082, #3144, #3174)
- Enable decorating python class to be a Relay Pass (#3364)
- Make Partial Eval support interprocedural optimization and termination check. (#3033)
- Introduce feature manager to Relay. (#3236)
- Use Relay parser to define the Relay prelude (#3043)
- Mechanism to detect incomplete expression match in Relay (#3203)
- EQ/NE operators support for StringImm expressions (#3283)
- Mechanism to detect incomplete expression match in Relay (#3203)
- Introduce CanonicalizeCast pass to formally reduce memory overhead introduced by fused cast operations (#3280)
- Support overloading comparison operations in Relay (#3168)
- Mac count: provide a pass to calculate the number of multiply-accumulate operations in a network (#2609).
- support for `conv_2d_transpose` (#3469)
- [Relay][Pass] Count MAC for BatchMatMul (#4157)
- Detect depthwise conv2d in `mac_count` pass (#3083)
- Add Tuple pattern (#3596)
- Text format support for ADTs and prelude (#3863, #3939)
- Add new IR pass CombineParallelDense (#3862)
- Add support for `EQ` op in the deduce bound and the loop partition (#3775)
- Introduce base-class IRMutatorWithAnalyzer (#3969)
- Define more standard global functions in the prelude of relay program, includes foldr1, hd, tl, nth, list update (#2928, #2917, #2771, #2866)
- Add SkipVectorize pass (#3222, #3228)
- [Relay][Pass] Add pass to remove unused functions in relay module (#4334)
- Add shape function for symbolic shape. It enables certain cases for broadcast with symbolic shapes. (#3606)
- [tvm][any] broadcast with values other than one (#3967)
- Symbolic shape support (broadcast op #3389)
- Support reshape for dynamic shape in tf converter (#4185)
- Runtime Shape Functions (#4179)
- An optimization pass to eliminate expressions which have the same functionality and same inputs (#2639).
- Refactor text printer to add stream-like API and FunctionType support (#2605, #2882)
- Build a scaffold for structured error handling (#2838). The new mechanism detects and rewrites error messages so that the C++ and Python stack traces are unified and not redundant. Guidelines and conventions for error handling are also discussed.
- Higher order reverse mode automatic differentiation that work with control flow (#2496)
- Integer arithmetic analyzers, includes modular set analysis, const integer bound analysis and rewrite simplifier (#2904, #2851, #2768, #2722, #2668, #2860)
- Improve operator fusion for TupleGetItem in relay (#2914, #2929)
- Compute FLOP of autotvm template for int8 models (#2776)
- Common subexpression elimination pass in Relay (#2639)
- Improve quantization in Relay (#2723)
- Refactor `build_func` in measure module of autotvm to better support cross compiler (#2927)
- Quantize all fields of concatenate (#2913)
- Remove stale verilog generator (#2964)
- Improve Relay printing (#2984, #2881, #3030, #3041)
- Add `min_num_branches` option in CombineParallelConv2D (#2961)
- Add `expr_visitor`, fix `expr_functor` exponential blowup problem (#2988)
- Support Deriving channels when it is not provided in AlterLayout. (#2972)
- Enhance BoundDeduce algorithm (#2795)
- Enhance loop partition algorithm (#2956)
- Better tuple fusion implementation (#3092)
- Enhance fusion rule that starts from elemwise and broadcast (#2932)
- Remove `on_device` op after annotation in heterogeneous pass (#3204)
- Improve canonical and rewrite simplifier (#3132, #3149)
- Capture constant external python variables in hybrid script (#3157)
- Remove Peano nats from the prelude (#3045)
- Macro to define NodeRef methods, constructor style example (#3224)
- Consistent RAII scoping API (#3231)
- Register all operators' attributes in Python (#3175)
- Add module support in relay.build (#3424)
- Relay pass infrastructure improvement (#3319, #3336, #3430, #3353)
- Migrate Relay passes to pass manager (#3323, #3289, #3251, #3406)
- Improve heterogeneous annotation by using visitor (#3261)
- Support export ADT value in Python (#3299)
- Extend TensorComputeOp to allow scalar inputs (#3300)
- Transitioning low-level IR away from HalideIR (#3533, #3535)
- Tags for ADT constructors (#3369)
- IR dumping for debugging (#3493)
- Pretty printer and parser roundtrip (#3460, #3536)
- Relay type checking (conv2d weight dimension #3511, any shape #3221)
- Relay Module enhancements (remove free variables #3476)
- LLVM DWARF debug information (#3420)
- Printer for Layout/BijectiveLayout (#3582)
- Type inference escape hatch (#3571)
- Making iterators compatible with constructors of STL containers (#3624)
- Moving Conv, Dense, Concatenate InferTypes to header (#3783)
- Simplify casts of constants 0 and 1 (#3758)
- Conditionally replace reduction init axis. (#3408)
- Improve Partial Evaluator (#3749, #3703)
- Strict mode in Relay pattern matching (#3620)
- Quit and clean when TVM is interrupted (#3640)
- Make Type Relation catch more errors (#3899, #3699)
- Refactor the way we interface between different modules of Relay (#3906)
- Introduce `schedule_injective_from_existing` and unify external schedules for all targets (#3983)
- [NODE][REFACTOR] Refactor reflection system in node. (#4189)
- Unify node system and object (#4161, #4115, #4128)
- [Relay][Refactor] Rename Datatype to ADT (#4156)
- [Relay] fix exponential blowup in interpreter (#3559)
- [Relay] Fix memory leak in the interpreter (#4155)
- [rpc] use callback func to do send & recv (#4147)
- Add `lift_if_then_else` pass to improve loop partitioning (#3865)
- Decrease the complexity of CalcDep from exponential to linear (#4053)
- [IR] Make iterators compatible with constructors of STL containers (#3624)
- [Relay][Pass] Avoid FoldConstant folding some ops (#4245)
- [Relay][Prelude] More dtypes support in `tensor_t` (#4233)
- [NODE][REFACTOR] Rename IRFunctor->NodeFunctor, use func pointer (#4247)
- [RUNTIME][REFACTOR] Use object protocol to support runtime::Module (#4289)
- [CodeGen] Add build config option `disable_assert` to control whether to generate assert. (#4340)
- Formalize Integer Arithmetic Analysis (RFC: #2588). It aims to perform better context-dependent analysis, bound analysis, centralized arithmetic logic, and arithmetic simplification (see the short sketch after this list). (#3272, #3463, #3464, #3368, #3503, #3504, #3502, #3479, #3568)
- Introduce FloorDiv/Mod, TruncDiv/Mod, and IndexDiv/Mod for better arithmetic simplification (#3976, #3986, #4000, #4014, #4008, #4028)
- [ARITH] Use floordiv for the deduce bound (#4025)
- [Simplifier] Rewrite simplification rule to eliminate unnecessary conditionals. (#4076)
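A small, hedged sketch of the analyzer facilities referenced above, via the Python `tvm.arith.Analyzer` wrapper (names follow the current API and may differ in older releases):

```python
# Minimal sketch: rewrite simplification and const-int-bound analysis.
import tvm
from tvm import te

ana = tvm.arith.Analyzer()
n = te.var("n")

print(ana.rewrite_simplify((n * 4 + 8) // 4))   # floordiv arithmetic -> n + 2

ana.update(n, tvm.arith.ConstIntBound(0, 128))  # assume 0 <= n <= 128
print(ana.const_int_bound(n + 1))               # bounds become [1, 129]
```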
- Provide error msg for failure function call in tvm4j (#2967)
- Expose backtrace symbols in Debug mode (#3001)
- C++ GraphRuntimeCodegen, Deprecate Python2 (#2986)
- Ensure interpreted functions can take values that are not TensorValues (#3015)
- Make OpenCL runtime Compatible with OpenCL2.0 (#2897)
- Handle INF and NAN in CUDA and OpenCL (#3194)
- Update debug graph runtime for more precise layerwise timing (#3232)
- ROCM support (llvm printing #3662, ld.lld finding #3664, save to file #3665)
- Threadpool: make `spin_count` configurable (#3577)
- RPC worker children termination (#3669)
- Vulkan runtime reimplementation (stream approach) (#3849)
- Vulkan backend supports Call::reinterpret and vectorized comparison (#3795)
- Support MKL on Windows (#3837)
- Vulkan IR builder (bool to float #3513)
- Force `code_object_v2` for amd gpu backend (#4099)
- [Codegen][cuda-fp16] fallback to fp32 simulation when cuda arch < sm53 (#4268)
- Fix and refactoring for AMD gpu backend (#4305, #4321, #4341, #4342)
- [Debugger] Sorting op-time breakdown for quicker analysis. (#4352)
- [nvcc] enable multiple arch in one fatbin (#4377)
- [RUNTIME] Move module export to the function level. (#4405)
- Relay now supports saving and loading parameter dictionaries. (#2620)
- Add `max_num_threads` to Hybrid Script, which allows users to get max number of threads for GPU targets (#2672).
- Improvements for mxnet frontend (#2844, #2777, #2772, #2706, #2704, #2709, #2739)
- Improvements for keras frontend (#2842, #2854)
- Improvements for DarkNet frontend (#2673)
- Improvements for ONNX frontend (#2843, #2840)
- Better profile result dump in Chrome Tracing format (#2922, #2863)
- Unified error handling in NNVM and Relay frontends (#2828)
- Improve NNVM to Relay conversion (#2734)
- Remove `input_0d_mismatch` special handling for TF Frontend (#3087)
- Bumped ONNX version from 1.1.0 to 1.4.1 (#3286)
- Simplify parameter handling in Tensorflow frontend (#2993)
- CoreML improvement for image scaler and padding (#3800)
- Clean up TensorFlow frontend (#3710)
- Darknet: Solve tvm parsing darknet resnext failure bug (#3778)
- Frontend changes `get_workload` - (#3483)
- [TF][Relay][Op] Pass module when infer shape (#4287)
- Support override in `register_topi_compute` and `register_topi_schedule`. (#3292)
- Improve graph tuner dealing with Tuple. (#3649)
- Add AutoTVM template for conv2d Intel int8. (#3955)
- Add AutoTVM template for dense on CUDA. (#3923)
- Add AutoTVM template for conv2d on Intel graphics. (#3839)
- Optimizing autotvm task extraction speed. (#4138)
- [AutoTVM] Add `batch_matmul` to tunable operations. (#4242)
- Selecting tuning templates when extracting task. (#4338)
- Enable AlterOpLayout pass for x86 on Relay (#2585). It is essential to get decent performance for CNN-based model on Intel CPUs.
- Better intrinsic matching for x86 CPU and ARM CPU, includes variants of vcvtph2ps and vmlal.s16 (#2925, #2748).
- Improve injective schedule for ARM CPU(#2801)
- Core functionality for Graph tuner (#2184)
- Fast tanh implementation (#3255)
- Improve multi-batch conv2d on x86 (#3308)
- Improve `non_max_suppression` and `get_valid_counts` for CPU (#3305)
- Improve `roi_align` performance for CPU (#3296)
- Improve `nms` and `get_valid_count` performance (#3282)
- Graph tuner for multiple subgraph (#3490)
- For sparsity, fast transpose for square CSR matrices has now been merged, which is a good starting point for more general sparse type support.
- Reduce `set_input` and `set_input_zero_copy` overhead (#3805)
- Parallelize batch axis for ARM (#3931)
- Support cuBLAS BatchMatMul (#3936)
- Add AVX512VNNI support for TVM (#3388)
- Enhance tuning space of split (#3949)
- Enable miopen transpose convolution and fp16 support (#3952)
- Improve `conv2d_transpose` schedule on X86 and CUDA (#3948)
- Expose llvm.nearbyint intrinsic (#4001)
- [TOPI][X86] Pool operator parallel support. (#4090)
- Improve layout for several operators (#4103, #4040, #4080)
- [Relay][VM] Fix constant folding issue in VM compiler (#4077)
- [relay][vm] Reuse allocated device memory (#4170)
- [Runtime] Enable option to use OpenMP thread pool (#4089)
- [PERF] Parallelize reduction for CPU (#4158)
- [TOPI] Tunable Template for Conv2D HWCN on CUDA (#4168)
- [TOPI] Add valid auto tvm for Intel Graphics (#4078)
- [TOPI] FIFO buffer op, to accelerate sequence modeling with dilated convolutions (#4039)
- TensorCore Support using Intrinsic (#4136)
- Auto TensorCore CodeGen (#4234)
- Use cblas for dense and `batch_matmul` (#3787)
- Update TOPI softmax compute and CPU schedule (#3680)
- [VTA] Performance optimize, remove unnecessary contiguous memory use. (#4246)
- [TOPI][AlterOpLayout][ARM] Enabling NHWC to NCHW layout transformation. (#4249)
- [PERF] Parallelize reduction for CPU (#4158)
- [ThreadPool] Solve thread transitions issue (#4344)
- Tutorials for deep learning frameworks support in Relay.
- Tutorial for running AutoTVM with Relay (#2594).
- Document for Algebraic Data Types (#2575).
- Move NNVM tutorials to Relay (#2783, #2785, #2766, #2693)
- Documentation on operators (#2761)
- Add gradient operator tutorial docs (#2751)
- Add compiler pass tutorial docs (#2746)
- Add Android Tutorial (#2977)
- Developer documentation for InferBound pass (#3126)
- Add missing targets to `target_name` documentation (#3128)
- Various documentation improvements (#3133)
- Add VM doc (#3188)
- Update documents for TSim (#3409, #3318, #3302, #3343, #3206)
- Improve tvm4j document describing LLVM support (#3404)
- Tutorial migration to Python3 (#3498)
- Android RPC README (#3500)
- Documentation for Relay opcode (#3522)
- Tutorial for pass manager (#3515)
- Minimum version of Python in docs (#3588)
- Relay pass infra (#3583)
- X86 Autotune tutorial improvements (#3609)
- YOLOv3 tiny Darknet tutorial (#3674)
- SSD doc to avoid confusion (#3677)
- Tutorial: Build a Graph Convolutional Network on TVM (#3681)
- Add docs for analysis namespace (#3985)
- [tutorial] Relay pass infra tutorial (#4083)
- [DOCS] Add TensorFlow frontend docs (#4154)
- Tutorial: update Building a Graph Convolutional Network tutorial (#4060)
- [Docs] Add dependency of compilation with LLVM (#4117)
- [Documentation] Fix example code in comment of `tvm.build_module.build()` (#4195)
- TSIM: add virtual memory support to examples (#3868)
- Relay pass infra tutorial (#4083)
- Fix the TF tutorial to run against TF2.0 and TF1.x (#4104)
- Add `topi.nn.fifo_buffer` to TVM doc (#4343)
- License statement (#4345, #4359, #4401, #4402, #4408, #4409, #4410, #4414, #4431)
- Increase the robustness of CI tests (#2841, #2798, #2793, #2788, #2781, #2727, #2710, #2711, #2923)
- Improve conda build (#2742)
- Add caffe2 nnvm frontend to CI (#3018)
- Use bridge network and expose port on macOS when launch docker image (#3086)
- Run DarkNet tests (#2673)
- Add file type check (#3116)
- Always run cpptest during build to ensure library correctness (#3147)
- Handle more file types in ASF header (#3235)
- Add `test_forward_ssd_mobilenet_v1` to `tflite/test_forward` (#3350)
- Add Azure build pipeline (#3458, #3459)
- Update ci-gpu to v0.52 (#3374)
- Enable more visible symbols by default (#3365)
- Separate out legacy as a stage in CI (#3337)
- Simplify build script, remove python 2 support (#3419)
- Ignore rust cargo lock files in rat (#3314)
- Improve CUDA Conda package build (#3281)
- Update CMakeLists.txt to be more flexible to find the third parties libraries (#3354)
- Docker update conda package (#3344), requests and pillow (#3495), Android demo (#3499), rat install (#3527), ARM support (#3546), LLVM (#3590)
- Relay-to-Python testing (#3156)
- Code refactoring/remove (#3523, #3667)
- Zero-rank testing (#3612)
- CMake compilation (#3611, #3650, google test #3628)
- Standalone wheel build for TOPI (#3657)
- Fixing performance issues in PassUpDomain when fusing and splitting axes (#3073)
- conda recipe (#3791)
- Allow users to specify download directory (#3803)
- Update docs for installation for CUDA (#3832)
- Update `hybrid_script.rst` (#3799)
- Acknowledge Halide attributions (#3824)
- Add psutil dependency (#3780)
- Temporary disable rust test (#3809)
- Solve occasional CI issue when pad value is all 0 (#3801)
- Towards TSIM CI testing (#3704)
- Use pip3 for python3 (#3742)
- Update docker image `ci_cpu,i386` to include verilator (#3738)
- Remove sccache from Rust install (#3728)
- Update dmlc-core to the latest commit (#3716)
- Update GPU docker (#3709)
- Add an option to build with -pthread (#3671)
- Add DGL to `{ci_gpu, demo_cpu, demo_gpu}` docker images (#3692)
- Use pytest instead of nosetest (#3524)
- Enable NHWC of `relay.testing.mobilenet` (#3886)
- Add .hsaco save/load for `tensor_expr` Tutorial (#3852)
- Support LLVM trunk (#3907)
- Remove GTest cmake flag from install docs (#3953)
- Allow `USE_LLVM` to take extra arguments (#3954)
- [CI] Pin NNPack pthreadtools version (#4152)
- [TOPI] Fix flaky testcase for check round (#4211)
- [CI] Move gpu docker binary to cuda10 (#4229)
- [CI] use llvm9 for the gpu tests (#4224)
- [CI] Update GPU docker to cuda10 (#4228)
- [Relay] Install Relay Prelude program in package install (#4227)
- [relay] use `time_evaluator` for measurement (#4191)
- [Relay] Improve build error when no lowered funcs are produced (#4132)
- [llvm] switch to use Align for llvm trunk (#4051)
- [CUDA] Update `have_int8` condition to run on compute capability 7.x devices (#4214)
- [DOCKER] Pin torchvision==0.4.1 (#4140)
- [DOCKER] torch install depends on future package (#4098)
- [CodeGen] Disable -mfloat-abi hard option for LLVM < 6.0 (#4071)
- Add a python how to example of deploying tvm module with tvm runtime only (#4094)
- Hide symbols from dependent libraries if `HIDE_PRIVATE_SYMBOLS` is ON. (#4041)
- [BUILD] Disable utvm standalone runtime by default (#4240)
- Fix TSIM compile error in Linux (add missing -fPIC flag) (#3876)
- Add scalafmt and format existing scala codebase (#3880)
- Update TFLite wheel version to 1.13.1 (#3435)
- Remove PEP498 f-string new feature for support python3.5 (#4250)
- Require LLVM >= 9 for AMDGPU backend (#4253)
- Rename ml.dmlc.tvm to org.apache.tvm (#4290)
- [Test][TF][Relay] Fix argument preparation for vm test mode (#4296)
- Add test for the `qnn_add` operator (#4282)
- [CI][DOCKER] Add ONNX runtime dep (#4314)
- [CI][DOCKER] Upgrade image to include onnx runtime (#4313)
- [CI] Set workspace to be per executor (#4336)
- [Build][Windows] Fix Windows build by including cctype (#4319)
- [Contrib] Add MKL DNN option (#4323)
- [Test][Relay][Pass] Add test case for lambda lift (#4317)
- Remove Python imp module as it is deprecated (#4275)
- Bump up CUDA log version in tophub.py (#4347)
- Add rule for clean in APPs (#4364)
- [Relay tests] Temporary Attr Update for Order-Independent Testing (#4357)
- [CI] Avoid content-length request in test data download (#4375)
- Compare all outputs in TFLite `test_forward_ssd_mobilenet_v1` (#4373)
- [RELAY] Fix `get_int_tuple`. (#2691)
- [ARITH] Select support for integer set analysis. (#2687)
- [Relay] Fix error in ANF (too aggressively inline atomic expression and create free variable). (#2665)
- [Hybrid Script] Fix name conflict and attached scope problem. (#2649)
- [Relay] Fix ANF for reference and pattern matching. (#2637)
- [Relay] Fix fusion bug when call symbol that is not an operator. (#2630)
- Fix missing header file. (#2629)
- [Relay]Fix the bug in heterogeneous annotation which mistakenly steps into the fused op. (#2622)
- [AutoTVM] Fix incorrect localhost usage in RPC mode. (#2619)
- [NNVM] Fix incorrectly getting layout attribute as a tuple. (#2610)
- [Relay] Fix mutating IF expression. (#2601)
- [Tutorial] Fix downloaded file path. (#2590)
- [Storage] Fix int32 overflow bug when input is big. (#2580)
- [NNVM] Fix non-identity problem for FInplaceIdentity. (#2572)
- [Golang] Fix compilation error. (#2558)
- [Tensor Expression] Fix missing reduction init predicates. (#2495)
- [Relay] Fix missing argument for NCHWc in Relay. (#2627)
- [TOPI] Fix `Nms_ir` data race. (#2600)
- Fix `compute_inline` with multiple outputs (#2934)
- [TEXPR][PASS] Fix thread all reduce to avoid write after read hazard (#2937)
- [FRONTEND][TENSORFLOW] bug fix for tensorflow official slim models. (#2864)
- [FRONTEND][ONNX] Some bug fixes and Shape operator fixed for relay. (#2850)
- Turn on `USE_SORT` by default (#2916)
- [DOCKER] Upgrade ci-cpu to latest v0.50 (#2901)
- [TESTS] Import script robustness (set -u) (#2896)
- [Relay] Fix name of bias in testing.mlp (#2892)
- [TESTS] Improve script robustness (#2893)
- Add dense schedules to `__init__` for cpu (#2855)
- [Apps] [howto_deploy] fix cxx-flags order and build directory (#2888)
- [Relay] Add TVM_DLL for ANF/GNF conversion #2883
- [Relay] Fix Relay ARM CPU depthwise spatial pack schedule alter op layout issue. (#2861)
- Fix setting up hints for getaddrinfo (#2872)
- Add missing sgx includes (#2878)
- Fix error reporting for missing axis (#2835)
- Fix an OrderedDict initialization bug. (#2862)
- Fix Xcode 10 metal compile error (#2836)
- tvmrpc: Fix includes (#2825)
- Fix `init_proj.py`: Team ID expected (#2824)
- [DOCKER] Fix git clone failure. (#2816)
- upgrade java style-check due to CVE-2019-9658 (#2817)
- [Relay][Quantization] Fix duplicated simulated quantization (#2803)
- [Bugfix] Repeat and tile bug fixed, relay tests added (#2804)
- Fix caffe2 relay frontend (#2733)
- Fix a bug in nnvm to relay converter. (#2756)
- Ensure loop count is a constant before trying to unroll. (#2797)
- xcode.py: Decode bytes before output #2833
- [WIN] Fix a bug in `find_llvm` when specify llvm-config (#2758)
- [DLPACK] fix flaky ctypes support (#2759)
- [Bugfix][Relay][Frontend] Fix bug in mxnet converter for `slice_like` (#2744)
- [DOCS] Fix tutorial (#2724)
- [TOPI][Relay] Fix default `out_dtype` for `conv2d_NCHWc` and Relay (#2702)
- [Relay] fix checkwellform (#2705)
- fix prelu, now can use on 2d input and add one test (#2875)
- [CODEGEN][OPENCL] Fix compile error about ternary expression. (#2821)
- Fix Placeholder issue (#2834)
- Fix makedirs() condition in contrib (#2942)
- Add missing #!/bin/bash directive (#2951)
- Bilinear resize bug fix from PR #2777 (#2857)
- Fix `bias_add` default axis (#2829)
- Remove empty ty.rs (#2958)
- fix undefined reference to dlopen, etc (#2957)
- Removed deprecated `std::unary_function` (#2962)
- Add output format to ndk build func (#2999)
- Fix java checkstyle version (#2998)
- Fix relay invariant error message (#3011)
- Fix for caffe2 nnvm frontend (#2996)
- Fix rust resnet example (#3000)
- Fix x||!x for comparisons in rewrite simplifier (#3029)
- Fix BatchMatMulRel typerelation (#3032)
- Update dmlc-core, fix default ctors of NodeEntry (#3017)
- Fix Fuse (#3035)
- Fix PostOrderVisit signature (#3048)
- Fix winograd nnpack fp16 (#3046)
- Fix some typos (#3063, #3112)
- Fix `group_conv2d` unit test (#3113)
- Fix bug in ONNX importer (#3084)
- Fixing a doc nit (#3123)
- Fix type code error for StringImm (#3050)
- Fix bug of wrongly generated `device_map` (#2990)
- Use `unordered_map` instead of map in ANF (#3024)
- Fix PRelu layout in Relay (#3013)
- Minor addition to graph runtime debug (#3129)
- Fix mali conv2d performance regression (#3131)
- Fix dense autotvm template registration in ROCm (#3136)
- Fix `conv2d_transpose` (#3138)
- Fix python lint warnings (#3145)
- Some fixes for golang latest version compiler #3119 (#3182)
- Add more syncs to fix flaky test caused by `get_valid_counts` (#3151)
- Fix AlterLayout Pass (#3155)
- Fix a multithreaded bug in llvm LazyInitJIT (#3158)
- Fix a tensorflow test bug. (#3165)
- Fix concat for ARM (#3061)
- Handle vectorize for LE statement (#3137)
- Raise an exception when `group_conv2d_nchw` is not supported (#3195)
- Quick fix of VTA FPGA Toolchain Installation documentation (#3196)
- Check file exists before removing it (#3178)
- Fix a bug of flatten in ONNX to Relay converter (#3180)
- Fix converter where initializers were not registered as nodes (#3143)
- Fix bug in cast to bool (#3207)
- Hotfix `build_module` creation (#3198)
- Fix sort changing original input data issue (#3212)
- Fix bug in vta runtime DepPop function (#3208)
- Fix resize nearest with fractional scaling (#3244)
- Fix `vta_conv2d` crash issue after changing `vta_config.json` (#3213)
- Fix a memory leak in OpManager (#3263)
- PkgConfig caused a crash on the PYNQ board due to link library (#3257)
- Fix Error messages in tflite.py (#3320)
- Fix typos in docs and comments (#3309, #3376)
- Bugfix min/max const canonicalize rule (#3386)
- Return module from frontend for autotvm (#3401)
- Fix constant and reshape in ONNX (#3387)
- Default verilator location fix (#3324)
- Fix autodiff for conditional expression (#3453)
- Grammatical improvements to `tensor_expr_get_started` (#3330)
- Fix AutoTVM data structure bug (#3462)
- Fix MXNet RNN without providing state initialization as input (#3326)
- Fix flaky test on topk and quantize pass (#3362)
- Add VTA PYNQ `metal_test` bitstream program logic and fix compilation issue. (#3400)
- Fix VTA function Vivado Compile Error. (#3375)
- Fix VTA DRAM functionality issue. (#3278)
- Fix reshape precompute and type error in ONNX frontend (#3230)
- Fix interpreter argument conversion for tuples. (#3349)
- Fix code generation for packed functions + tuples in VM (#3287)
- Fix memory leak in Relay interpreter (#3448)
- Fix x86 depthwise conv2d `alter_op_layout` (#3264)
- Create closure object for GlobalVar (#3411)
- Fix getting global var in prelude (#3405)
- Fix rfactor bugs related to predicate and loop partition (#3382, #3444)
- Fix the bug in AutoTVM where SimulatedAnnealingOptimizer sometimes finds useless candidate (#3413)
- Fix name conflict in PartialEval (#3402)
- Fix int bound analysis bug for modular (#3288)
- Check arg positiveness for modular rules (#3279)
- Fix failure of `sum` and `all` on `axis=0` (#3422)
- Fix package path in tflite test (#3427)
- Fix Windows build (#3429)
- Fix `LSTMBlockCell` in Tensorflow frontend (#3410)
- TF fix where output index is ignored (#3622)
- Runtime fix for custom datatypes (#3471)
- Relay build module warnings (#3452)
- Relay partial evaluator (#3482)
- Pynq AutoTVM tracker (#3497, #3578)
- A normal form test (#3525)
- Lint issue (#3519, #3615 )
- Any shape testing (#3528)
- Android `posix_memalign` (#3532)
- Quantization `add_rewrite` and UnifyDTypeScale (#3534)
- Bound inference fix (#3526)
- Tensorflow NCHW data format (#3514)
- First order gradient (#3550)
- JS load module example (#3556)
- Build error (#3552)
- Relay VM debug statements (#3565)
- C++ lambda expr (#3570)
- Handling of tempdir if subprocess is killed (#3574)
- Remove tabs in Chisel source (#3603)
- Relay VM DataTypeObject (#3604)
- Removing prints (#3616)
- Average Pool2D Bug (#3607)
- Missing header in `cuda_device_api.cc` (#3621)
- Tensorflow frontend fix where `output_shape` is None (#3632)
- Winograd accuracy fix (#3644)
- Fix comment (#3646)
- Zero-input op fix for recursive traversals (#3623)
- Python 3.5 compatibility (#3675)
- Fix infinite recursive `device_api.ext_dev` call in VTA. (#3843)
- Fix `depth_mult` for TensorFlow frontend (#3676)
- Fix database APIs for AutoTVM (#3821)
- Fix axis of softmax in Keras (#3834)
- Fix VTA TensorLoad module (#3841)
- Fix inconsistent python/cpp API behavior for `if_then_else`, power (#3829)
- Fix code comment of operators in ONNX frontend (#3830)
- Added repo for llvm-9 to fix missing dependency issue (#3826)
- Fix typo in Relay text parser (#3785)
- Fix tvm const warnings (#3817)
- Add gfx906 bc (#3808)
- Fixed onnx test failures when run on a cpu backend (#3764)
- Fix ArgBinder assert order (#3794)
- Fix for NoneType Target for quantization (#3792)
- Fix out-of-date quantization realize (#3790)
- Fix Qnn concatenate InferType (#3779)
- Fix dense tuning (#3768)
- Fix `visit_pattern` in ExprMutator (#3769)
- Fix Chisel Scala style (#3765)
- Fix some pass docs (#3767)
- Fix mistype in rpc tutorial (#3763)
- Fix tvm.scan followed by tvm.compute segfault (#3723)
- Fix the potential index overflow in where operator (#3751)
- Revert `compile_cmd` kwarg name change (#3746)
- Update tophub (#3752)
- Fix typo in `ir_pass.h` (#3741)
- Bug fix for VME Shell (#3737)
- Fix missing apt https transport support (#3735)
- Take zero extent loops as NoOp and remove it (#3724)
- Fix mxnet converter for hybridblock and add `div_sqrt_dim` (#3701)
- Fix partial eval unit test name (#3719)
- Fix conv2d schedule code (#3648, #3717)
- Remove thread related headers (#3713)
- Fix FunctionPass (#3712)
- Export tvm::relay::OpRegistry::OpRegistry (#3711)
- Fix Metal reinterpret (#3706)
- Fix `gather_nd` in Relay (#3442)
- Fix error in partial evaluator (#3693)
- Align the naming rule for OpAttributeUnImplemented (#3695)
- Enable the sparse schedule (#3651)
- Fix typo names in Caffe2 frontend (#3685)
- Make tests multi-process friendly. (#3683)
- Fix typo in README.md (#3684)
- Fix doc rendering (#3897)
- Add test script starter command to document (#3993)
- Add type solver unit tests for unifying quantified funcs (#3947)
- Change Vivado install instructions to version 2018.3 (#4003)
- Add a link to the defining network description of auto-tuning tutorial (#4023)
- Additional MXNet Convolution and Deconvolution tests (#4026)
- Adding support to check if an attribute is present or not without having to get the value (#3957)
- Fix parser for cast. (#3873)
- Fix operator fusion for multiple output (#3871)
- Remove extern C wrapper for cuBLAS (#3877)
- Fix int32 range overflow by using int64 (#3870)
- Remove duplicate resize (#3902)
- Fix blas cmake for mac os (#3898)
- Add another MKL name alias for MKL installed through pypi (#3853)
- Numpy-compatible dtype inference for `tvm.convert` and `tvm.const` (#3861)
- Remove incorrect check for LLVM in C codegen test (#3921)
- Fix exponential blowup in interpreter (#3559)
- Fix CUDA int8x4 vectorize (#3928)
- Make buffer auto broadcast independent of the order of input args (#3956)
- Fix benchmark layout in graph tuner (#3926)
- Fix Android Demo LLVM version (#3962)
- Cast filepath arguments to string (#3968)
- Fixes "common" sub crate using nightly and main (#3965)
- Changes to make tensorize work. These changes also fix the previously broken test. (#3981)
- Remove FLOP computation when calling 3rd party library (#4005)
- Use a more intuitive way to limit the #ops in a group (#4018)
- Add more `pad_mode` support for onnx converter (#4029)
- Impose a max op limit to the op fusion pass (#4002)
- Fixes issue with CPP enums (#4019)
- Int64 shape handling for outputs. (#4031)
- [PYTHON] Fix installation for generated grammar (#4223)
- [Bugfix] Fix target host for vm compiler (#4057)
- [Fix][VM] Fix VM invoke with `set_params` (#4079)
- [Fix] Fix a few bugs when dtype is fp16 (#4088)
- [Relay][Frontend][TF] Fix Size operator (#4175)
- [cmake][ANTLR] Support setting path to ANTLR jar (#4176)
- Fix infer type of kernel in dense. (#4125)
- [Relay] Fix match case in Python-side expr functor (#4037)
- Split `adaptive_pool2d_avg` into sum and div (#4186)
- [AutoTVM] Fix Split Factors when `no_tail` is off (#4044)
- Fix extent one for the `post_stmt` in loop partition (#3734)
- [TOPI] Fix bug in intel graphics auto tune (#4093)
- [ARITH] Fix lowering of `floormod(x, y) != 0` (#4127)
- [ARITH] Fix the rule `y < x && x <= y` (#4220)
- [Bugfix][TF] Reset graph after getting tag of savedmodel (#4055)
- [Fix] Fix the logic of the number of nodes checking in op fusion (#4074)
- [VTA] hotfix for de10-nano driver (#4081)
- Fixing tensor not found issue in bitserial operator (#4095)
- Fix wrong `n_trial` number in autotvm tutorials' progress bar if `n_trial` is larger than the config space. (#4070)
- [PATCH] Fix undefined `__floatdihf` in libtvmruntime.so on aarch64. (#4119)
- [ARITH] Fix lowering of FloorMod (#4236)
- [Relay][Frontend][Tensorflow] Fix GatherV2 (#4238)
- Fix typing.Deque import error for Python 3.5 (#4254)
- [VTA] Hotfix for padded load test in Chisel VTA (#4264)
- [Contrib] Fix error message at `callback_get_section_size()` (#4221)
- [TOPI] Fix bug in Winograd on CUDA (#4260)
- AutoTVM: Fix hang/crash issues on feature extraction (#3689)
- [TOPI][CUDA] Fix Winograd Kernel Size Support (#4276)
- [Relay][Frontend][Tensorflow] Fix type assignment for 'tf.range' operator (#4294)
- Fix incorrect call to Unicode Win32 InetPton (#4306)
- [Relay][Frontend][Keras] Handle `batch_norm` op params well (#4310)
- [VTA] Fix error when `memory_id` is `VTA_MEM_ID_OUT` (#4330)
- [Doc][fix] Fix sphinx parsing for pass infra tutorial (#4337)
- [Codegen] remove fp16 function override for cuda (#4331)
- [TFLite] Fix Prelu unified shape error (#4326)
- [Relay][Frontend][TF] Fix transpose when axes is not a param (#4327)
- [VTA] Bug fix for padded load with large inputs (#4293)
- Fix inconsistent operator tag name (#4134)
- Fix for a specific case when loop partitioning with an indivisible split. (#4243)
- Send list as argument to `schedule_conv2d` (#4358)
- [Docker] Fix TVM folder name for installing on Android and OpenCL. (#4363)
- Fix TFLite Reshape assert (#4320)
- [Relay][Frontend][TF] Fix slice when begin or size is not Const (#4372)
- Fix compilation of bfloat16 on Windows (#4415)
- The performance of the Relay VM is not yet good enough on GPU, due to memory allocation overhead; this will be resolved later.
- TFLite rounding differs from TVM rounding, causing accuracy differences and potential off-by-one errors. For reference, see #3900.
- TFLite pre-quantized network support is still a work in progress; the project would welcome further contributions.
- TSIM build requires the `python` command to exist on the host. See the forum discussion for details.
- TensorFlow control flow is not yet fully supported in the frontend converter.
- `topi.floor_div` is inconsistent with floor-division semantics when the result is close to an integer (illustrated below).
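As a rough illustration of the failure mode (plain Python/NumPy here, not the actual `topi.floor_div` code path): floor division computed through floating-point division can land one below the exact result when the true quotient is an integer.

```python
import numpy as np

a, b = 0.7, 0.1
# Mathematically floor(0.7 / 0.1) == 7, but 0.1 has no exact binary
# representation, so the floating-point quotient comes out just under 7
# and the floor lands one below the exact answer.
print(a / b)            # 6.999999999999999
print(np.floor(a / b))  # 6.0
```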
- Python 2 support is deprecated in this and the following release (v0.6). (#2994, #2986)
- NNVM is deprecated and will be removed in a future version. (#4333, #4368)
This release features several major improvements. Some of the highlights are: an arbitrary-bit quantization algorithm and a high-level, auto-differentiable programming IR -- Relay.
- Fully featured 8-bit network support
- 8bit quantizer
- Arbitrary bits quantization algorithm
- Intel cpu support
- ARM cpu support
- NVidia GPU 8-bit kernel
- int8 gemm recipe
- int8 conv2d
- Autotvm integration
- Automated tuning and scheduling
- AutoTVM optimizations for mobile GPUs
- AutoTVM optimizations for CUDA
- AutoTVM optimizations for x86
- Initial release of the differentiable programming IR, Relay
- Generic & informative Relay error reporting #2408
- Relay IR text format support #1781
- Support control flows
- A Normal Form Canonicalization #2251
- Type system support
- End to end compilation
- Frontend support: Caffe2 #2507 , CoreML #2476 , Keras #2376 , MXNet #2163 , ONNX, TFLite #2365
- Operator coverage #1799 #2051
- FoldScaleAxis #2020
- SimplifyInference #2033
- CombineParallelConv2D #2089
- InstrumentBoundCheckers pass #2079
- Bind & FoldConstant #2100
- Alter Op Layout #2150
- General OpFusion #2090
- CodeGen
- Gcc / g++ compatible C code generator for TVM #2161
- Device type annotation for heterogeneous compilation #2361
- Cache packed func ptr, lift alloca #2070
- Generalize compute to tensor region #1476
- Runtime
- Relay interpreter and compiler #1954
- Heterogeneous runtime #1695
- Language bindings: Golang runtime #1470 , Rust runtime #1597
- Add min_repeat_ms to time_evaluator #2200 (see the usage sketch after this list)
- Bundled interpreter demonstration #2297
- Enable PlanMemory in the graph runtime #2120
- Language Binding
- Rust frontend #2292
- VTA
- Improved RPC for VTA #2043
- Hybrid python programming model
- Support for scheduling #2416
- Support for Inter-function call #2287
- Backend support #2477
- TOPI
- Initial support for sparse tensor computation
- Improve ARM CPU depthwise convolution performance #2345
- Port winograd ops to relay #2356
- Add faster-rcnn proposal op #2420
- Tutorials and docs
- Relay language docs #2232
- Tutorials on how to use SGX backend
- How to write a pass in python
- General lowering flow of TVM
- How to do tensorize
- TFLite frontend tutorial #2508
- Keras seq2seq model for translation tutorial #1815
- Committer guide and tips #2468
- Code review guideline on API designs #2459
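For the `min_repeat_ms` option added to `time_evaluator` above, here is a minimal usage sketch; it uses the current `te`-based Python API, which may differ slightly from the 0.5-era interface:

```python
import numpy as np
import tvm
from tvm import te

# Build a trivial element-wise kernel to time.
n = 1024
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")
s = te.create_schedule(B.op)
mod = tvm.build(s, [A, B], target="llvm")

dev = tvm.cpu(0)
a = tvm.nd.array(np.random.rand(n).astype("float32"), dev)
b = tvm.nd.empty((n,), "float32", dev)

# min_repeat_ms keeps re-running each measurement until at least that much
# wall-clock time has elapsed, which stabilizes timings of very fast kernels.
timer = mod.time_evaluator(mod.entry_name, dev, number=10, repeat=3, min_repeat_ms=500)
print("mean runtime: %g s" % timer(a, b).mean)
```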
This release features several major improvements. The high-level graph optimizer is now part of TVM repo. Some of the highlights are: Initial support of AutoTVM for automated optimization; customized accelerator backend VTA.
- Tensor operator primitives
- Introduce an attrs field to operator primitives (e.g. compute) to store additional metadata; the attrs can be used as hints for scheduling
- Enable embedding of asm micro-kernels
- Hybrid python programming model
- python AST based IR builder interface
- support GPU programs
- AutoTVM, Automated tuning, and scheduling
- basic autotvm infra
- GPU IR verifier
- basic autotuning tutorial
- topi integration
- ARM support
- winograd support
- initial support of ARM autotuning records
- TOPI Vision
- Generic GPU sort support(useful for vision)
- SSD operator support
- TOPI numpy consistency
- Rename all binary operators for numpy consistency: broadcast_add -> add, broadcast_sub -> subtract, broadcast_mul -> multiply, broadcast_div -> divide
- New operators: slice, LRN, equal, not_equal, less, greater
- tutorials on topi
- Initial low-bit operator support
- Optimized popcount generation on ARM
- general bit-serial convolution and GEMM
- optimized low bit kernels
- parallel optimization
- New topi backend optimization for intel graphics
- Adapt AVX schedules for SSE target
- VTA: customized accelerator backend
- custom hardware backend example
- tutorials on how to use customized accelerator
- Initial experimental support for HLS backend
- Bugfix in SPIRV code generator for vulkan
- libdevice support, enable NVPTX backend
- Introduce NDArrayContainer for managed NDarray
- RPC and Device API
- Support communication between big-endian and little-endian machines.
- RPC and device API protocol upgrade to support big/little-endian communication. This is a non-backward-compatible change; the latest version of the TVM runtime must be used with the RPC.
- Graduate rpc from contrib: tvm.contrib.rpc -> tvm.rpc
- Support tracker in Android RPC, add fault tolerance for AutoTVM
- BIG.LITTLE aware threadpool
- tvm4j graph runtime that runs end to end workload in java
- DLPack support
- Support from_dlpack and to_dlpack
- Enables bridges to PyTorch (see the sketch after this list)
- Enable link of stackvm in runtime
- Tensorflow graphdef frontend
- Keras frontend
- Improved to support reused layers and additional activations
- ONNX
- gather, LRN
- CoreML frontend
- Support C-RNN and activation functions
- Fix grads for sum and expand_like
- Enhanced operator fusion for multiple elemwise branches
- Separate nnvm fusion and compilation pass
- Unified build system to cmake, customizable cmake path for vulkan, rocm, cuda
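As a sketch of the DLPack bridge to PyTorch mentioned in the list above (module paths follow the current `tvm.nd` and `torch.utils.dlpack` interfaces, which may differ slightly from the 0.4-era names):

```python
import torch
import torch.utils.dlpack
import tvm

x = torch.arange(8, dtype=torch.float32)   # a PyTorch tensor on CPU
capsule = torch.utils.dlpack.to_dlpack(x)  # export it as a DLPack capsule
y = tvm.nd.from_dlpack(capsule)            # wrap it as a TVM NDArray without copying
print(y.numpy())
```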
This release features numerous improvements in TOPI and backends. We make the first step toward object detection support in TOPI, featuring operators necessary for YOLO and SSDs. TOPI now supports a numpy-style API and operator overloading. RPC is significantly improved to support resource allocation and using a pool of devices. We are adding two new backends: WebGL for running on GPUs in the browser, and Vulkan for running on the next-generation graphics API.
- TOPI Vision operators
- SSD support
- YOLO support
- NMS operator support in vision
- TOPI general numpy-style operators
- numpy style operator overload in topi
- more operators: flip, take
- dilation support on conv2d and depthwise
- 8bit support
- ARM 8bit gemm
- ARM 8bit conv
- Low bit operator support
- popcount intrinsics
- 1-bit fully connected
- Contrib: MPSDNN fully-connected and conv2d support
- Better RPC support
- RPC Tracker support to allow centralized resource management
- RPC protocol upgrade (this is a non-backward compatible change) to support timeout in the proxy
- This is a breaking change, need to use the latest version of TVM runtime with the RPC
- Fault-tolerant to early server termination with correct exception propagated
- RPC support enabled for ROCm AMDGPUs
- Tutorials and docs
- How to deploy to android devices.
- Optimizations for hardware backends
- intel CPU (AVX and AVX512)
- Schedule Primitives
- rfactor now supports factor_axis to specify the factored dimension in the result (see the sketch after this list)
- cache_write now supports multiple output operators
- Enable warp memory, which generates shuffle instructions
- Framework bridge
- MXNet bridge supported
- C++ compiler API support
- build migration
- topi migration to c++
- Target system in c++
- WebGL backend
- runtime and codegen
- topi integration
- end to end pipeline on the browser
- Vulkan backend
- vulkan runtime
- spirv code generator
- Security
- intel SGX runtime support
- multi-threaded SGX runtime
- LLVM 7.0 support
- Robustness
- VerifyMemory to detect incorrect GPU schedules that write into GPU memory from the CPU
- Verify compute formulas
- Better CPU parallel runtime
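For the `factor_axis` argument to `rfactor` noted in the schedule-primitives list above, here is a minimal sketch using the TE schedule API (the `tvm.te` module names postdate this release, but the primitive is the same):

```python
import tvm
from tvm import te

# A 1-D reduction: B[0] = sum(A[k]).
n = te.var("n")
A = te.placeholder((n,), name="A")
k = te.reduce_axis((0, n), name="k")
B = te.compute((1,), lambda i: te.sum(A[k], axis=k), name="B")

s = te.create_schedule(B.op)
ko, ki = s[B].split(k, factor=16)
# rfactor splits the reduction into a partial-sum stage; factor_axis chooses
# where the factored (ki) dimension is placed in that intermediate tensor.
BF = s.rfactor(B, ki, factor_axis=0)
print(tvm.lower(s, [A, B], simple_mode=True))
```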
This release comes with a complete set of TOPI support for NNVM compiler, which allows compilation of end to end workloads. We also make major improvements in supporting new backends: ROCm for AMDGPUs and ARM GPU.
- Backend support
- Support LLVM mainline (4.0, 5.0, 6.0)
- Support ROCM stack for AMD GPUs
- More robust OpenCL support for ARM GPUs
- Android RPC runtime
- Multi-threading optimization for ARM
- multi-threaded depthwise
- multi-threaded conv2d
- New schedule primitives
- storage_align for shared memory alignment
- double_buffer
- UnrollLoop: a more robust version of loop unrolling that counts the maximum steps that can be unrolled
- Full set of TOPI operators
- Introduce tvm.target to better specify target options for compilation.
- broadcast/ reduction operators
- pooling and global pooling
- Generic target support for topi
- schedule with external libraries
- End to end deep learning pipelines for CPU, GPU, ARM GPU
- Tutorials
- How to load compiled module in any language runtime
- How to use java runtime
- Contrib library: MIOpen, CuDNN
- Ongoing items that contain functioning pieces
- WebGL backend
- C++ compiler support
- MPS DNN
- low bit support, introduced popcount
- Language runtime
- python
- javascript
- java
- c++
- Backend
- arm, x86
- javascript, wasm
- CUDA
- opencl
- Metal
- DNN Library integration
- RPC runtime
- TOPI operator pipeline python
- TOPI operator pipeline in C++
- Rough perf of the TOPI GPU pipeline
- Rough perf of the TOPI CPU pipeline
- End to end graph executors
- Pack library into shared library.
- External function and contrib libraries
- DLPack integration support
- AOT and module system
- Basic code structure ready.