Introduction
The TVM community has worked since the v0.10.0 release to deliver the following exciting new improvements!
- MetaSchedule
  - Tuning API improvements and anchor-block tuning
- TVMScript metaprogramming
  - Lots of progress with TVMScript, with the introduction of a core parser, AST, Evaluator, Source and diagnostics
And many other general improvements to microTVM, code quality, CI, frontends, and more! Please visit the full listing of commits for a complete view: v0.10.0...v0.11.0.
RFCs
These RFCs have been merged in apache/tvm-rfcs since the last release.
What's Changed
Note that this list does not cover every PR and discussion since v0.10. Please visit the full listing of commits for a complete view: v0.10.0...v0.11.0.
Adreno
- [Adreno] Add global pooling schedule #13573
- [Adreno] Add documentation for Adreno deployment #13393
- [Adreno] Fix mem_scope annotations for prim funcs having several heads #13153
- [Adreno] Adapt reduction schedule for adreno #13100
- [Adreno] Fix winograd accuracy #13117
- [Adreno][Textures] Fix static memory planner #13253
- [DOCKER][Adreno]Docker infra for Adreno target with CLML support #12833
AoT
- [AOT] Add CreateExecutorMetadata analysis pass #13250
- [AOT] Add CreateFunctionMetadata analysis pass #13095
- [AOT] Sanitize input/output name in runtime #13046
Arith
- [Arith] Add internal NarrowPredicateExpression utility #13041
- [Arith] Optional rewriting and simplification into AND of ORs #12972
arm
AutoTVM
Build
CI
- [ci] Fix docs deploy #13570
- [ci] Split Jenkinsfile into platform-specific jobs #13300
- [ci] Dis-allow any non-S3 URLs in CI #13283
- [ci] Split out C++ unittests #13335
- [CI] Separate the ci scripts into Github and Jenkins scripts #13368
- [ci] Assert some tests are not skipped in the CI #12915
- [ci] Ignore JUnit upload failures #13142
- [ci] Lint for trailing newlines and spaces #13058
- [ci] Template build steps #12983
- [ci][docker] Allow usage of ECR images in PRs #13590
- [ci][docker] Read docker image tags during CI runs #13572
- [skip ci][ci][wasm] Add package-lock.json to git #13505
CL
CMSIS-NN
- [CMSIS-NN] Support for int16 conv2d #12950
- [CMSIS-NN] Support for int16 in fully connected layer #13484
DNNL
Docker
Docs
Ethos-N
- [ETHOSN] Consolidate target string usage #13159
- [ETHOSN] Throw error message when inference fails #13022
- [ETHOSN] Inline non-compute-intensive partitions #13092
- [ETHOSN] Transpose fully connected weights #12970
- [ETHOSN] Support conversion of add/mul to requantize where possible #12887
Frontend
Hexagon
- [Hexagon] Add HVX quant conv2d implementation #13256
- [Hexagon] Add test to show scheduling of resnet50 with async dma pipe… #13352
- [Hexagon] Enable Hexagon User DMA bypass mode #13381
- [Hexagon] Lint tests part 2 #13271
- [Hexagon] Add pylint on tests #13233
- [Hexagon] Add E2E test demonstrating how to apply blocked layout schedule to conv2d via metaschedule #13180
- [Hexagon] Add a test to show how to use multi input async dma pipelin… #13110
- [Hexagon]: Add upload function to hexagon session #13161
- [Hexagon] Add support for instrumentation based profiling for Hexagon #12971
- [Hexagon] Add power manager #13162
- [Hexagon] Add scripts for e2e MetaSchedule tuning demonstration #13135
- [Hexagon] Add feature to copy logcat to --hexagon-debug and add new --sysmon-profile option to run sysmon profiler during the test #13107
- [Hexagon] Async DMA pipelining test suite #13005
- [Hexagon] Enable multi input Async DMA; same queue / stage #13037
- [Hexagon] Do not use `target` test fixture in Hexagon tests #12981
- [Hexagon] 3-stage pipeline; multi queue async DMA for cache read / write #12954
- [Hexagon] vrmpy tensorization for e2e compilation of int8 models #12911
- [Hexagon] Support template-free meta schedule tuning #12854
- [Hexagon] depth_to_space slice op #12669
- [Hexagon] Make allocate_hexagon_array a hexagon contrib API #13336
- [Hexagon] Add fix for vtcm allocation searches #13197
- [MetaSchedule][Hexagon] Add postproc for verifying VTCM usage #13538
- [Hexagon][QNN] Add TOPI strategies for qnn ops mul/tanh/subtract #13416
- [Logging][Hexagon] Improve logging on Hexagon #13072
- [Hexagon] [runtime] Per-thread hardware resource management #13181
- [Hexagon] [runtime] Create objects to manage thread hardware resources #13111
- [QNN][Hexagon] Disable QNN canonicalization pass #12398
- [Hexagon] [runtime] Manage RPC and runtime buffers separately #13028
- [Hexagon] [runtime] VTCM Allocator #12947
- [TOPI][Hexagon] Add schedule and test for maxpool uint8 layout #12826
- [TOPI][Hexagon] Implement quantize op for hexagon #12820
- [Meta Schedule][XGBoost] Update the custom callback function of xgboost in meta schedule #12141
- [TIR] [Hexagon] Add vdmpy intrinsic and transform_layout for tests #13557
- [Hexagon] [runtime] Support VTCM alignments of 128 or 2k #12999
- [HEXAGON][QHL] Clipping the inputs of HVX version of QHL Sigmoid operation #12919
- [Hexagon] [runtime] Add user DMA to device API resource management #12918
LLVM
- [LLVM] Emit fp16/fp32 builtins directly into target module #12877
- [LLVM] Switch to using New Pass Manager (NPM) with LLVM 16+ #13515
MetaSchedule
- [MetaSchedule] Make `MultiLevelTiling` apply condition customizable #13535
- [MetaSchedule] Enhance Database Validation Script #13459
- [MetaSchedule] Fix Dynamic Loop from AutoBinding #13421
- [MetaSchedule] Support schedules with cache read in RewriteLayout #13384
- [MetaSchedule] Improve inlining and `VerifyGPUCode` for quantized model workload #13334
- [MetaSchedule] Add JSON Database Validation Scripts #12948
- [MetaSchedule] Fix the order of applying `AutoInline` in `ScheduleUsingAnchorTrace` #13329
- [MetaSchedule] Refactor ScheduleRule Attributes #13195
- [MetaSchedule] Improve the script for TorchBench model tuning & benchmarking #13255
- [MetaSchedule] Enable anchor-block tuning #13206
- [MetaSchedule] Introduce a variant of ModuleEquality to enable ignoring NDArray raw data #13091
- [MetaSchedule] Consolidate module hashing and equality testing #13050
- [MetaSchedule] Support RewriteLayout postproc on AllocateConst #12991
- [MetaSchedule] Tuning API cleanup & ergonomics #12895
- [MetaSchedule] Fix XGBoost Import Issue #12936
- [MetaSchedule] Add Script for TorchBench Model Tuning & Benchmarking #12914
- [MetaSchedule] Restore `num_threads` parameter in tuning API #13561
- [MetaSchedule] TorchBench tuning script: add option to disallow operators in sub graph #13453
- [MetaSchedule] Fix segfault in gradient based scheduler #13399
- [MetaSchedule] Add `from-target` Defaults for x86 VNNI Targets #13383
- [MetaSchedule] Fix Task Hanging in EvolutionarySearch #13246
- [MetaSchedule] Allow skipping exact NDArray rewrite in RemoveWeightLayoutRewriteBlock #13052
- [MetaSchedule][UX] Support Interactive Performance Table Printing in Notebook #13006
- [MetaSchedule][UX] User Interface for Jupyter Notebook #12866
microNPU
- [microNPU] Upgrade Vela to v3.5.0 #13394
- [microNPU] Fixed MergeConstants pass on striped networks #13281
microTVM
- [microTVM] Modernize Arm Cortex-M convolution schedules #13242
- [microTVM] Improve code reuse in Corstone300 conv2d tests #13051
- [microTVM] Add Cortex-M DSP schedules for optimal conv2d layouts #12969
- [microTVM] Use default Project Options in template projects and add Makefile for Arduino template project #12818
- [microTVM] Generalize depthwise_conv2d schedule #12856
- [microTVM] add the option to open a saved micro project for debugging #12495
- Added macro generation in MLF export #12789
- [microTVM][Arduino] Add `serial_number` to project options and tests #13518
- [microTVM][Zephyr] Add 'serial_number' option #13377
- [microTVM][PyTorch][Tutorial]Adding a PyTorch tutorial for microTVM with CRT #13324
Misc
- [CodegenC] Explicit forward function declarations #13522
- [FQ2I] Support converting `dense->add` to `qnn.dense->add->requantize` #13578
- [Minor][Testing] Consolidate IRs into corresponding functions #13339
- Add recursive on loop with marked kUnrolled #13536
- Skip stride check if shape is 1 in IsContiguous #13121
- [TEST] CPU feature detection for x86 and ARM dot product instructions #12980
- [Node] Expose StructuralEqual/Hash handler implementation to header #13001
- [Tensorize] Add logs to comparator to make debugging tensorize failures easier #13285
- [usmp] Also remap VarNode to USMP-allocated buffer #12880
- [Virtual Machine] Implementation of 'set_output_zero_copy' #11358
ONNX
- [ONNX] Add converter for FastGelu from Microsoft onnxruntime contrib opset #13119
- [QNN, ONNX] Extension of QLinearMatMul in ONNX front-end for all ranks of input tensors #13322
OpenCL
- [OpenCL] Introduce OpenCL wrapper to TVM #13362
- [OpenCL] Introduction of weights on buffers #13563
- [OPENCL][TEXTURE] Test case enhancements and fixes for RPC #13408
Relay
- [Relay] Fix `CombineParallelDense` slicing axis #13597
- [Relay] Refactor constant folding over expr into a utility function #13343
- [Relay] Enhancement for fold_scale_axis and simplify_expr #13275
- [Relay] Add ClipAndConsecutiveCast and CastClip to SimplifyExpr #13236
- [Relay] Rewrite division by constant to multiply #13182
- [Relay] Extend split for blocked ConvertLayout pass #12886
- [Relay][transform][SimplifyExpr] simplify adjacent muls and adds with constants #13213
- [Relay][Hexagon] Add per-channel FixedPointMultiply operation #13080
- [IRBuilder][Minor] Add intrinsics like `T.int32x4` #13361
roofline
- [ROOFLINE] Add support for different dtypes #13003
- [Roofline] Add fma (non-tensorcore) peak flops for CUDA #13419
RPC
Runtime
Target
- [Target] Replace utility functions with target.features #12455
- [Target] Add Target Parser for Arm(R) Cortex(R) A-Profile CPUs #12454
- [Target] Add target_device_type attribute to override default device_type #12509
TIR
- [TIR] Add preserve_unit_iters option to blockize/tensorize #13579
- [TIR] Introduce ReduceBranchingThroughOvercompute #13299
- [TIR] Unify index data type when creating prim func #13327
- [TIR] Remove PrimFuncNode::preflattened_buffer_map #10940
- [TIR] Make syntax of AST nodes different than ops #13358
- [TIR] Update ReductionIterNotIndexOutputBuffer to check BlockRealizeN… #13301
- [TIR] Check producer predicate in `ReverseComputeInline` #13338
- [TIR] Add utility for anchor block extraction #13194
- [TIR] Allow IndexMap applied to arguments with different dtypes #13085
- [TIR] Fix handling of int64 extent in blockize and tensorize #13069
- [TIR] Refactor NarrowDataType into DataTypeLegalizer #13049
- [TIR] add unit-tests for upcoming primfunc-slicing #12794
- [TIR] Fix plan buffer allocation location for loop carried dependencies #12757
- [TIR] Fix predefined inverse map in layout transform dtype legalization #13565
- [TIR] Preserve loop annotation after loop partitioning #13292
- [TIR] Use IndexMap to transform NDArray #12949
- [TIR] Preserve loop annotations in inject_software_pipeline pass #12937
- [TIR][Schedule] Support for specific consumer block targeting in cache_write #13510
- [TIR][Hexagon] Add vtcm memory capacity verification for Hexagon target #13349
- [TIR][Transform] Optional data-flow analysis in RemoveNoOp #13217
- [TIR][Analysis][Arith] Implement basic data-flow analysis #13130
- [TIR][Bugfix] Fix AXIS_SEPARATORS in tir.Schedule.transform_layout #13326
- [TIR][Arith] Use TryCompare to narrow inequalities if possible #13024
- [TIR][Primitive] Support rolling_buffer schedule primitive in TensorIR #13033
- [Arith][TIR] Check for constant offsets of known literal constraints #13023
- [TIR][Arith] Implement kApplyConstraintsToBooleanBranches extension #13129
- [TIR][Schedule] Add cache_index to precompute index of buffer load #13192
- [TIR][Schedule] Add cache_inplace primitive to cache opaque buffer #12939
- [UnitTest][TIR] Support IRModule comparisons in CompareBeforeAfter #12920
- [TIR][Arith] Prove conditionals by transitively applying knowns #12863
- [TIR, MetaSchedule] Preserve unit block iters for auto-tensorization #12974
- [TIR][MetaSchedule] Add regression test for layout_rewrite extent=1 #12916
- [TIR][Transform] Keep the allocate buffers order after update buffer allocation location #13560
- [TIR][Schedule] Fix cache_read loc detecting and region_cover checking #13345
- [TIR][Transform] Clear buffer_map during MakeUnpackedAPI #12891
- [TIR][Schedule] Relax cache read/write's restriction and fix unexpected behavior #12766
TOPI
- [TOPI] Implement Einsum with reduction axes #12913
- [TOPI] Add layer norm operator #12864
- [TOPI] Add handwritten matvec for dynamic cases #13423
- [TOPI] Fix dtype legalize logic for CPU dot product instruction #12865
- [TOPI][Hexagon] Implement quantized adaptive_avg_pool1d for hexagon #13282
- [TOPI][Hexagon] Implement quantized depthwise conv2d #12499
Torch
- [TVM PyTorch Integration] optimized_torch & as_torch how-to guide #12318
- [frontend][pytorch]Support aten::Tensor_split operator #12871
TVMC
TVMScript
- [TVMScript] Improvements tvm.script.highlight #13438
- [TVMScript] Reorganize the folder structure #12496
- [TVMScript] TIR parser #13190
- [TVMScript] IRModule parser #13176
- [TVMScript] Evaluator, core parser, var table #13088
- [TVMScript] AST, Source and diagnostics for Parser #12978
- [TVMScript] Import TIR methods into the IRBuilder #12900
- [TVMScript] Infer T.match_buffer parameters for region #12890