Releases: alibaba/MNN
Support quantization aware training
1. Support quantization aware training. See https://www.yuque.com/mnn/cn/bhz5eu for details.
2. Support the Python API for MNN-Express. See pymnn/examples/MNNTrain for details.
3. Speed up several cases of the binary op.
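Quantization aware training is commonly built on a "fake quantization" step: values are rounded to the integer grid and converted back to floats during the forward pass, so the model learns to tolerate the rounding error. The following is a minimal sketch of that idea in plain Python, assuming symmetric per-tensor int8 quantization. The function name `fake_quantize` is hypothetical and is not part of the MNN API; see the linked documentation for the actual workflow.

```python
def fake_quantize(values, num_bits=8):
    """Quantize to signed integers and dequantize back (symmetric, per-tensor).

    Illustrative only: the scale rule (max absolute value) is an assumption.
    """
    qmax = 2 ** (num_bits - 1) - 1              # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values), 1.0                # nothing to scale
    scale = max_abs / qmax
    # Round to the nearest integer level and clamp to the representable range.
    levels = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    # Dequantize: training continues with these float values.
    return [q * scale for q in levels], scale


weights = [0.5, -1.27, 0.003, 1.0]
simulated, scale = fake_quantize(weights)
```

Each simulated value differs from the original by at most half a quantization step (`scale / 2`), which is the error the training process is exposed to.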
0.2.1.9
0.2.1.7
- Fix Vulkan Winograd bug
- Fix segmentation fault when converting TFLite models
- Work around Metal Softmax op bug
- Lower the required OpenCL version from 2.0 to 1.10 to improve compatibility
0.2.1.6
Op Support
Added support for:
- Over 30 TFLite Ops
- Over 20 Onnx Ops
- 9 Caffe Ops
- 24 Tensorflow Ops
Project Layout and Engineering Improvements
- CMake Build System Rewrite
- Header Layout Standardization. Public headers are now under include/MNN and are installed as <MNN/>
Inferencing Improvements and Bug Fixes
- OpenCL BinaryOp Bugs
- CPU MatMul Bugs
- Added Unit Testing for over 30 ops
- ImageProcess now supports NV12 input and NV21 / NV12 stride
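To illustrate what NV12 input with a stride means, here is a minimal sketch in plain Python that reads one pixel's Y, U and V bytes from an NV12 frame. It assumes the usual NV12 layout (a full-resolution Y plane followed by a half-height plane of interleaved U, V pairs) and assumes both planes share the same row stride. The helper `nv12_pixel` is hypothetical and is not the ImageProcess API.

```python
def nv12_pixel(buffer, width, height, stride, x, y):
    """Return the (Y, U, V) bytes for pixel (x, y) in an NV12 frame.

    `stride` is the number of bytes per row, which may exceed `width`
    when rows are padded for alignment.
    """
    if not (0 <= x < width and 0 <= y < height):
        raise IndexError("pixel out of range")
    luma = buffer[y * stride + x]
    # The chroma plane starts after `height` luma rows; each U, V pair
    # is shared by a 2x2 block of pixels.
    uv_base = height * stride + (y // 2) * stride + (x // 2) * 2
    # NV21 uses the same layout with the two chroma bytes swapped (V, U).
    return luma, buffer[uv_base], buffer[uv_base + 1]


# A 4x2 frame with 4 padding bytes per row (stride 8).
frame = bytes(
    [10, 11, 12, 13, 0, 0, 0, 0]        # Y row 0
    + [20, 21, 22, 23, 0, 0, 0, 0]      # Y row 1
    + [100, 200, 101, 201, 0, 0, 0, 0]  # interleaved U, V row
)
top_left = nv12_pixel(frame, 4, 2, 8, 0, 0)
bottom_right = nv12_pixel(frame, 4, 2, 8, 3, 1)
```

Ignoring the stride and indexing with `width` instead would read padding bytes as pixel data, which is why stride support matters for camera buffers.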
Training (Experimental Feature)
- Optimize the Express module's dynamic graph execution policy
- Improve single machine training with a working demo
0.2.1.5
integration
- add travis CI
- fix building parameters for python
converter
- add half storage option for MNN converter
- fix op name lost in converter
- fix converter bugs in printing inputs/outputs and in Identity removal dropping outputs
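The effect of a half storage option can be sketched with Python's standard `struct` module, whose `e` format character is IEEE 754 half precision. This is only an illustration of the size and precision trade-off; the helper names are hypothetical and the snippet does not reproduce how the MNN converter serializes weights.

```python
import struct


def pack_weights(weights, half=False):
    """Serialize floats as little-endian fp32 ('f') or fp16 ('e')."""
    fmt = "<%d%s" % (len(weights), "e" if half else "f")
    return struct.pack(fmt, *weights)


def unpack_weights(blob, half=False):
    """Inverse of pack_weights; values come back as Python floats."""
    item_size = 2 if half else 4
    fmt = "<%d%s" % (len(blob) // item_size, "e" if half else "f")
    return list(struct.unpack(fmt, blob))


weights = [0.1, -2.5, 3.14159, 1024.0]
full_blob = pack_weights(weights)             # 4 bytes per weight
half_blob = pack_weights(weights, half=True)  # 2 bytes per weight
restored = unpack_weights(half_blob, half=True)
```

The half-precision blob is half the size, and values that are not exactly representable in fp16 (such as 0.1) come back with a small relative error.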
ops
- add quantized Convolution & Deconvolution support on OpenCL
- add support for more expressions
- add DetectionPostProcess Op for TensorFlow Lite (SSD is now supported directly)
- add support for LSTM & ELU for ONNX
- add support for Convolution with non-constant weights for ONNX
- fix Unary Op compile error on Linux
- fix Metal backend buffer reuse after resize
- fix Metal raw memory access after model releasing
- fix redundant transpose in Winograd generator
0.2.1.2
build
- unify schema building in core and converter;
- add more build scripts for Android;
- add Linux build script for Python;
ops impl
- add floor mod support in binary;
- use eltwise impl in add/max/sub/mul binary for optimization;
- remove fake double support in cast;
- fix 5d support for concat;
- add adjX and adjY support for batch matmul;
- optimize conv2d back prop filter;
- add pad mode support for conv3d;
- fix bug in conv2d & conv depthwise with very small feature map;
- optimize binary without broadcast;
- add data types support for gather;
- add gather ND support;
- use uint8 data type in gather v2;
- add transpose support for matmul;
- add matrix band part;
- add dim != 4 support for padding, reshape & tensor convert;
- add pad type support for pool3d;
- make ops based on TensorFlow Lite quantization optional;
- add all & any support for reduction;
- use type in parameter as output type in reduction;
- add int support for unary;
- add variable weight support for conv2d;
- fix conv2d depthwise weights initialization;
- fix type support for transpose;
- fix grad outputs count for reduce grad and reshape grad;
- fix priorbox & detection output;
- fix metal softmax error;
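Several of the ops listed above have semantics that are easy to get wrong. The sketch below shows reference behavior for floor mod, matrix band part and gather ND in plain Python on nested lists, assuming they follow the TensorFlow ops of the same names (FloorMod, MatrixBandPart, GatherNd). The functions are illustrative and are not MNN code.

```python
import math


def floor_mod(a, b):
    """Remainder that takes the sign of the divisor (unlike C's fmod)."""
    return a - b * math.floor(a / b)


def matrix_band_part(matrix, num_lower, num_upper):
    """Zero everything outside a band around the main diagonal.

    A negative count keeps the entire lower or upper triangle.
    """
    return [
        [
            value
            if (num_lower < 0 or i - j <= num_lower)
            and (num_upper < 0 or j - i <= num_upper)
            else 0
            for j, value in enumerate(row)
        ]
        for i, row in enumerate(matrix)
    ]


def gather_nd(params, indices):
    """Each index tuple walks into `params`, one nesting level per component."""
    gathered = []
    for index in indices:
        item = params
        for k in index:
            item = item[k]
        gathered.append(item)
    return gathered


grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
band = matrix_band_part(grid, 0, 1)          # diagonal plus one superdiagonal
picked = gather_nd(grid, [[0, 0], [2, 1]])   # elements at (0, 0) and (2, 1)
rows = gather_nd(grid, [[1]])                # a shorter index selects a whole row
```

Note the difference from truncated remainder: `floor_mod(-7, 3)` is 2, whereas `math.fmod(-7, 3)` is -1.0.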
python
- add runSessionWithCallBackInfo interface;
- add max nodes limit (1400) for visualization tool;
- fix save error in python3;
- align default dim;
convert
- add extra design for optimization;
- add more post converting optimizers;
- add caffe v1 weights blob support;
- add cast, unary, conv transpose support for onnx model;
- optimize batchnorm, conv with variable weights, prelu, reshape, slice, upsample for onnx model;
- add cos/sin/atan/tan support for unary for tensorflow model;
- add any/all support for reduction for tensorflow model;
- add elu, conv3d, pool3d support for tensorflow model;
- optimize argmax, batchnorm, concat, batch to space, conv with variable weights, prelu, slice for tensorflow model;
others
- fix size computer lock;
- fix thread pool deadlock;
- add express & parameters in express;
- rewrite blitter chooser without static map;
- add tests for expr;
0.2.1.0
- dynamic computation graph (beta)
  - add supports (/express)
  - add tests
  - add benchmarks with it (/benchmark/exprModels)
- Python
  - MNN engine and tools were submitted to pip
  - available on Windows/macOS/Linux
- Engine/Converter
  - add supports for each op benchmarking
  - refactor optimizer by separating steps
- CPU
  - add supports for Conv3D, Pool3D, ELU, ReverseSequence
  - fix ArgMax, Permute, Scale, BinaryOp, Slice, SliceTf
- OpenCL
  - add half transform in CPU
  - add broadcast supports for binary
  - optimize Conv2D, Reshape, Eltwise, Gemm, etc.
- OpenGL
  - add sub, real div supports for binary
  - add supports for unary
  - optimize Conv2D, Reshape
- Vulkan
  - add max supports for eltwise
- Metal
  - fix metallib missing problem
- Train/Quantization
  - use express to refactor training codes
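Broadcast support for binary ops means the two inputs may have different shapes as long as they are compatible. A minimal sketch of the output shape rule follows, assuming the NumPy/TensorFlow convention (shapes are aligned from the trailing dimension, and a dimension of size 1 stretches to match the other). The function `broadcast_shape` is hypothetical and does not describe the OpenCL implementation.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise if the shapes are incompatible."""
    rank = max(len(shape_a), len(shape_b))
    result = []
    for i in range(1, rank + 1):
        # Missing leading dimensions are treated as size 1.
        a = shape_a[-i] if i <= len(shape_a) else 1
        b = shape_b[-i] if i <= len(shape_b) else 1
        if a != b and a != 1 and b != 1:
            raise ValueError("cannot broadcast %r with %r" % (shape_a, shape_b))
        result.append(b if a == 1 else a)
    return result[::-1]


out_shape = broadcast_shape([1, 3, 1, 4], [5, 4])
```

Without broadcast support, both inputs must already have identical shapes, which forces an explicit tile or expand step before the binary op.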
beta 0.2.0.9
- fix quantization tool compiling on Windows
- fix converter compiling on Windows
- fix eltwise optimization on Windows
- separate sse & avx for Windows
- add LeakyReLU support for TensorFlow
- fix reshape, const for TensorFlow
- fix dimension format error for ONNX ops
- optimize winograd, ReLU for OpenCL
- add fp16 availability & dimensions size check-up for OpenCL
- optimize GEMM for arm32
- fix ExpandDims shape calculation when inputs size == 1
beta 0.2.0.8
- add NaN check-up
- add quantization support for ScaleAdd Op
- add binary to eltwise optimization
- add console logs for quantization tool
- improve documentation for quantization tool
- replace redundant dimension flags with dimension format
- optimize performance of TensorFlow Lite Quantized Convolution
- fix axis support for ONNX softmax converting
- fix getPerformance tool compiling error on Windows
beta 0.2.0.7
- move docs to http://www.yuque.com/mnn
- fix bugs for CPU ops TopKV2 and quantized convolution
- add enqueue map buffer error handle for OpenCL
- add nullptr protection for extra tensor desc
- add failure protection for memory acquisition
- fix slice shape calculation
- refactor binary shape calculation