[Relay][Quantization] Speed-aware quantization scheme improvement #2723

vinx13 · 2019-03-04T07:45:45Z

Writing an int32 result to global memory can be much slower than writing int8. This PR makes the following changes:

In add_rewrite, quantize rhs to int8 so that reads and writes of rhs can be performed in int8.
In UnifyDtypeScale, if the input is simulated_quantize(QInput), cast the input to int8 before casting it to int32.
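
The motivation in the description above can be illustrated with a small standalone sketch. This is not the code from this PR and does not use TVM; `quantize_to_int8`, `pack`, and the `scale` value are hypothetical names and numbers chosen for illustration. It only shows that values clipped to the int8 range occupy a quarter of the bytes of the same values stored as int32, and that they can be widened to int32 afterwards without loss.

```python
import struct


def quantize_to_int8(values, scale):
    """Divide by scale, round to the nearest integer, clip to [-127, 127].

    Hypothetical helper for illustration; not TVM's simulated_quantize.
    """
    out = []
    for v in values:
        q = int(round(v / scale))
        out.append(max(-127, min(127, q)))
    return out


def pack(values, fmt):
    """Serialize integers with a fixed width: 'b' is int8, 'i' is int32."""
    return struct.pack("<%d%s" % (len(values), fmt), *values)


rhs = [0.5, -1.25, 3.0, 0.0]
scale = 3.0 / 127  # assumed scale: largest magnitude maps to 127

q_rhs = quantize_to_int8(rhs, scale)

# The same quantized values stored as int8 and as int32.
int8_bytes = pack(q_rhs, "b")
int32_bytes = pack(q_rhs, "i")

# Widening int8 -> int32 later (for example before accumulation) is lossless.
widened = list(struct.unpack("<%di" % len(q_rhs), pack(q_rhs, "i")))

print(len(int8_bytes), len(int32_bytes), widened == q_rhs)
```

Under these assumptions the int8 buffer is four times smaller than the int32 one, which is the memory-traffic saving the PR is after when it keeps rhs in int8 for as long as possible.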

python/tvm/relay/quantize/_annotate.py

src/relay/pass/quantize.cc

ZihengJiang · 2019-03-07T00:06:56Z

For the comment, I meant explaining the code, like here.

solved

ZihengJiang · 2019-03-08T17:54:27Z

Please fix the CI, @vinx13.

ZihengJiang · 2019-03-09T15:58:03Z

Merged, thanks! @vinx13

…ache#2723) * [Relay][Quantization] Speed-aware quantization scheme improvement * Add comment * Add use_stop_fusion to qconfig * Update comment

vinx13 force-pushed the feature/quanti_improve branch from 865b37c to 4996286 Compare March 4, 2019 08:11

ZihengJiang self-assigned this Mar 5, 2019

ZihengJiang reviewed Mar 5, 2019

python/tvm/relay/quantize/_annotate.py

src/relay/pass/quantize.cc

tqchen previously requested changes Mar 5, 2019

src/relay/pass/quantize.cc

ZihengJiang approved these changes Mar 8, 2019

vinx13 added 4 commits March 9, 2019 12:45

[Relay][Quantization] Speed-aware quantization scheme improvement

412b775

Add comment

4c72fba

Add use_stop_fusion to qconfig

dc58c2c

Update comment

59474d2

vinx13 force-pushed the feature/quanti_improve branch from 807b5cd to 59474d2 Compare March 9, 2019 04:46

ZihengJiang approved these changes Mar 9, 2019

ZihengJiang merged commit 21e8dfa into apache:master Mar 9, 2019

ZihengJiang added the status: accepted label Mar 9, 2019

tqchen mentioned this pull request Nov 8, 2019

[RELEASE][DRAFT] TVM v0.6 Release candidate #4259

Closed