
[QUANTIZE] Memorizing the quantize node mapping #3233

Merged (9 commits, Jun 22, 2019)

Conversation

ZihengJiang (Contributor)

To avoid creating duplicated simulated_quantize nodes.

@tqchen @vinx13 @jwfromm
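For readers of the thread, here is a minimal, self-contained sketch of the memoization idea in plain Python (not the actual TVM pass; the class and function names are illustrative): key a node map on the annotated expression and the quantization kind, so a second request for the same expression returns the node created the first time instead of a duplicate.

```python
# Illustrative sketch only; the real logic lives in the relay quantize pass.
class SimulatedQuantize:
    """Stand-in for a relay simulated_quantize node."""
    def __init__(self, data, kind):
        self.data = data
        self.kind = kind

_node_map = {}  # (expr id, kind) -> SimulatedQuantize

def attach_simulated_quantize(expr, kind):
    """Create the simulated_quantize node for `expr` at most once."""
    key = (id(expr), kind)
    if key not in _node_map:
        _node_map[key] = SimulatedQuantize(expr, kind)
    return _node_map[key]

# Two consumers of the same conv2d output now share a single node:
conv_out = object()
assert attach_simulated_quantize(conv_out, "input") is attach_simulated_quantize(conv_out, "input")
```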

ZihengJiang (Contributor Author)

Also, let's remove skip_k_conv since it can be expressed by skip_conv_layers, and remove use_stop_fusion since it exists mainly to support store_lowbit_output. @vinx13 @jwfromm
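As a usage note, a hedged sketch of how the remaining option expresses the old behavior (assuming the qconfig/quantize API as it appears on a recent master; details may differ): the old skip_k_conv=1 becomes an explicit index list via skip_conv_layers.

```python
import numpy as np
import tvm
from tvm import relay

# Hedged sketch: build a one-conv module and quantize it while skipping the
# first conv layer, which is what skip_k_conv=1 used to express.
data = relay.var("data", shape=(1, 3, 224, 224), dtype="float32")
weight = relay.const(np.random.randn(16, 3, 3, 3).astype("float32"))
conv = relay.nn.conv2d(data, weight, kernel_size=(3, 3), channels=16, padding=(1, 1))
mod = tvm.IRModule.from_expr(relay.Function(relay.analysis.free_vars(conv), conv))

with relay.quantize.qconfig(skip_conv_layers=[0]):
    qmod = relay.quantize.quantize(mod, params={})
```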

vinx13 (Member) commented May 23, 2019

Fusion of ResNet-50 is broken after this PR.
Previously it was ensured that every subfunction has an int8 output type. This needs special handling of the residual block:

conv - add(bn) - conv
    |---add ------|

If simulated_quantize is not memorized, there will be two simulated_quantize nodes following the first conv2d.
After realize, we have:

conv - i8 - stop_fusion - i32 - add(bn) ... conv
     - i8 - stop_fusion - i32 ----------- add |

CSE will merge the i8 and stop_fusion nodes, but not the i32 casts:

conv - i8 - stop_fusion - i32 - add(bn) ... conv
                   |-----i32 -------------add |

In this way we ensure that the output is int8.

However, with simulated_quantize memorized, the two i32 casts will be merged because of memoization (we call ExprMutator::Mutate with the same simulated_quantize node); as a result, subfunctions will have an i32 output type.
Maybe we can do something with the multi-ref trigger in realize?
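To restate the mechanism in a tiny, self-contained form (plain Python, not the actual realize pass): a memoized rewriter returns the identical output object every time it is asked to rewrite the same input node, so both consumers of the memorized simulated_quantize receive one shared i32 cast and the fused subfunction ends up with an i32 output.

```python
# Illustrative only: memoization makes both consumers of the memorized
# simulated_quantize receive the very same rewritten node.
class MemoizedRewriter:
    def __init__(self):
        self.memo = {}

    def mutate(self, node):
        # Same input node -> same cached output node, analogous to calling
        # ExprMutator::Mutate twice on one simulated_quantize.
        if id(node) not in self.memo:
            self.memo[id(node)] = ("cast_i32", node)
        return self.memo[id(node)]

rewriter = MemoizedRewriter()
sim_quant = object()                        # the single memorized simulated_quantize
main_path = rewriter.mutate(sim_quant)      # conv -> add(bn) path
residual_path = rewriter.mutate(sim_quant)  # residual add path
assert main_path is residual_path           # one shared i32 cast, hence i32 output
```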

tqchen (Member) commented May 24, 2019

@vinx13 it would be great if we can find alternatives. Ideally, we want to deal with both cases.
It would also be helpful if you could elaborate a bit; I did not quite get your example.

ZihengJiang (Contributor Author) commented May 30, 2019

@vinx13 I may be missing the point. Does it matter if those two i32 casts are merged? The main residual block will still output 8-bit (in front of the stop_fusion). Will it influence accuracy or just performance?

vinx13 (Member) commented May 31, 2019

@ZihengJiang It will impact performance. Although stop_fusion can make sure that conv2d + fused ops produce an int8 result, if the int32 casts are merged, the merged cast will be put into a separate subfunction.

tqchen (Member) commented Jun 19, 2019

@ZihengJiang @vinx13 can you please follow up now that #3280 is merged?

vinx13 (Member) commented Jun 20, 2019

@ZihengJiang please rebase against master

ZihengJiang merged commit bfb4884 into apache:master on Jun 22, 2019
ZihengJiang deleted the quantize branch on June 22, 2019 21:59
wweic pushed a commit to wweic/tvm that referenced this pull request Jun 26, 2019
* [QUANTIZE] Support for clip operator

* [QUANTIZE] Memorizing the quantize node mapping.

* [QUANTIZE] Remove use_stop_fusion and skip_k_conv in qconfig

* update

* update

* update

* update
wweic pushed a commit to neo-ai/tvm that referenced this pull request Jun 27, 2019
* [QUANTIZE] Support for clip operator

* [QUANTIZE] Memorizing the quantize node mapping.

* [QUANTIZE] Remove use_stop_fusion and skip_k_conv in qconfig

* update

* update

* update

* update
ZihengJiang restored the quantize branch on July 19, 2019 04:29