[QUANTIZE] Memorizing the quantize node mapping #3233
Conversation
Fusion of resnet-50 is broken after this PR.
If simulated_quantize is not memoized, there will be two simulated_quantize nodes following the first conv2d.
CSE will merge both the i8 cast and stop_fusion, but not the i32 cast:
In this way we ensure that the output is int8. However, with simulated_quantize memoized, the two i32 casts will be merged because of memoization (we call …
@vinx13 It would be great if we can find alternatives. Ideally, we want to handle both cases.
@vinx13 I may be missing the point. Does it matter that those two i32 casts are merged? The main residual block will still output 8-bit (in front of the …
@ZihengJiang It will impact performance. Although stop_fusion can ensure that conv2d + fused ops produce an int8 result, if the int32 casts are merged, the cast will be put into a separate sub-function …
@ZihengJiang @vinx13 Can you please follow up now that #3280 is merged?
@ZihengJiang Please rebase against master.
* [QUANTIZE] Support for clip operator
* [QUANTIZE] Memorizing the quantize node mapping.
* [QUANTIZE] Remove use_stop_fusion and skip_k_conv in qconfig
* update
* update
* update
* update
To avoid duplicated simulated_quantize nodes.
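To illustrate the idea being discussed, here is a minimal, hypothetical sketch of memoizing an annotation rewrite. It is not TVM's actual pass: `Expr`, `annotate`, and the node names are stand-ins invented for this example. The point is only that caching the rewritten node per input expression means a value consumed by several ops gets one shared simulated_quantize node instead of one per consumer.

```python
class Expr:
    """Minimal stand-in for an IR expression node (hypothetical)."""
    def __init__(self, op, *args):
        self.op, self.args = op, args

_memo = {}  # maps original node id -> its rewritten node

def annotate(node):
    """Rewrite a graph, inserting simulated_quantize after conv2d.

    The memo table guarantees each original node is rewritten once,
    so all consumers share the same rewritten node.
    """
    key = id(node)
    if key in _memo:                 # memo hit: reuse the existing rewrite
        return _memo[key]
    new_args = [annotate(a) for a in node.args]
    if node.op == "conv2d":          # insert the simulated quantize op
        out = Expr("simulated_quantize", Expr("conv2d", *new_args))
    else:
        out = Expr(node.op, *new_args)
    _memo[key] = out
    return out

# A conv2d whose output feeds two consumers, as in a residual block:
conv = Expr("conv2d", Expr("data"))
graph = Expr("add", Expr("relu", conv), conv)
result = annotate(graph)

# Both consumers share one simulated_quantize node rather than getting
# a duplicate each:
assert result.args[0].args[0] is result.args[1]
```

Without the memo table, each call to `annotate(conv)` would build a fresh `simulated_quantize` node, which is the duplication the PR description refers to.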
@tqchen @vinx13 @jwfromm