
Fold "FP16 weights -> Dequantize -> FP32 -> Conv/MatMul..." to "FP32 weights -> Conv/MatMul..." #35

Closed
zhenhuaw-me opened this issue Dec 15, 2020 · 7 comments · Fixed by #52
Labels
Enhancement New feature or request Quantization Story How we evolve

Comments

@zhenhuaw-me
Owner

ONNX quantization doesn't treat FP16 as a quantized data type, so nearly all FP16-quantized TFLite models are unsupported (see this FAQ).

We recommend that users switch to full integer quantization. But if that is not an option (for example, no TensorFlow model is available), we can fold "FP16 weights -> Dequantize -> FP32 -> Conv/MatMul..." into "FP32 weights -> Conv/MatMul..." to work around this issue.
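For reference, FP16 "dequantization" is just an upcast to FP32, so folding it into the weights is numerically lossless. A minimal NumPy sketch of the idea (the weight values here are illustrative, not from a real model):

```python
import numpy as np

# Hypothetical FP16 weights as stored in a TFLite flatbuffer.
fp16_weights = np.array([0.5, -1.25, 3.0], dtype=np.float16)

# What the Dequantize operator does at runtime: FP16 -> FP32.
dequantized_at_runtime = fp16_weights.astype(np.float32)

# The proposed fold performs the same cast offline, so the exported ONNX
# graph sees plain FP32 weights and needs no Dequantize node.
folded_weights = fp16_weights.astype(np.float32)

assert np.array_equal(dequantized_at_runtime, folded_weights)
```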

@paulgavrikov
Contributor

It would be awesome if you could implement this! Unfortunately, the MediaPipe models often rely on FP16 weights.

@ram95014

ram95014 commented Jan 13, 2021

I would like to second @paulgavrikov and request the fold from FP16 to FP32. Thanks. If that is not possible or would take a long time, it would be great if you could guide me through doing it myself.

@zhenhuaw-me
Owner Author

zhenhuaw-me commented Jan 13, 2021

@paulgavrikov @ram95014 Thanks for your feedback! I'm very glad to see the interest. This should be possible and should not take too much time, but I haven't been working on it. It would be great if you could help!

In general, it could be divided into three steps:

  1. Parse and build the graph as if ONNX supported FP16 as a quantization type. At this stage the graph will contain many patterns like FP16 weights -> Dequantize -> FP32 Tensor -> Conv/MatMul, which would be illegal if converted to ONNX directly. I think the current code can already handle this stage, but I have not tried it locally.
  2. Walk the graph and fold the FP16 weights. We can introduce a new pass, similar to layout propagation or quantization handling, maybe named something like foldFP16Weights (a rough sketch of this pass follows the list). Search for the FP16 weights -> Dequantize -> FP32 Tensor -> Conv/MatMul pattern in the graph (iterate over the operators and check); for each matched pattern P:
    1. Cast the FP16 weight tensor to FP32 in place, updating Tensor.data, Tensor.dtype, etc.
    2. Detach the FP16 weights (now actually FP32) from the Dequantize operator, and the intermediate FP32 tensor from the Conv/MatMul operator.
    3. Attach the cast weights (now FP32) to the Conv/MatMul operator.
    4. Remove the Dequantize operator and the intermediate FP32 tensor from the graph.
  3. Recollect the tensors and operators, as the other graph passes do.
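As a rough Python sketch of step 2, assuming a simple in-memory graph with `Operator` and `Tensor` objects; the attribute and method names here (`graph.operators`, `tensor.consumers`, `tensor.is_constant`, and the pass name `fold_fp16_weights`) are illustrative assumptions, not the actual tflite2onnx API:

```python
import numpy as np

def fold_fp16_weights(graph):
    """Illustrative pass: fold FP16 weights -> Dequantize -> FP32 tensor -> Conv/MatMul
    into FP32 weights -> Conv/MatMul. Names are hypothetical, not the tflite2onnx API."""
    # Iterate over a copy so we can remove operators while walking the graph.
    for dequant in [op for op in graph.operators if op.type == 'DEQUANTIZE']:
        weights = dequant.inputs[0]      # stored FP16 constant tensor
        fp32_out = dequant.outputs[0]    # intermediate FP32 tensor
        if weights.dtype != np.float16 or not weights.is_constant:
            continue

        # Step 2.1: cast the stored data in place, FP16 -> FP32.
        weights.data = weights.data.astype(np.float32)
        weights.dtype = np.float32

        # Steps 2.2/2.3: rewire every consumer of the Dequantize output
        # (e.g. Conv or MatMul) to read the cast weights directly.
        for consumer in list(fp32_out.consumers):
            idx = consumer.inputs.index(fp32_out)
            consumer.inputs[idx] = weights

        # Step 2.4: drop the Dequantize operator and its output tensor.
        graph.operators.remove(dequant)
        graph.tensors.remove(fp32_out)
```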

I would suggest starting by adding a new operator to tflite2onnx to get familiar with the code. Operators are pretty easy to add; you can find examples in merged PRs.

It would be great if we could get this into the next minor release. Let's prioritize it!

@zhenhuaw-me linked a pull request Jan 16, 2021 that will close this issue
@zhenhuaw-me
Owner Author

@paulgavrikov @ram95014 This functionality has been enabled; please try it out with the latest code.
If anything looks wrong, please open a new issue and link it to this one. Thanks!

@mikkelmedm

Hi, is this supposed to be fixed already? I am still having issues.

@zhenhuaw-me
Owner Author

@mikkelmedm It has been fixed and is covered by this test. What error are you seeing?

@mikkelmedm

Running it on a MediaPipe model, I get an error like "FP16 is not tested, and might not work properly".
