Resize Nearest Neighbor does not convert #187

Closed
rajansaini691 opened this issue Jul 29, 2020 · 16 comments

Comments

@rajansaini691

Hello again! I am attempting to adapt Tiny YOLOv3 to the Edge TPU. My model quantizes and compiles, but inference takes ~1.5 s. According to the logs, the RESIZE_NEAREST_NEIGHBOR op is not mapped to the Edge TPU, which may account for the slowdown. What should I do to get the op mapped and improve performance?

I am also using the newest compiler version with TensorFlow 2.3.

model
compiled model
logs

@Namburger

@rajansaini691 Thanks for reaching out!
We are actually still experimenting with the YOLO models ourselves, so I don't know enough to give you many details. However, there are some experimental (unreleased) features that I can use to allow more ops to be mapped to the TPU.
Here is the compiled model with 22/25 ops mapped to the TPU. It should be fast enough for real time (this is what I can offer at this time):
tiny_yolo_edgetpu.tflite.tar.gz
tiny_yolo_edgetpu.log

I'm also interested in your approach to this model; it would be awesome if you could share :)

@Namburger

@rajansaini691
FYI, the resize nearest neighbor op for YOLO is 4-dimensional; currently our compiler can only support 3 :(
https://coral.ai/docs/edgetpu/models-intro/#supported-operations

@rajansaini691
Author

@Namburger You'll find this very interesting. It turns out that the edgetpu compiler fully compiles the tflite model here, which goes on to run in 27 ms! I then followed their instructions to create a retrained model for our dataset. It was a very convoluted process, but I ended up with a tflite model that was nearly identical node-for-node. The only difference was that my ResizeNearestNeighbor op has half_pixel_centers set to true, while theirs is set to false. I'm not sure what this does, other than that theirs compiled while mine didn't, causing a butterfly effect and a massive performance gap.

I am not sure why this is the case. I am guessing they used an older version of the TOCO converter, but my attempts at reverse-engineering their dev environment were not very fruitful. Anyway, if either of us can figure out how to set half_pixel_centers to false, or if your team can support it being set to true, we'll be able to get tiny yolo v3 running insanely quickly and reliably.
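
For readers wondering what half_pixel_centers changes: a minimal sketch of how the nearest source pixel is picked along one axis. It assumes the usual TensorFlow behaviour with align_corners=False, where the source index is floor((i + 0.5) * in/out) when the flag is true and floor(i * in/out) when it is false. The function name is hypothetical, and the sketch uses exact integer arithmetic instead of the floats a real kernel would use.

```python
def nearest_source_indices(in_size, out_size, half_pixel_centers):
    """Source index chosen for each output index along one axis.

    Illustrative only: assumes align_corners=False and models the
    floor-based index selection with exact integer arithmetic.
    """
    indices = []
    for out_idx in range(out_size):
        if half_pixel_centers:
            # floor((out_idx + 0.5) * in_size / out_size)
            src = ((2 * out_idx + 1) * in_size) // (2 * out_size)
        else:
            # floor(out_idx * in_size / out_size)
            src = (out_idx * in_size) // out_size
        # Clamp to the last valid source index
        indices.append(min(src, in_size - 1))
    return indices


# Integer upscaling (e.g. a hypothetical 13 -> 26 feature map): same pixels either way
same_for_2x = nearest_source_indices(13, 26, True) == nearest_source_indices(13, 26, False)

# Non-integer scaling: the two settings pick different source pixels
legacy = nearest_source_indices(5, 3, False)
half_pixel = nearest_source_indices(5, 3, True)
```

Under these assumptions the flag makes no numerical difference for an exact integer upscale, which suggests that forcing it to false should not change such a model's outputs; it only changes how the op is recorded in the .tflite file.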

Also, your modifications really helped. Unfortunately we are still in the process of retraining and validating our model, so I may need to send you another file in a week or so :)

@Namburger

@rajansaini691 I see. I have also been following that same guide, but have not been able to produce the same model :(
I believe it must be related to the TensorFlow version changes and how the tflite model was produced, which all comes down to experimenting at this stage.

> Also, your modifications really helped. Unfortunately we are still in the process of retraining and validating our model, so I may need to send you another file in a week or so :)

Totally, keep posting the model here, I'll help you compile it!

@ZhouKai90

> @rajansaini691
> FYI, the resize nearest neighbor for yolo is 4 dimensions, currently our compiler can only support 3 :(
> https://coral.ai/docs/edgetpu/models-intro/#supported-operations

But why can TF 2.1 support 4-D? I am confused.

@Namburger

Namburger commented Aug 5, 2020

@ZhouKai90 do you mean the TFLite converter?
That's a separate process from the compiler. Here's what the current doc says:

> Input/output is a 3-dimensional tensor. Depending on input/output size, this operation might not be mapped to the Edge TPU to avoid loss in precision

If you want me to compile your model with some experimental features, you can post here, I'll gladly compile it for you!

@ZhouKai90

> @ZhouKai90 do you mean tflite converter?
> Because that's a separate process from the compiler, here's the current doc:
>
> > Input/output is a 3-dimensional tensor. Depending on input/output size, this operation might not be mapped to the Edge TPU to avoid loss in precision
>
> If you want me to compile your model with some experimental features, you can post here, I'll gladly compile it for you!

I mean that if I install TF 2.3 and use the Python call 'tf.lite.TFLiteConverter.from_saved_model()' to convert the SavedModel to xxx.tflite, I get the "RESIZE_NEAREST_NEIGHBOR Operation version not supported" error when I compile xxx.tflite with edgetpu_compiler. But if I downgrade from TF 2.3 to TF 2.1, both conversion and compilation work without any error.

@ZhouKai90

In addition, I noticed that the converter differs between TF 2.3 and TF 2.1: TF 2.1 uses TOCO, while TF 2.3 uses MLIR.

@rajansaini691
Author

Hi @Namburger thanks for your help. Would it be alright if you could convert this model with the new compiler? Also, if it isn't too much trouble, would it be alright if you could post its inference time? I want to make sure that my environment maximizes the detection speed.

tiny_yolo_test.zip
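
For anyone who wants to measure inference time in their own environment, here is a minimal timing sketch. It is only an illustration: time_invocations is a hypothetical helper, and invoke stands for whatever callable runs one inference (for example the invoke method of a TFLite interpreter, which is an assumption about your setup). The warm-up call is excluded because the first invocation often includes one-time setup cost.

```python
import statistics
import time


def time_invocations(invoke, warmup=1, runs=20):
    """Time repeated calls to `invoke` and report milliseconds per call.

    `invoke` is any zero-argument callable that performs one inference.
    """
    # Warm-up calls are executed but not measured
    for _ in range(warmup):
        invoke()

    durations_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        invoke()
        durations_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "mean_ms": statistics.mean(durations_ms),
        "min_ms": min(durations_ms),
        "max_ms": max(durations_ms),
    }
```

Usage would look like `time_invocations(interpreter.invoke)`, where interpreter is an already-allocated interpreter with its input tensor set.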

@Namburger

Here it is: tiny_yolo_test.tar.gz

@rajansaini691
Author

Hi @Namburger , would it be alright if you could compile this model as well?
tiny_yolo.zip

Also, I'm experimenting with different quantization strategies, so I may end up sending you a few (not more than 1 or 2 per day) in the near future. Is there anything I can do to make the process more streamlined for you? Or perhaps compile it myself, if the team's okay with that?

@Namburger

tiny_yolo.tar.gz
@rajansaini691 how about sending me an email?
[email protected]

@ravinemo

@rajansaini691 Were you able to fix this issue? Our model also takes around 2 seconds per inference.

@ricardodeazambuja

Any news on this issue?

I'm using Edge TPU Compiler version 15.0.340273435 (tf.version =>2.4.0) and it still gives me the message "Operation version not supported" if I try to use RESIZE_NEAREST_NEIGHBOR and RESIZE_BILINEAR.

@YsYusaito

YsYusaito commented Mar 30, 2021

Hi @Namburger, @ZhouKai90
In my case, the converter was MLIR even when I used TF 2.0.0, 2.1.0, or 2.2.0.
Apparently, the default converter is MLIR.
To use the TOCO converter, we have to set "converter.experimental_new_converter = False"
before converting the model from .h5 to .tflite (tflite_model = converter.convert()). (Please see the link below.)
tensorflow/tensorflow#43247
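
A minimal sketch of that converter setting, for the SavedModel workflow mentioned earlier in the thread. The function name and the converter_factory parameter are hypothetical (the latter only exists so the logic can be exercised without TensorFlow installed). Quantization settings are omitted, and whether the flag is honoured depends on the TF version, so treat this as a starting point, not a verified recipe.

```python
def convert_saved_model(saved_model_dir, use_new_converter=False,
                        converter_factory=None):
    """Convert a SavedModel to TFLite bytes, choosing TOCO or MLIR.

    use_new_converter=False requests the legacy TOCO converter,
    True requests the MLIR converter.
    """
    if converter_factory is None:
        # Imported lazily so this module loads even without TensorFlow
        import tensorflow as tf
        converter_factory = tf.lite.TFLiteConverter.from_saved_model

    converter = converter_factory(saved_model_dir)
    # The flag must be set before convert() is called
    converter.experimental_new_converter = use_new_converter
    return converter.convert()
```

The returned bytes can be written to a .tflite file (for example with `open(path, "wb")`) and then passed to edgetpu_compiler.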

I solved the RESIZE_NEAREST_NEIGHBOR problem by editing gen_image_ops.py:
・・・・・・\Anaconda3\envs\yolov3_3\Lib\site-packages\tensorflow\python\ops\gen_image_ops.py

The changes are below.
half_pixel_centers=half_pixel_centers → half_pixel_centers=False
half_pixel_centers = _execute.make_bool(half_pixel_centers, "half_pixel_centers") → half_pixel_centers=False

As a side note, I eventually converted the model under TF 2.2.0 and was able to map all of the inference processing
to the Edge TPU without any problems.

@jveitchmichaelis

jveitchmichaelis commented Aug 19, 2021

This is still an issue with current versions, but one possible cause is input size. I have a UNet model that fails to map fully with a 256x256 input, but works fine with 128x128.

For reference, I tried exporting my model with TensorFlow 2.2, and compilation just fails without giving any reason:

Edge TPU Compiler version 16.0.384591198
Started a compilation timeout timer of 180 seconds.
Compilation child process completed within timeout period.
Compilation failed!

Here's the output from v15 and TF 2.5.0. Note that none of the RESIZE_BILINEAR layers are mapped, and the log message looks misleading (the op should be supported regardless of runtime version):

Edge TPU Compiler version 15.0.340273435

Model compiled successfully in 12614 ms.

Input model: unet_quant.tflite
Input size: 7.51MiB
Output model: unet_quant_edgetpu.tflite
Output size: 10.75MiB
On-chip memory used for caching model parameters: 5.73MiB
On-chip memory remaining for caching model parameters: 1.27MiB
Off-chip memory used for streaming uncached model parameters: 1.73MiB
Number of Edge TPU subgraphs: 4
Total number of operations: 26
Operation log: unet_quant_edgetpu.log

Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 23
Number of operations that will run on CPU: 3

Operator                       Count      Status

MAX_POOL_2D                    3          Mapped to Edge TPU
CONV_2D                        15         Mapped to Edge TPU
CONCATENATION                  3          Mapped to Edge TPU
QUANTIZE                       2          Mapped to Edge TPU
RESIZE_BILINEAR                3          Operation version not supported

And now v16, which at least manages to map one of the layers and gives a cryptic message about the other two:

Edge TPU Compiler version 16.0.384591198
Started a compilation timeout timer of 180 seconds.

Model compiled successfully in 6286 ms.

Input model: unet_quant.tflite
Input size: 7.51MiB
Output model: unet_quant_edgetpu.tflite
Output size: 10.22MiB
On-chip memory used for caching model parameters: 5.20MiB
On-chip memory remaining for caching model parameters: 257.50KiB
Off-chip memory used for streaming uncached model parameters: 2.26MiB
Number of Edge TPU subgraphs: 3
Total number of operations: 26
Operation log: unet_quant_edgetpu.log

Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 24
Number of operations that will run on CPU: 2

Operator                       Count      Status

RESIZE_BILINEAR                1          Mapped to Edge TPU
RESIZE_BILINEAR                2          Operation is otherwise supported, but not mapped due to some unspecified limitation
MAX_POOL_2D                    3          Mapped to Edge TPU
QUANTIZE                       2          Mapped to Edge TPU
CONCATENATION                  3          Mapped to Edge TPU
CONV_2D                        15         Mapped to Edge TPU
Compilation child process completed within timeout period.
Compilation succeeded!

As mentioned above, if I make the input size smaller, everything gets mapped. Compilation is also considerably faster.

Edge TPU Compiler version 16.0.384591198
Started a compilation timeout timer of 180 seconds.

Model compiled successfully in 1209 ms.

Input model: unet_quant.tflite
Input size: 7.51MiB
Output model: unet_quant_edgetpu.tflite
Output size: 7.59MiB
On-chip memory used for caching model parameters: 5.76MiB
On-chip memory remaining for caching model parameters: 17.50KiB
Off-chip memory used for streaming uncached model parameters: 1.70MiB
Number of Edge TPU subgraphs: 1
Total number of operations: 26
Operation log: unet_quant_edgetpu.log

Operator                       Count      Status

MAX_POOL_2D                    3          Mapped to Edge TPU
CONV_2D                        15         Mapped to Edge TPU
CONCATENATION                  3          Mapped to Edge TPU
QUANTIZE                       2          Mapped to Edge TPU
RESIZE_BILINEAR                3          Mapped to Edge TPU
Compilation child process completed within timeout period.
Compilation succeeded!
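
To compare runs like the ones above without reading each table by eye, here is a small sketch that pulls the operator table out of the compiler output and lists the ops that did not get mapped. It assumes the column layout shown in these logs (operator name, count, status) and the exact status string "Mapped to Edge TPU"; the function names are hypothetical.

```python
import re

# One table row: OP_NAME, whitespace, count, whitespace, status text
ROW = re.compile(r"^(?P<op>[A-Z0-9_]+)\s+(?P<count>\d+)\s+(?P<status>\S.*)$")


def parse_operator_table(log_text):
    """Return (operator, count, status) tuples found in compiler output."""
    rows = []
    for line in log_text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append(
                (match["op"], int(match["count"]), match["status"].strip())
            )
    return rows


def unmapped_ops(rows):
    """Keep only rows whose status is not 'Mapped to Edge TPU'."""
    return [row for row in rows if row[2] != "Mapped to Edge TPU"]
```

Summing the counts of the unmapped rows should match the "Number of operations that will run on CPU" line in the same log.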

@Namburger perhaps the compiler is failing to recognise situations where the input size is too large? Or is this symptomatic of another issue with the model?
