arcface model is invalid #91

Closed
snnn opened this issue Aug 30, 2018 · 49 comments

@snnn
Contributor

snnn commented Aug 30, 2018

I downloaded the model from:
https://s3.amazonaws.com/onnx-model-zoo/arcface/resnet100/resnet100.onnx

If you open the model, take a look at the second OP: Sub. Its first input, A, is a float tensor, but its second input, B, is a double tensor.
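In ONNX, `Sub` takes inputs `A` and `B` under a single type constraint `T`, so pairing a float `A` with a double `B` makes the node invalid. The following is a minimal sketch of a checker for this kind of mismatch. It works on a hypothetical, simplified graph description (plain tuples plus a dict of element types), not on the `onnx` package's protobuf objects, and the node and tensor names are made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified node record: (node_name, op_type, input_names).
Node = Tuple[str, str, List[str]]

# Ops whose inputs share a single type constraint T (a small subset, for illustration).
SAME_TYPE_OPS = {"Add", "Sub", "Mul", "Div"}


def find_type_mismatches(nodes: List[Node], elem_types: Dict[str, str]) -> List[str]:
    """Describe every listed node whose inputs do not all have the same element type."""
    problems = []
    for name, op_type, inputs in nodes:
        if op_type not in SAME_TYPE_OPS:
            continue
        if len({elem_types[i] for i in inputs}) > 1:
            detail = ", ".join(f"{i}: {elem_types[i]}" for i in inputs)
            problems.append(f"{name} ({op_type}) has mixed input types: {detail}")
    return problems


# Made-up names mirroring the report: A is float, B is double.
nodes = [("sub0", "Sub", ["A", "B"])]
elem_types = {"A": "float", "B": "double"}
for message in find_type_mismatches(nodes, elem_types):
    print(message)
```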

@ankkhedia
Contributor

Hi @snnn, I tried viewing the above model in Netron, and for me the second input, B, of the Sub operator shows up as a float tensor.
[Screenshot: Netron view of the Sub node, taken 2018-08-30]

@snnn
Contributor Author

snnn commented Aug 31, 2018

Hi @ankkhedia

It's float64?

@prasanthpul
Member

@snnn is the problem that the type needs to be the same for both?

@snnn
Contributor Author

snnn commented Sep 11, 2018

Yes.

@prasanthpul
Member

@ankkhedia can you fix the model?

@ankkhedia
Contributor

@prasanthpul I will take a look.

@prasanthpul
Member

@ankkhedia any update on this?

@ankkhedia
Contributor

Hi @prasanthpul, sorry for the delay; I got pulled into some other things. I will try to prioritize it this week.

@ankkhedia
Contributor

@prasanthpul @snnn It seems to be an error in the MXNet-ONNX converter. I have raised an issue with the team: apache/mxnet#13044
I will reconvert the model and post the new one here when the issue gets fixed.

@linkerzhang
Member

This is not good. We should remove these models if they're invalid and add them back after fixing those issues.

@snnn did you see issues with any other models? Thank you very much for bringing this up!

@linkerzhang
Member

@ankkhedia

@snnn
Contributor Author

snnn commented Nov 6, 2018

In addition to Arcface, there are also problems in:

  • Resnet18v1
  • Resnet34v1
  • Resnet50v1
  • Resnet101v1
  • Resnet152v1
  • vgg16
  • Vgg16_bn
  • Vgg19
  • Vgg19_bn

@snnn
Contributor Author

snnn commented Nov 10, 2018

@ankkhedia Any update? Could you please confirm if these models have problems?

Thanks

@ankkhedia
Contributor

I will check the other models. The ArcFace issue, however, has been fixed, and I will upload the new model.

@ankkhedia
Contributor

Hi @snnn, could you please point out the problems with the models you listed above so that I can take a look?

@snnn
Contributor Author

snnn commented Nov 14, 2018

The inputs to the GEMM operator are not 2D tensors; they have more than two dimensions.
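ONNX `Gemm` computes `alpha * A' * B' + beta * C` and expects `A` and `B` to be 2D, `(M, K)` and `(K, N)` after the optional `transA`/`transB` transposes. A 4D feature map therefore has to be collapsed first, for example with `Flatten` on axis 1. The sketch below only does shape arithmetic in plain Python; the `[1, 512, 7, 7]` feature shape and the `[512, 25088]` weight are hypothetical values, not taken from the models listed above.

```python
from math import prod
from typing import Sequence, Tuple


def flatten_shape(shape: Sequence[int], axis: int = 1) -> Tuple[int, int]:
    """Output shape of ONNX Flatten: dims before `axis` and dims from `axis` on are each multiplied together."""
    return (prod(shape[:axis]), prod(shape[axis:]))


def gemm_output_shape(a: Sequence[int], b: Sequence[int], trans_a: int = 0, trans_b: int = 0) -> Tuple[int, int]:
    """Output shape (M, N) of ONNX Gemm; raises if A or B is not 2D or if K disagrees."""
    if len(a) != 2 or len(b) != 2:
        raise ValueError(f"Gemm expects 2D A and B, got A={list(a)}, B={list(b)}")
    m, k = (a[1], a[0]) if trans_a else (a[0], a[1])
    k_b, n = (b[1], b[0]) if trans_b else (b[0], b[1])
    if k != k_b:
        raise ValueError(f"inner dimensions differ: {k} vs {k_b}")
    return (m, n)


features = [1, 512, 7, 7]  # hypothetical 4D input to a fully connected layer
weight = [512, 25088]      # hypothetical (N, K) weight, used with transB=1

try:
    gemm_output_shape(features, weight, trans_b=1)
except ValueError as err:
    print("rejected:", err)

flat = flatten_shape(features)                     # (1, 25088)
print(gemm_output_shape(flat, weight, trans_b=1))  # (1, 512)
```

Flattening before the matrix multiply keeps `Gemm` within its 2D contract; the alternative of feeding higher-rank tensors is exactly what makes the models invalid.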

@ankkhedia
Contributor

@snnn This has been discussed before in #90.
I think there was no good support for GEMM in ONNX when these models were created. ONNX does have some missing operators, and those are usually mapped to the closest operator in the source framework.

As far as I know, support for GEMM in ONNX-MXNet is either a work in progress or already done. I will post a new model once the support has been added.

@snnn
Contributor Author

snnn commented Nov 14, 2018

Hi @ankkhedia , do you have an estimated time of completion?

@ankkhedia
Contributor

@snnn I will have to check with the ONNX-MXNet converter team before I can give a clear ETA.
I will keep you updated.
If the support has not been added yet, the timeline depends on their roadmap. The team is actively working toward thorough operator coverage.

@prasanthpul
Member

@ankkhedia I think your last comment is about the other models. Can you confirm whether the ArcFace model has been fixed? Will you be posting a 1.3 version as well?

@snnn
Contributor Author

snnn commented Nov 14, 2018

This issue has been open for three months, and we still don't know when it can be fixed.
From a user experience perspective, ONNX users will think the ONNX model zoo is low quality. I suggest we either fix it quickly or delete the malformed models.

@prasanthpul
Member

@snnn Let's create a separate issue for the other models; this issue is only for ArcFace.
For the other models, I agree that if we cannot fix them, they should be removed for now.

@ankkhedia
Contributor

ankkhedia commented Nov 14, 2018

@snnn @prasanthpul The model has been fixed and updated in S3. I checked the model structure with Netron, and the float64 issue is not there anymore.

@prasanthpul
Member

Thanks @ankkhedia. It looks like only the 1.2 (opset 7) version is posted. Will you be posting 1.3 as well?

@snnn
Contributor Author

snnn commented Nov 14, 2018

Hi @ankkhedia, could you please verify it?
I got the model from:
'https://s3.amazonaws.com/onnx-model-zoo/arcface/resnet100/resnet100.tar.gz'

It is still wrong.

@ankkhedia
Contributor

ankkhedia commented Nov 14, 2018

@snnn Sorry for the miss. I uploaded the resnet100.onnx file. I will update this tar file too.

@ankkhedia
Contributor

@snnn added the latest tar file.

@ryanlai2

ryanlai2 commented Nov 15, 2018

Can we fix ArcFace's README.md so that the model download table is correct? The download link was changed to download an OpSet 8 model.

Currently, there is only one download link for ArcFace, and it's labeled as OpSet 7, v1.2.1. However, the link downloads an OpSet 8, v1.3 version of the model.
https://github.com/onnx/models/tree/master/models/face_recognition/ArcFace


@ankkhedia
Contributor

updated :)

@snnn
Contributor Author

snnn commented Nov 15, 2018

Hi @ankkhedia, the old issue is fixed, but there is a new one.
For the "relu0" node, its inputs have shapes [1, 64, 112, 112] and [64]. No broadcasting rule can be applied to them.
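ONNX uses NumPy-style broadcasting: shapes are aligned on their trailing dimensions, and each aligned pair must be equal or contain a 1. Aligning `[64]` against `[1, 64, 112, 112]` pairs 64 with the last axis (112), so the shapes are incompatible. For `PRelu`, `slope` must additionally broadcast to `X` without changing `X`'s shape. A minimal shape-only sketch follows; `[64, 1, 1]` is one slope shape that satisfies the rule for per-channel slopes, but whether the converter fix uses exactly that shape is an assumption.

```python
from itertools import zip_longest
from typing import List, Sequence


def broadcast_shape(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """NumPy-style (multidirectional) broadcast of two shapes, aligned from the right."""
    result = []
    for da, db in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"cannot broadcast {list(a)} with {list(b)}: {da} vs {db}")
    return result[::-1]


def slope_fits(x_shape: Sequence[int], slope_shape: Sequence[int]) -> bool:
    """True if slope broadcasts to x without changing x's shape (unidirectional)."""
    try:
        return broadcast_shape(x_shape, slope_shape) == list(x_shape)
    except ValueError:
        return False


x_shape = [1, 64, 112, 112]
print(slope_fits(x_shape, [64]))        # False: 64 is aligned with the last axis (112)
print(slope_fits(x_shape, [64, 1, 1]))  # True: 64 lines up with the channel axis
```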

@snnn
Contributor Author

snnn commented Nov 16, 2018

Hi @ankkhedia, could you verify this issue?

Thanks.

@Roshrini

Hi @snnn, I verified this issue on my end. We are actively working on both the PRelu and Gemm issues mentioned and will re-upload the models as soon as we can. Thanks for reporting this, and sorry for the inconvenience it has caused.

@ankkhedia
Contributor

Hi @snnn, there are open PRs to fix the above issues with PRelu and GEMM.
I have generated a model that includes those fixes: https://s3.amazonaws.com/onnx-model-zoo/arcface/resnet100/resnet100_new.onnx
Could you please let me know if this model looks good to you?

We will update the model once the PRs are merged.

@snnn
Contributor Author

snnn commented Nov 30, 2018

Hi @ankkhedia, thank you for fixing it. I'm on vacation with a poor internet connection, so I'll ask a colleague for help.

@snnn
Contributor Author

snnn commented Dec 11, 2018

The problem is solved. Thanks!

@snnn
Contributor Author

snnn commented Feb 13, 2019

@ankkhedia
Contributor

ankkhedia commented Feb 13, 2019

@snnn I have updated the model in https://s3.amazonaws.com/onnx-model-zoo/arcface/resnet100/resnet100.onnx

Could you please verify?

@snnn
Contributor Author

snnn commented Feb 13, 2019

@ankkhedia
Contributor

My bad. Updating it now.

@snnn
Contributor Author

snnn commented Feb 13, 2019

@ankkhedia
Contributor

@snnn
uploaded resnet100.tar.gz and resnet100-md5.txt now.
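To confirm that a downloaded `resnet100.tar.gz` matches the published `resnet100-md5.txt`, the digest can be recomputed locally. This is a minimal sketch using Python's standard `hashlib`; it assumes the first whitespace-separated token of the `.txt` file is the hex digest, which is a guess about that file's format.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so large archives are not loaded at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(archive, md5_txt) -> bool:
    """Compare the archive's digest with the first token of md5_txt (assumed format)."""
    expected = Path(md5_txt).read_text().split()[0].lower()
    return md5_of_file(archive) == expected


# Example (file names from the thread; use the paths where the files were saved):
# verify_md5("resnet100.tar.gz", "resnet100-md5.txt")
```

`verify_md5` returns `True` only when the recomputed digest equals the published one.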

@snnn
Contributor Author

snnn commented Feb 13, 2019

Perfect. Thanks!

@snnn snnn closed this as completed Feb 13, 2019
@XinyuDu

XinyuDu commented Feb 19, 2019

@ankkhedia Hi, how can I convert the ArcFace MXNet model to an ONNX model without the float64 error? Thanks!

@luan1412167

luan1412167 commented Oct 8, 2019

@snnn @ankkhedia I get an error that may be the same as yours, as described in
#91 (comment)
Do you have any experience that could help me? Thanks
2019-10-08 11:49:13.612837502 [E:onnxruntime:, sequential_executor.cc:165 Execute] Non-zero status code returned while running PRelu node. Name:'relu0' Status Message: /home/luandd/project_company/face_rec/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.h:329 void onnxruntime::BroadcastIterator::Init(int64_t, int64_t) axis == 1 || axis == largest was false. Attempting to broadcast an axis by a dimension other than 1. 64 by 112

@luan1412167

@snnn @ankkhedia do you have a correct model with spatial=1?

@sky186

sky186 commented Dec 16, 2019

@ankkhedia
Hello, can the ArcFace MXNet-to-ONNX conversion be fixed?
How do I convert to ONNX correctly?
Is the PRelu output wrong? I want to convert the model to Caffe; the ONNX model can be exported, but its output is not right.

@sky186

sky186 commented Dec 16, 2019

@luan1412167
Hi, can you now convert MXNet ArcFace to ONNX correctly? I applied the fix, but the PRelu output of the exported model is not right; it doesn't match MXNet. Could you tell me how to convert to ONNX correctly?
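When an exported `PRelu` does not match MXNet, a plain reference implementation can help locate the difference. ONNX defines `PRelu` as `f(x) = x` for `x >= 0` and `f(x) = slope * x` otherwise. The sketch assumes one slope per channel on axis 1 of an NCHW tensor stored as nested lists; `prelu_nchw` and `max_abs_diff` are hypothetical helpers, not part of MXNet or ONNX.

```python
from typing import List

Tensor4D = List[List[List[List[float]]]]  # nested lists in N, C, H, W order


def prelu_nchw(x: Tensor4D, slope: List[float]) -> Tensor4D:
    """Reference PRelu: keep non-negative values, scale negatives by the channel's slope."""
    return [
        [
            [[v if v >= 0 else slope[c] * v for v in row] for row in plane]
            for c, plane in enumerate(sample)
        ]
        for sample in x
    ]


def _flatten(t):
    """Yield the scalars of an arbitrarily nested list in row-major order."""
    if isinstance(t, list):
        for item in t:
            yield from _flatten(item)
    else:
        yield t


def max_abs_diff(a: Tensor4D, b: Tensor4D) -> float:
    """Largest element-wise absolute difference between two same-shaped tensors."""
    return max(abs(p - q) for p, q in zip(_flatten(a), _flatten(b)))


x = [[[[1.0, -2.0]], [[-1.0, 3.0]]]]  # shape [1, 2, 1, 2]
y = prelu_nchw(x, slope=[0.25, 0.5])
print(y)  # [[[[1.0, -0.5]], [[-0.5, 3.0]]]]
```

Comparing `max_abs_diff` between this reference and each framework's output on the same input shows which side deviates.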

@HoangTienDuc

Hi @ankkhedia , the old issue is fixed, but we get new one.
For the "relu0" node, its inputs has shape of [1, 64, 112, 112] and [64]. There is no broadcast rule can be applied on them.

Hi @ankkhedia @snnn, I also tried to convert the ArcFace LResNet100E-IR MXNet model to ONNX using convert_onnx.py. It seems that I got the same error as @snnn when I deployed my model.

onnx runtime error 1: Non-zero status code returned while running PRelu node. Name:'relu0' Status Message: relu0: right operand cannot broadcast on dim 0 LeftShape: {1,64,112,112}, RightShape: {64}

Can you guide me on how to fix it?
Thank you all.

@snnn
Contributor Author

snnn commented Mar 18, 2020

see apache/mxnet#17711

@vinitra is fixing it.


10 participants