This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

MXNet to ONNX export bug #14875

Closed
ehsanmok opened this issue May 3, 2019 · 29 comments · Fixed by #14942

Comments

@ehsanmok
Contributor

ehsanmok commented May 3, 2019

When trying to convert the yolo3_mobilenet1.0_coco pretrained model from GluonCV v0.5 to ONNX via onnx_mxnet.export_model (using mxnet-cu90mkl==1.5.0b20190313), I get the following error:

/anaconda3/lib/python3.7/site-packages/mxnet/contrib/onnx/mx2onnx/_op_translations.py in convert_slice_axis(node, **kwargs)
   1320     axes = int(attrs.get("axis"))
   1321     starts = int(attrs.get("begin"))
-> 1322     ends = int(attrs.get("end", None))
   1323     if not ends:
   1324         raise ValueError("Slice: ONNX doesnt't support 'None' in 'end' attribute")

ValueError: invalid literal for int() with base 10: 'None'

This corresponds to the line ends = int(attrs.get("end", None)), a bug introduced in #12878.
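Operator attributes in an exported MXNet symbol are stored as strings, so an open-ended `slice_axis(..., end=None)` reaches the converter as the string `'None'`, not Python `None`. The default in `attrs.get("end", None)` therefore never applies, and `int()` raises before the `if not ends` check can run. Below is a minimal sketch of a more tolerant parser. `parse_slice_end` and the `INT64_MAX` sentinel are assumptions for illustration, not necessarily what PR #14942 does. The sketch relies on the ONNX `Slice` rule that an `end` larger than the dimension size is clamped to that size.

```python
# Sentinel meaning "slice to the end of the axis". ONNX Slice clamps an `end`
# larger than the dimension size to the dimension size, so INT64_MAX is safe.
INT64_MAX = 2**63 - 1


def parse_slice_end(attrs):
    """Hypothetical helper: read MXNet's string-valued 'end' attribute.

    MXNet writes end=None into the symbol JSON as the string "None", so that
    case (and a missing key) must be handled before calling int().
    """
    raw = attrs.get("end", "None")
    if raw is None or raw == "None":
        return INT64_MAX
    return int(raw)


if __name__ == "__main__":
    # The original code path: int() on the string "None" raises ValueError.
    try:
        int({"end": "None"}.get("end", None))
    except ValueError as err:
        print("original:", err)

    print(parse_slice_end({"axis": "1", "begin": "0", "end": "None"}))
    print(parse_slice_end({"axis": "1", "begin": "0", "end": "-1"}))
```

An explicit sentinel avoids a truthiness test on the parsed value, which would also reject a legitimate `end=0`.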

@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: ONNX, Bug

@lanking520
Member

@Roshrini Could you please take a look?

@lanking520 lanking520 added the ONNX label May 5, 2019
@AnaRhisT94

@ehsanmok MXNet isn't responsible for this error; it's ONNX.
Update your ONNX version; it should work with ONNX==1.2.2.

@ehsanmok
Contributor Author

ehsanmok commented May 6, 2019

@AnaRhisT94 No, my ONNX is already the latest, v1.5.0. The error is triggered by calling export_model, and int(None) is never valid.

@AnaRhisT94

Try using ONNX 1.2.2.

@ehsanmok
Copy link
Contributor Author

ehsanmok commented May 6, 2019

Same error with ONNX 1.2.2

@vandanavk
Contributor

vandanavk commented May 13, 2019

@mxnet-label-bot add [Bug]

@ehsanmok I'm looking into this, could you share your script?

@AnaRhisT94

> Same error with ONNX 1.2.2

I see, well just delete the None then?

@ehsanmok
Contributor Author

@vandanavk Here is a minimal verifiable example (MVE):

from os import path as osp
import numpy as np
import mxnet as mx
from mxnet.contrib import onnx as onnx_mxnet
from mxnet import gluon
from gluoncv import model_zoo, data, utils

OUTPUT = "./output"
DATA = "./data/cat.png"
SIZE = 320
MODEL = "yolo3_mobilenet1.0_coco"
INPUT_SHAPE = (1, 3, SIZE, SIZE)

net = model_zoo.get_model(MODEL, pretrained=True)
net.hybridize()
# pass an img to trigger init after hybridize
x, _ = data.transforms.presets.yolo.load_test(DATA, short=SIZE)
_ = net(x)  # YOLOv3 inference returns three outputs (ids, scores, bboxes), so don't unpack into two names

net.export(osp.join(OUTPUT, MODEL))
sym = osp.join(OUTPUT, MODEL + "-symbol.json")
params = osp.join(OUTPUT, MODEL + "-0000.params")
onnx_file = osp.join(OUTPUT, MODEL + ".onnx")

converted_model_path = onnx_mxnet.export_model(sym, params, [INPUT_SHAPE], np.float32, onnx_file, verbose=True)
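Before calling `export_model`, it can help to find exactly which nodes carry an open-ended slice. The exported `-symbol.json` is plain JSON with a `nodes` list, where each node has an `op`, a `name`, and a dict of string attributes. Recent MXNet versions use the `attrs` key. The sketch also checks `attr` and `param`, on the assumption that older files may use those names. `find_open_ended_slices` is a hypothetical helper, not part of MXNet.

```python
import json


def node_attrs(node):
    """Return a node's attribute dict; the key name is assumed to vary across MXNet versions."""
    for key in ("attrs", "attr", "param"):
        if key in node:
            return node[key]
    return {}


def find_open_ended_slices(symbol_json_text):
    """List (name, axis) for every slice_axis node whose 'end' is the string 'None'."""
    graph = json.loads(symbol_json_text)
    hits = []
    for node in graph.get("nodes", []):
        if node.get("op") != "slice_axis":
            continue
        attrs = node_attrs(node)
        if attrs.get("end", "None") == "None":
            hits.append((node.get("name"), int(attrs.get("axis", "0"))))
    return hits


if __name__ == "__main__":
    # With the files from the script above this would be, for example:
    # with open("./output/yolo3_mobilenet1.0_coco-symbol.json") as f:
    #     print(find_open_ended_slices(f.read()))
    toy = json.dumps({"nodes": [
        {"op": "null", "name": "data", "inputs": []},
        {"op": "slice_axis", "name": "slice0",
         "attrs": {"axis": "1", "begin": "0", "end": "None"}, "inputs": [[0, 0, 0]]},
        {"op": "slice_axis", "name": "slice1",
         "attrs": {"axis": "2", "begin": "0", "end": "4"}, "inputs": [[0, 0, 0]]},
    ]})
    print(find_open_ended_slices(toy))  # [('slice0', 1)]
```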

@vandanavk
Contributor

@ehsanmok I tried the following code with PR #14942. The ValueError: invalid literal for int() with base 10: 'None' error no longer occurs, but I do see AttributeError: No conversion function registered for op type _arange yet. Exporting _arange can be filed as a separate feature request. Please try PR #14942 and let me know if it works for you.

from os import path as osp
import numpy as np
import mxnet as mx
from mxnet.contrib import onnx as onnx_mxnet
from mxnet import gluon
from gluoncv import model_zoo, data, utils

OUTPUT = "./"
DATA = "./cat.jpg"
SIZE = 320
MODEL = "yolo3_darknet53_coco"
INPUT_SHAPE = (1, 3, SIZE, SIZE)

net = model_zoo.get_model(MODEL, pretrained=True)
net.hybridize()
# pass an img to trigger init after hybridize
x, _ = data.transforms.presets.yolo.load_test(DATA, short=SIZE)
_ = net(x)

net.export(osp.join(OUTPUT, MODEL))
sym = osp.join(OUTPUT, MODEL + "-symbol.json")
params = osp.join(OUTPUT, MODEL + "-0000.params")
onnx_file = osp.join(OUTPUT, MODEL + ".onnx")

converted_model_path = onnx_mxnet.export_model(sym, params, [INPUT_SHAPE], np.float32, onnx_file, verbose=True)
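Fixing `slice_axis` only exposes the next gap. The converter stops at the first op it has no function for, so missing ops (`_arange`, `_greater_scalar`, ...) show up one at a time. The sketch below lists every op type in the symbol that lacks a converter, given the set of registered op names. In the MXNet 1.x `mx2onnx` code those appear to be the keys of `MXNetGraph.registry_`. Treat that attribute name as an assumption and check it against your installed version. `missing_converters` is a hypothetical helper.

```python
import json
from collections import Counter


def missing_converters(symbol_json_text, registered_ops):
    """Count op types in an MXNet symbol JSON that have no ONNX converter.

    `registered_ops` is any collection of op names the exporter supports.
    Nodes with op == "null" are graph inputs/parameters, not operators.
    """
    graph = json.loads(symbol_json_text)
    ops = Counter(node["op"] for node in graph.get("nodes", []) if node.get("op") != "null")
    return {op: n for op, n in sorted(ops.items()) if op not in registered_ops}


if __name__ == "__main__":
    # Possible usage against a real install (the registry attribute is an assumption):
    # from mxnet.contrib.onnx.mx2onnx.export_onnx import MXNetGraph
    # with open("./yolo3_darknet53_coco-symbol.json") as f:
    #     print(missing_converters(f.read(), MXNetGraph.registry_.keys()))
    toy = json.dumps({"nodes": [
        {"op": "null", "name": "data"},
        {"op": "Convolution", "name": "conv0"},
        {"op": "_arange", "name": "arange0"},
        {"op": "_greater_scalar", "name": "gt0"},
        {"op": "_arange", "name": "arange1"},
    ]})
    print(missing_converters(toy, {"Convolution", "slice_axis"}))  # {'_arange': 2, '_greater_scalar': 1}
```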

@bloatybo

@ehsanmok I have the same problem. This issue has been open for 3 months; did you solve it?

@vandanavk
Contributor

> @ehsanmok I met the same problem with you. I saw the issue has been around for 3 months. Did you solve it?

Can you try PR #14942? I didn't see the issue with this PR.

@caiqi

caiqi commented Aug 18, 2019

I found several ops that are not supported during conversion, including slice_axis(..., end=None), slice_like, repeat, and arange. However, for a fixed input size, these operations can be replaced with normal slice_like and concat. The main problem is box_nms.
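For a model exported at one fixed input shape, an open-ended `slice_axis` can be made explicit by substituting the axis length for `None`. This is the same idea as Python's `x[begin:None] == x[begin:len(x)]`. The sketch below patches the symbol JSON directly rather than rebuilding the graph. `axis_lengths` is a hypothetical mapping from node name to the length of the sliced axis. You would have to obtain those lengths yourself, for example by shape inference on each node's input, and they are valid only for that one input shape. The sketch also assumes the `attrs` key used by recent MXNet versions.

```python
import json


def pin_slice_ends(symbol_json_text, axis_lengths):
    """Replace end="None" on slice_axis nodes with a concrete axis length.

    `axis_lengths` maps a slice_axis node name to the length of the axis it
    slices (a hypothetical input, valid only for one fixed input shape).
    Returns the patched JSON text and the names of nodes left unresolved.
    """
    graph = json.loads(symbol_json_text)
    unresolved = []
    for node in graph.get("nodes", []):
        if node.get("op") != "slice_axis":
            continue
        attrs = node.get("attrs", {})
        if attrs.get("end") != "None":
            continue
        name = node.get("name")
        if name in axis_lengths:
            attrs["end"] = str(axis_lengths[name])  # attributes are stored as strings
        else:
            unresolved.append(name)
    return json.dumps(graph), unresolved


if __name__ == "__main__":
    toy = json.dumps({"nodes": [
        {"op": "slice_axis", "name": "slice0",
         "attrs": {"axis": "1", "begin": "0", "end": "None"}},
        {"op": "slice_axis", "name": "slice1",
         "attrs": {"axis": "2", "begin": "1", "end": "None"}},
    ]})
    patched, todo = pin_slice_ends(toy, {"slice0": 80})  # 80 is an arbitrary example length
    print(json.loads(patched)["nodes"][0]["attrs"]["end"], todo)  # 80 ['slice1']
```

Only the `end` string changes, so the operator itself and its semantics for that input shape stay the same.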

@ntomer

ntomer commented Aug 23, 2019

Not sure if this is the right place to post, but I applied the fixes from PR #14942. That fixed the issue for me, but the next error is:
'AttributeError: No conversion function registered for op type _greater_scalar yet.'

Attempting to export 'ssd_512_mobilenet1.0_voc'

@vandanavk
Contributor

@caiqi @ntomer feel free to contribute the ONNX conversion for these missing operators 👍
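As an illustration of what a contributed `_greater_scalar` converter would have to reproduce, here is a pure-Python model of one possible lowering. MXNet's op returns 1 where `x > scalar` and 0 elsewhere, in the input's dtype. ONNX `Greater` produces booleans instead, so a plausible lowering is `Constant(scalar) → Greater → Cast(to=FLOAT)`. The functions below are stand-ins written for this comment, not the ONNX or MXNet APIs. They only check that the decomposition matches the reference semantics on sample float values.

```python
def mx_greater_scalar(xs, scalar):
    """Reference semantics of MXNet _greater_scalar on float data: 1.0 if x > scalar else 0.0."""
    return [1.0 if x > scalar else 0.0 for x in xs]


def onnx_greater(a, b):
    """Model of ONNX Greater with a one-element `b` broadcast over `a`; returns booleans."""
    (rhs,) = b
    return [x > rhs for x in a]


def onnx_cast_to_float(xs):
    """Model of ONNX Cast(to=FLOAT)."""
    return [float(x) for x in xs]


def lowered_greater_scalar(xs, scalar):
    """Constant(scalar) -> Greater -> Cast(to=FLOAT)."""
    const = [float(scalar)]
    return onnx_cast_to_float(onnx_greater(xs, const))


if __name__ == "__main__":
    sample = [-1.0, 0.0, 0.5, 0.5000001, 3.0]
    assert lowered_greater_scalar(sample, 0.5) == mx_greater_scalar(sample, 0.5)
    print(lowered_greater_scalar(sample, 0.5))  # [0.0, 0.0, 0.0, 1.0, 1.0]
```

The `Cast` step is what keeps the output numeric, which matches MXNet's behavior of returning the input dtype rather than a boolean mask.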

@djaym7

djaym7 commented Nov 5, 2019

Does anyone have an update on this? I am having the same issue...

@ghost

ghost commented Dec 3, 2019

The bug happens in yolo3.py line 161, where there is a None parameter. Fix it, and then you will hit the unregistered _arange op.
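When the input shape is fixed, an `_arange` node does not depend on the input data, so one workaround, in the spirit of the fixed-input-size replacements suggested earlier in this thread, is to precompute its values and embed them as a constant. The sketch below reproduces MXNet's documented `arange` semantics: `stop=None` means the range `[0, start)`, and `repeat` repeats each element in place. That is enough to produce the constant values. `mx_arange_values` is a hypothetical helper, and writing the constant into the graph is left out.

```python
import math


def mx_arange_values(start, stop=None, step=1.0, repeat=1):
    """Values MXNet's arange would produce, for folding into a constant.

    Mirrors the documented semantics: stop=None means [0, start), and
    `repeat` repeats every element in place.
    """
    if stop is None:
        start, stop = 0, start
    if step == 0:
        raise ValueError("step must be non-zero")
    count = max(0, math.ceil((stop - start) / step))
    values = [start + i * step for i in range(count)]
    return [v for v in values for _ in range(repeat)]


if __name__ == "__main__":
    print(mx_arange_values(3))                       # [0.0, 1.0, 2.0]
    print(mx_arange_values(2, 6, step=2, repeat=3))  # [2, 2, 2, 4, 4, 4]
```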

@Rainweic

> Not sure if this is the right place to post, but I used the fixes from PR #14942, fixed the issue for me but the next error is:
> 'AttributeError: No conversion function registered for op type _greater_scalar yet.'
>
> Attempting to export 'ssd_512_mobilenet1.0_voc'

Did you find a way to fix it? I hit the same error with "ssd_512_resnet50_v1_voc".

@djaym7

djaym7 commented Dec 11, 2019

Nope. I raised a ticket with Amazon, but no one is currently working on this.

@mahxn0

mahxn0 commented Jan 16, 2020

Same problem. When I used yolov32onnx.py with PyTorch, the conversion was easy.
I will give up MXNet and never look back.

@chouxianyu

chouxianyu commented Jan 31, 2020

I have the same problem.
I tried the fix in PR #14942 and hit a new error:

  File "D:\WorkingSoftware\Anaconda3\lib\site-packages\mxnet\contrib\onnx\mx2onnx\export_model.py", line 83, in export_model
    verbose=verbose)
  File "D:\WorkingSoftware\Anaconda3\lib\site-packages\mxnet\contrib\onnx\mx2onnx\export_onnx.py", line 253, in create_onnx_graph_proto
    idx=idx
  File "D:\WorkingSoftware\Anaconda3\lib\site-packages\mxnet\contrib\onnx\mx2onnx\export_onnx.py", line 90, in convert_layer
    raise AttributeError("No conversion function registered for op type %s yet." % op)
AttributeError: No conversion function registered for op type _arange yet.

@djaym7

djaym7 commented Feb 4, 2020

> I met the same problem.
> And I tried the solution in PR#14942, found a new bug.
> ...
> AttributeError: No conversion function registered for op type _arange yet.

I had the same error on Nov 5, 2019. I tried to build the operator myself, but it didn't work.

@LewsTherin511

LewsTherin511 commented Feb 12, 2021

I'm encountering the same issue.
I fine-tuned an SSD model on a custom dataset (everything working properly), and I'm trying to export it to ONNX in order to run it on Android.
This is what I'm doing:

from os import path as osp
import numpy as np
import mxnet as mx
import gluoncv as gcv
from mxnet.contrib import onnx as onnx_mxnet
from mxnet import gluon
from gluoncv import model_zoo, data, utils

ctx = mx.cpu(0)

OUTPUT = 'oxnn/'
DATA = "./friends.png"
MODEL = "CML_exported"
INPUT_SHAPE = (1, 3, 512, 683)

dummy_img, _ = data.transforms.presets.ssd.load_test(DATA, short=512)

CML_classes = ["CML_mug"]
net = gcv.model_zoo.get_model('ssd_512_mobilenet1.0_custom', classes=CML_classes, pretrained_base=False, ctx=ctx)
net.load_parameters("saved_weights/CML_mobilenet_mug_00/ep_035.params", ctx=ctx)
net.hybridize()
_ = net(dummy_img)

net.export(osp.join(OUTPUT, MODEL))
sym = osp.join(OUTPUT, MODEL + "-symbol.json")
params = osp.join(OUTPUT, MODEL + "-0000.params")
onnx_file = osp.join(OUTPUT, MODEL + ".onnx")

converted_model_path = onnx_mxnet.export_model(sym, params, [INPUT_SHAPE], np.float32, onnx_file, verbose=True)

I'm getting the usual:

  File "/home/lews/anaconda3/envs/gluon/lib/python3.7/site-packages/mxnet/contrib/onnx/mx2onnx/_op_translations.py", line 1502, in convert_slice_axis
    ends = int(attrs.get("end", None))
ValueError: invalid literal for int() with base 10: 'None'

Any updates since last year? Was this somehow fixed?

@Zha0q1
Contributor

Zha0q1 commented Feb 17, 2021

Hi @LewsTherin511, thanks for reaching out! This should be easy to fix; you can expect this to be fixed by tomorrow :)

Meanwhile, which version of MXNet are you using, and on which OS? Our team has been improving ONNX support lately (on the v1.x branch), and here is a simple tool to update the ONNX support in your local MXNet: #19876
We would be very happy to help you export the model and answer any questions.

@Zha0q1
Contributor

Zha0q1 commented Feb 17, 2021

Actually as @waytrue17 pointed out in https://discuss.mxnet.apache.org/t/exporting-model-to-onnx-or-alternative-way-to-run-on-android/6862 this might have already been fixed. Would you try #19876 to update to the latest onnx support?

@LewsTherin511

> Actually as @waytrue17 pointed out in https://discuss.mxnet.apache.org/t/exporting-model-to-onnx-or-alternative-way-to-run-on-android/6862 this might have already been fixed. Would you try #19876 to update to the latest onnx support?

Hi, thank you very much for your answer!

I'm currently using mxnet-mkl (1.6.0), gluoncv (0.8.0) and onnx (1.8.1).
I also tried updating the installation, but the only effect was upgrading gluoncv to 0.9.4.post1.

I already asked on the forum, but I have an incredibly lame question. I have always installed mxnet/gluoncv with pip. To try your suggestion, should I follow the instructions for installing from source and switch to the branch you indicated?
So something like:

or am I getting this completely wrong?
Thanks!

@Zha0q1
Contributor

Zha0q1 commented Feb 26, 2021

No, there's no need to build from source. I know that sucks :)

You can just download my Python script and run it anywhere, and it should work. The script basically 1) detects your current MXNet installation directory, 2) pulls the latest changes from the MXNet repo, and 3) copies the ONNX module over your current MXNet version, overwriting it. You shouldn't need to do anything besides running the script.

I think it's fine to keep your current MXNet and GluonCV versions.

Please let us know if you have any questions and feel free to @me.
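Step 3 of that description, overwriting the installed ONNX module in place, is the step most likely to leave MXNet unimportable if the new files expect modules the old install lacks. A cautious variant keeps a backup so the swap can be undone. This is a hypothetical sketch of that idea only, not the #19876 script. The demo uses throwaway directories in place of the real source checkout and `site-packages/mxnet/contrib/onnx`.

```python
import os
import shutil
import tempfile


def overwrite_with_backup(src_dir, dst_dir):
    """Replace dst_dir with a copy of src_dir, moving the old contents to dst_dir + '.bak'."""
    backup = dst_dir + ".bak"
    if os.path.exists(backup):
        shutil.rmtree(backup)         # keep only the most recent backup
    if os.path.exists(dst_dir):
        shutil.move(dst_dir, backup)  # preserve the currently installed module
    shutil.copytree(src_dir, dst_dir)
    return backup


def restore_backup(dst_dir):
    """Undo overwrite_with_backup by moving dst_dir + '.bak' back into place."""
    backup = dst_dir + ".bak"
    if not os.path.exists(backup):
        raise FileNotFoundError(backup)
    if os.path.exists(dst_dir):
        shutil.rmtree(dst_dir)
    shutil.move(backup, dst_dir)


def write_file(path, text):
    """Create parent directories as needed and write `text` to `path`."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)


if __name__ == "__main__":
    # Placeholder directories standing in for the new and installed onnx modules.
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "new_onnx")
        dst = os.path.join(root, "onnx")
        write_file(os.path.join(src, "__init__.py"), "new")
        write_file(os.path.join(dst, "__init__.py"), "old")
        overwrite_with_backup(src, dst)
        restore_backup(dst)  # e.g. after "import mxnet" starts failing
        with open(os.path.join(dst, "__init__.py")) as f:
            print(f.read())  # old
```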

@Zha0q1
Contributor

Zha0q1 commented Feb 26, 2021

@LewsTherin511

@LewsTherin511

LewsTherin511 commented Jul 27, 2021

Hi! Thanks again for your assistance last time, it worked perfectly!

However, I noticed something weird.
When I first used your script, I was working on another computer, and everything went ok. Yesterday, I tried exporting to ONNX on another machine, and I got the usual error:
ValueError: invalid literal for int() with base 10: 'None'
The machine I'm having problems with has:
gluoncv (0.10.4.post0)
mxnet-mkl (1.6.0)
onnx (1.9.0)

So, I tried the update script, but running it seems to somehow break the mxnet installation. After the update, whenever I try importing mxnet, I get the error:
  File "<stdin>", line 1, in <module>
  File "/home/lews/anaconda3/envs/gluon/lib/python3.8/site-packages/mxnet/__init__.py", line 31, in <module>
    from . import contrib
  File "/home/lews/anaconda3/envs/gluon/lib/python3.8/site-packages/mxnet/contrib/__init__.py", line 31, in <module>
    from . import onnx
  File "/home/lews/anaconda3/envs/gluon/lib/python3.8/site-packages/mxnet/contrib/onnx/__init__.py", line 22, in <module>
    from ...onnx import export_model as export_model_
ModuleNotFoundError: No module named 'mxnet.onnx'
and when importing GluonCV I got the error:
  File "/home/lews/anaconda3/envs/gluon/lib/python3.8/site-packages/gluoncv/__init__.py", line 33, in <module>
    raise ImportError('Unable to import modules due to missing mxnet&torch. '
ImportError: Unable to import modules due to missing mxnet&torch. You should install at least one deep learning framework.
Clearly, the export script still doesn't work, and neither does anything else MXNet-related.

On the same machine, I tried creating a new virtual environment with:
mxnet (1.8.0.post0)
gluoncv (0.10.4.post0)

As before, everything else works (general use of MXNet/GluonCV models), but the ONNX export doesn't. The error this time is different:
AttributeError: No conversion function registered for op type _greater_scalar yet.
I tried the update script again, and the problems are the same:
*) when importing mxnet
ModuleNotFoundError: No module named 'mxnet.onnx'
*) and, when importing gluoncv:
ImportError: Unable to import modules due to missing mxnet&torch. You should install at least one deep learning framework.

I still have everything working on the old machine, but I thought it might be useful to report the problem. :)
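The first traceback shows the updated `mxnet/contrib/onnx/__init__.py` importing from a top-level `mxnet.onnx` package that this install apparently does not have. That suggests the copied files and the installed MXNet are out of sync. One way to check the layout without importing (and tripping over) the broken package is `importlib.util.find_spec`. For a top-level name, it locates the package directory without executing its `__init__.py`. The helper names and the two relative paths checked are assumptions based on the error messages in this thread.

```python
import importlib.util
import os
import sys
import tempfile


def onnx_layout(package="mxnet"):
    """Report which ONNX-related subpackages exist, without importing `package`.

    For a top-level name, importlib.util.find_spec locates the package but
    does not run its __init__.py, so a broken install can still be inspected.
    Returns None if the package cannot be found.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    root = list(spec.submodule_search_locations)[0]
    return {
        rel: os.path.isfile(os.path.join(root, *rel.split("/"), "__init__.py"))
        for rel in ("onnx", "contrib/onnx")
    }


def make_fake_package(root, name, subpackages):
    """Create empty package directories under `root`, for demonstration only."""
    for sub in [""] + list(subpackages):
        path = os.path.join(root, name, *[p for p in sub.split("/") if p])
        os.makedirs(path, exist_ok=True)
        open(os.path.join(path, "__init__.py"), "w").close()


if __name__ == "__main__":
    print(onnx_layout("mxnet"))  # None if MXNet is not installed in this environment
    # Simulate the reported state: contrib/onnx present, top-level onnx missing.
    with tempfile.TemporaryDirectory() as root:
        make_fake_package(root, "fake_mxnet", ["contrib", "contrib/onnx"])
        sys.path.insert(0, root)
        importlib.invalidate_caches()
        print(onnx_layout("fake_mxnet"))  # {'onnx': False, 'contrib/onnx': True}
        sys.path.remove(root)
```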
