This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Python custom layer is extremely slow #3139

Closed

Maofei opened this issue Aug 25, 2016 · 6 comments

Comments


Maofei commented Aug 25, 2016

This is about mx.operator.CustomOp as implemented in the rcnn example. @precedenceguo
There is a gap of about one and a half seconds between the end of the Python operator class's init and the start of its forward pass.
What is it doing during this time?
Are there any suggestions for squeezing this time down?
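For reference, the kind of per-phase timing being described can be sketched in plain Python. This is a minimal, hypothetical stand-in (the class and method names below are illustrative, not mx-rcnn or MXNet code); it just shows where the three timestamps in the reported gap come from:

```python
import time

class TimedOp:
    """Hypothetical stand-in for a custom operator: records timestamps
    around __init__ and forward so per-phase gaps can be measured."""

    def __init__(self):
        t0 = time.time()
        # ... operator setup would go here ...
        self.init_done = time.time()
        print("init start --> init finish : %fs" % (self.init_done - t0))

    def forward(self, data):
        fwd_start = time.time()
        # The number in question: how long after init did forward begin?
        print("init finish --> do forward : %fs" % (fwd_start - self.init_done))
        result = [x * 2 for x in data]  # placeholder for the real computation
        print("forward start --> forward end : %fs" % (time.time() - fwd_start))
        return result

op = TimedOp()
out = op.forward([1, 2, 3])
```

Note that the middle timestamp pair measures wall-clock time between two Python callbacks, not time spent inside the operator itself, which matters for interpreting the numbers below.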


ijkguo commented Aug 26, 2016

My guess is that the init and forward steps do not happen sequentially. The operator might be created only once. Let's consult @piiswrong.


Maofei commented Aug 26, 2016

Actually, the gap of almost 2 s occurs for every image passed in during testing.
According to my printed log, I believe init fires every time the executor does a forward pass;
in any case, neither the init step nor the forward step itself is very time-consuming.
I printed the time gap between the end of init and the start of forward, and it is this gap that dominates.
I'm hoping for an explanation of the mechanism and suggestions for speeding it up. @piiswrong

Here is one of my log entries:

init start --> init finish : 0.000526905059814 s (it differs for every image during testing)
init finish --> do forward : 1.71541309357 s
forward start --> forward end : 0.209252119064 s
detector.im_detect took 1.93840193748 s

piiswrong commented:

My guess is that the layers before the custom op are running during that gap.
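This point can be illustrated without MXNet: when upstream work is scheduled asynchronously, the wall-clock gap measured just before a downstream callback runs is dominated by the upstream computation, even though the downstream code did nothing. A minimal stand-alone sketch, with a thread pool standing in for MXNet's execution engine (all names here are illustrative):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def upstream_layers():
    # Stand-in for the conv/relu layers that run before the custom op.
    time.sleep(0.2)
    return "features"

def custom_op_forward(features):
    # Stand-in for the custom op's own (cheap) forward computation.
    return features.upper()

pool = ThreadPoolExecutor(max_workers=1)

t_init_done = time.time()              # "init finished" timestamp
future = pool.submit(upstream_layers)  # upstream work is queued asynchronously
features = future.result()             # blocks until the upstream layers finish
t_fwd_start = time.time()              # "forward started" timestamp

# The measured gap reflects the upstream layers, not the custom op,
# even though none of the op's own code ran in between.
gap = t_fwd_start - t_init_done
print("init finish --> do forward : %.3fs" % gap)  # roughly the 0.2 s of upstream work
out = custom_op_forward(features)
pool.shutdown()
```

Under this reading, the 1.7 s gap in the log above would be the conv layers ahead of the custom op executing, so speeding it up means speeding up (or re-profiling) the whole forward pass, not the Python operator.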


Maofei commented Aug 28, 2016

You mean that the initialisation of the custom op class actually precedes the whole forward pass (conv, relu, custom op, etc.), right? @piiswrong
That makes sense, but the time consumed is much larger than in the Caffe implementation (0.7 s).
Could you please give me some advice on how to speed up this forward pass?
I guess a C++ or CUDA implementation of this layer would help?
However, the Caffe implementation uses such a Python layer as well, so I wonder whether I missed something else important for performance.


ijkguo commented Aug 28, 2016

Someone else reported that MXNet's engine saves time compared to the Python layer in Caffe; cf. ijkguo/mx-rcnn#20.

phunterlau commented:

This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!
