Very slow inference in tensorflow #6
Comments
That slowdown seems quite drastic. Do you have, e.g., many categories or many images to evaluate in one step? I suspect that a dedicated CUDA kernel would speed up the implementation a lot (compared with the current long succession of masking/selecting/sort operations). However, I don't plan to tackle this in the near future; contributions in this direction are welcome. |
I am just using it on the Cityscapes dataset, with distributed training. The model is DeepLab v3+. |
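For readers unfamiliar with the loss, the "succession of operations" mentioned above can be sketched in plain Python. This is only an illustrative single-class version based on the published Lovász-Softmax formulation, not the repository's TensorFlow code; the function names `lovasz_grad` and `lovasz_loss_one_class` and the list-based inputs are assumptions made for this example. It shows that every evaluation needs a sort followed by cumulative sums, which is where the later comments locate the cost.

```python
from itertools import accumulate


def lovasz_grad(gt_sorted):
    """Gradient of the Lovász extension of the Jaccard loss.

    gt_sorted: list of 0/1 labels, ordered by decreasing prediction error.
    """
    if not gt_sorted:
        return []
    total_positives = sum(gt_sorted)
    # Two cumulative sums: positives seen so far and negatives seen so far.
    cum_pos = list(accumulate(gt_sorted))
    cum_neg = list(accumulate(1 - g for g in gt_sorted))
    jaccard = [
        1.0 - (total_positives - cp) / (total_positives + cn)
        for cp, cn in zip(cum_pos, cum_neg)
    ]
    # Discrete differences of the Jaccard losses give the gradient.
    return [jaccard[0]] + [jaccard[i] - jaccard[i - 1] for i in range(1, len(jaccard))]


def lovasz_loss_one_class(probs, labels):
    """Illustrative single-class loss: sort errors, then weight by lovasz_grad."""
    errors = [abs(label - prob) for prob, label in zip(probs, labels)]
    # Sort pixel indices by decreasing error (the 'sort' step).
    order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    errors_sorted = [errors[i] for i in order]
    labels_sorted = [labels[i] for i in order]
    grad = lovasz_grad(labels_sorted)
    return sum(e * g for e, g in zip(errors_sorted, grad))
```

With many classes and large images, the sort and cumulative sums run once per class over every pixel, so a slow cumulative sum on the GPU would dominate the step time.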
I did some profiling. It seems in TensorFlow the [...] In PyTorch, as expected, the [...] I will investigate a bit more; it might warrant an issue report for TensorFlow. |
This python notebook summarizes the problem: tensorflow/profile_ops.ipynb |
After looking more into it, it seems the easiest way is to create a custom TensorFlow op using CUB's exclusive sum instead of the native [...] I will not implement this for now as I'm mainly using PyTorch. I might do it one day, but in the meantime I'll tag this as [...] |
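As a small aside on terminology: an "exclusive sum" is a prefix sum in which each output excludes the element at its own position. The plain-Python sketch below only illustrates that definition and how an inclusive cumulative sum can be recovered from it; the helper names are made up for this example, and it says nothing about how CUB or TensorFlow implement the operation.

```python
from itertools import accumulate


def inclusive_cumsum(values):
    """Output i is the sum of values[0..i], including values[i]."""
    return list(accumulate(values))


def exclusive_cumsum(values):
    """Output i is the sum of values[0..i-1], so the first output is 0."""
    inclusive = inclusive_cumsum(values)
    return [0] + inclusive[:-1] if inclusive else []


def inclusive_from_exclusive(values):
    """Recover the inclusive sum by adding each element back to the exclusive sum."""
    return [e + v for e, v in zip(exclusive_cumsum(values), values)]
```

For example, for the input `[1, 2, 3, 4]` the exclusive sum is `[0, 1, 3, 6]` and the inclusive sum is `[1, 3, 6, 10]`.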
The speed of cumsum has been improved significantly; I'm going to close this. Feel free to re-open if you feel it still isn't fast enough. |
@ekelsen Which version of TensorFlow has these improvements? |
Currently just HEAD: tensorflow/tensorflow@73e3215 |
Thanks for the pointer @ekelsen. Closing this issue |
@ekelsen, hello. I'm using Keras (backend: TensorFlow 1.12) with CUDA 9.0, but training is still slow with this loss function. Can you give me some advice? My GPU is a GTX 1080 Ti. |
@stillwaterman I expect the TensorFlow build you are using was made before the changes to cumsum were implemented. Building TensorFlow from source might be a reasonable way to speed up training. |
@jianlong-yuan Hi, I want to use the Lovász-Softmax loss in DeepLab v3+ but failed. Could you give me some references or demos? Thanks. |
@Z-Ianthe For how to solve the above problems, see what I put here: https://github.com/jianlong-yuan/LovaszSoftmax_tf/tree/master |
I don't have time to investigate TensorFlow issues for now, but I am at least reopening the issue. |
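A quick way to see which release is actually installed is to read the package's version string. The sketch below is a generic helper (the name `installed_version` is invented here) that returns `None` when the module cannot be imported. Note that a release number alone does not prove that a particular commit is included; that would still need to be checked against the release history.

```python
import importlib


def installed_version(module_name="tensorflow"):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


if __name__ == "__main__":
    print("TensorFlow version:", installed_version())
```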
Before I use your loss function: 2.5 sec/step
After I use your loss function: 32.0 sec/step
I use TensorFlow 1.6.0
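To make numbers like these comparable across setups, it can help to measure the average time per step in the same way each time. The following is a minimal, framework-agnostic sketch; `step_fn` is a hypothetical callable that runs one training step and returns only when the step has finished (with asynchronous GPU execution an explicit synchronization would be needed inside it).

```python
import time


def seconds_per_step(step_fn, num_steps=10, warmup=2):
    """Average wall-clock seconds per call of step_fn, after a few warm-up calls."""
    for _ in range(warmup):
        step_fn()  # warm-up calls are excluded from the measurement
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    return (time.perf_counter() - start) / num_steps


def slowdown(baseline_sec, new_sec):
    """Ratio of the new step time to the baseline step time."""
    return new_sec / baseline_sec
```

With the figures reported above, `slowdown(2.5, 32.0)` gives a factor of 12.8.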