Post processing instance segmentation taking a lot of time #14
Comments
Can you try this? (Untested, but changing to a for loop would surely improve it.)
I'm currently working on improving the overall performance in this branch: https://github.com/NickSwardh/YoloDotNet/tree/performance. There I've replaced a legacy ONNX class with the OrtValue API for improved performance, along with a few other tweaks here and there.
Is this branch production-ready (on NuGet)?
Yeah, this branch already gives a great improvement. Thanks for the work you put into it.
Awesome! Thank you for letting me know :)
No, not yet. It's a work in progress. I'm still turning the nuts and bolts to see if I can squeeze some more speed out of this thing ;)
Hi @NickSwardh. First off, thanks for the great library you have created here! Inspired by this issue and facing some performance issues myself, I forked your branch and first added benchmarks so that performance-related code changes can be validated. Once the benchmarks were in place, I was able to spot some quick wins that, at least in my testing, have dramatically improved the overall performance. I also added a few other useful benchmarks to start understanding where time is spent and where memory is allocated. The reduced GC pressure has increased the overall throughput of my application, because there are now fewer GC-induced pauses. The benchmarks that require it run both GPU and CPU variations, so that one can spot improvements or regressions on both at the same time. I have created a PR if you are interested. I apologize up front for its size.
I did some more testing. The new improvements already make the code a lot faster, but from my testing it seems it might be better not to use Parallel.For loops. They seem to be a lot less consistent than a normal for loop, and the speed improvement they give does not seem that large. Hopefully you will take this into consideration.
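For anyone who wants to reproduce this kind of comparison, here is a rough sketch of how a plain for loop and Parallel.For could be timed against each other. It is not YoloDotNet's code: the class name, the method names, and the flat channel-major buffer layout are all assumptions made for illustration, and a Stopwatch loop is only a crude stand-in for a proper benchmark harness.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper, not part of YoloDotNet.
public static class LoopComparison
{
    // Assumed layout: data[c * pixels + p] holds channel c of pixel p.
    public static void FillSequential(float[] data, float[] weights, float[] result, int pixels)
    {
        var channels = weights.Length;
        for (var p = 0; p < pixels; p++)
        {
            var value = 0f;
            for (var c = 0; c < channels; c++)
                value += data[c * pixels + p] * weights[c];
            result[p] = value;
        }
    }

    // Same work, but the outer loop over pixels is split across threads.
    public static void FillParallel(float[] data, float[] weights, float[] result, int pixels)
    {
        var channels = weights.Length;
        Parallel.For(0, pixels, p =>
        {
            var value = 0f;
            for (var c = 0; c < channels; c++)
                value += data[c * pixels + p] * weights[c];
            result[p] = value;
        });
    }

    // Crude timing helper: total elapsed time over a number of runs.
    public static TimeSpan Time(Action action, int runs)
    {
        var stopwatch = Stopwatch.StartNew();
        for (var i = 0; i < runs; i++)
            action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Calling `LoopComparison.Time(() => LoopComparison.FillSequential(data, weights, result, pixels), 100)` and the same for `FillParallel` gives a first impression. Timing each run individually instead of in total would also show the run-to-run variance described above.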
Hello,
When using instance segmentation, the post-processing takes quite a lot of time, and I was wondering if there might be a way to optimize it. I found the line that takes the most time, but I have not found a way to optimize it.
Maybe somebody else has a good idea.
var value = Enumerable.Range(0, output.Channels).Sum(i => tensor1[0, i, y, x] * maskWeights[i]);
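As a sketch of the for-loop idea raised in this thread, the LINQ expression can be written as a plain loop. `Enumerable.Range(...).Sum(...)` allocates an enumerator and invokes a delegate for every channel, and this line runs once per mask pixel, so that overhead adds up. The class and method names below are hypothetical, and a `float[,,,]` array stands in for the library's tensor type so that the snippet is self-contained.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of YoloDotNet.
public static class MaskDot
{
    // LINQ version, equivalent to the line quoted above.
    public static float SumLinq(float[,,,] tensor1, float[] maskWeights, int channels, int y, int x) =>
        Enumerable.Range(0, channels).Sum(i => tensor1[0, i, y, x] * maskWeights[i]);

    // Plain loop: same weighted sum over channels, without the enumerator or delegate calls.
    public static float SumLoop(float[,,,] tensor1, float[] maskWeights, int channels, int y, int x)
    {
        var value = 0f;
        for (var i = 0; i < channels; i++)
            value += tensor1[0, i, y, x] * maskWeights[i];
        return value;
    }
}
```

The two versions can differ in the last few bits because of floating-point accumulation, so it is safer to compare them with a small tolerance than with exact equality.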