Post processing instance segmentation taking a lot of time #14

lwillems191 · 2024-05-27T06:22:12Z

Hello,

When using instance segmentation, the post-processing takes quite a lot of time, and I was wondering if there might be a way to optimize it. I found the line that takes the most time, but have not found a way to optimize it.
Maybe somebody else has a good idea.

var value = Enumerable.Range(0, output.Channels).Sum(i => tensor1[0, i, y, x] * maskWeights[i]);

aloksharma1 · 2024-05-31T11:26:06Z

Can you try this? (Untested, but changing to a for loop should improve it.)

float value = 0;
for (int i = 0; i < output.Channels; ++i)
{
    value += tensor1[0, i, y, x] * maskWeights[i];
}
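
To make the difference concrete, here is a minimal self-contained sketch that puts the LINQ expression from the report next to the plain loop. It assumes the tensor can be modelled as a `float[,,,]` array; the real `tensor1` type in YoloDotNet may differ, and the names `MaskDotProduct`, `WithLinq` and `WithLoop` are made up for illustration. The LINQ version goes through an enumerator and a delegate call for every channel, and the lambda captures `y` and `x`, so it is likely to allocate on each call, which adds up when it runs once per mask pixel.

```csharp
using System;
using System.Linq;

public static class MaskDotProduct
{
    // LINQ version, same shape as the line from the original report.
    public static float WithLinq(float[,,,] tensor, float[] maskWeights, int channels, int y, int x) =>
        Enumerable.Range(0, channels).Sum(i => tensor[0, i, y, x] * maskWeights[i]);

    // Plain loop: no enumerator and no delegate invocation per channel.
    public static float WithLoop(float[,,,] tensor, float[] maskWeights, int channels, int y, int x)
    {
        float value = 0f;
        for (int i = 0; i < channels; ++i)
        {
            value += tensor[0, i, y, x] * maskWeights[i];
        }
        return value;
    }

    public static void Main()
    {
        // Hypothetical sizes and values, chosen only to have something to sum.
        const int channels = 32, height = 4, width = 4;
        var tensor = new float[1, channels, height, width];
        var maskWeights = new float[channels];

        for (int i = 0; i < channels; i++)
        {
            maskWeights[i] = i % 3;
            for (int y = 0; y < height; y++)
                for (int x = 0; x < width; x++)
                    tensor[0, i, y, x] = i + y + x;
        }

        float viaLinq = WithLinq(tensor, maskWeights, channels, 2, 3);
        float viaLoop = WithLoop(tensor, maskWeights, channels, 2, 3);

        // Both should agree up to floating-point rounding.
        Console.WriteLine($"LINQ: {viaLinq}, loop: {viaLoop}");
    }
}
```

How much this helps in practice depends on the image size and channel count, so it is worth measuring on your own model rather than taking the sketch as a guarantee.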

NickSwardh · 2024-05-31T15:41:34Z

I'm currently working on improving the overall performance in this branch https://github.com/NickSwardh/YoloDotNet/tree/performance where I've replaced the use of a legacy ONNX class with the OrtValue API for improved performance, along with a few other tweaks here and there.

aloksharma1 · 2024-05-31T21:00:27Z

is this branch prod ready (on nuget)?

lwillems191 · 2024-06-03T13:16:07Z

Yeah this branch already gives a great improvement. Thanks for the work you put into it.

NickSwardh · 2024-06-03T20:41:58Z

Awesome! Thank you for letting me know :)

NickSwardh · 2024-06-03T20:48:20Z

> is this branch prod ready (on nuget)?

No, not yet. It's a work in progress. I'm still turning the nuts and bolts to see if I can squeeze some more speed out of this thing ;)

louislewis2 · 2024-06-10T08:33:02Z

Hi @NickSwardh

First off, thanks for the great library you have created here!

Inspired by this issue and facing some performance issues myself, I forked your branch and first added some benchmarks so that code changes made for performance can be validated. Once the benchmarks were in place, I was able to spot some quick wins that, at least in my testing, have dramatically improved the overall performance. I also added a few other useful benchmarks to start understanding where time is spent and memory is allocated. The reduced GC pressure has increased the overall throughput of my application, because there are now fewer GC-induced pauses.

The benchmarks that require it also run both GPU and CPU variations, so that one can spot improvements or degradations on both at the same time.

I have created a PR if you are interested; I apologize upfront for the size of it.
Some refactoring seemed fitting to make provision for sharing resources such as the assets.

#16

lwillems191 · 2024-08-13T12:36:26Z

I did some more testing. The new improvements already make the code a lot faster, but from my testing it seems it might be better not to use Parallel.For loops. Their timings are a lot less consistent than those of a normal for loop, and the speed improvement does not seem to be that large. Hopefully you will take this into consideration.
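
For anyone who wants to check this on their own machine, a rough way to compare the two is to time repeated runs and look at the spread, not just the average. The sketch below is not YoloDotNet code: `ProcessRow` is a hypothetical stand-in for the per-row mask work, and the sizes and run count are arbitrary assumptions. A proper benchmark harness would give more trustworthy numbers than a bare `Stopwatch`.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class LoopTiming
{
    // Hypothetical per-row workload; replace with the real post-processing step.
    public static void ProcessRow(float[] data, float[] result, int width, int row)
    {
        for (int x = 0; x < width; x++)
        {
            int i = row * width + x;
            result[i] = data[i] * 0.5f + 1f;
        }
    }

    // Runs the action several times and returns each run's duration in milliseconds.
    public static double[] TimeRuns(Action action, int runs)
    {
        var times = new double[runs];
        for (int r = 0; r < runs; r++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            times[r] = sw.Elapsed.TotalMilliseconds;
        }
        return times;
    }

    public static void Main()
    {
        const int width = 640, height = 640, runs = 50;
        var data = new float[width * height];
        var sequential = new float[data.Length];
        var parallel = new float[data.Length];

        var seq = TimeRuns(() =>
        {
            for (int y = 0; y < height; y++) ProcessRow(data, sequential, width, y);
        }, runs);

        var par = TimeRuns(() =>
            Parallel.For(0, height, y => ProcessRow(data, parallel, width, y)), runs);

        // Min/max show how consistent each variant is across runs.
        Console.WriteLine($"for:          min {seq.Min():F3} ms, max {seq.Max():F3} ms, avg {seq.Average():F3} ms");
        Console.WriteLine($"Parallel.For: min {par.Min():F3} ms, max {par.Max():F3} ms, avg {par.Average():F3} ms");
    }
}
```

When the work per iteration is small, the scheduling overhead of `Parallel.For` can outweigh the gain and make timings jumpier, which would match what I am seeing, but the result will vary with core count and workload.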
