optimize unordered_map in profiler #166

zwxxx · 2018-12-13T04:06:21Z

The arguments of Profiler::EndTimeAndRecordEvent are constructed even when the profiler is disabled. Constructing the unordered_map is very expensive in both CPU and latency (dynamic allocation, then memory contention); it costs 6.1% CPU in the test. After this optimization, end-to-end average latency dropped from 686us to 608us (11.4%), and 1-minute CPU samples dropped from 19K to 13K (31.6%). operator new inc% dropped from 8.9% to 4.6%. (Absolute values vary from test to test.)
The test was run at 1000 QPS with 10 threads on a production V15 machine.
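
To illustrate the problem described above: in C++, arguments are fully evaluated before the callee runs, so an early return inside the function cannot save the cost of building them. The following is a minimal sketch, not the onnxruntime code; `EagerProfiler`, `MakeArgs`, and `MapsBuiltWhileDisabled` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

using EventArgs = std::unordered_map<std::string, std::string>;

// Hypothetical stand-in for the profiler; not the real onnxruntime class.
class EagerProfiler {
 public:
  explicit EagerProfiler(bool enabled) : enabled_(enabled) {}

  void EndTimeAndRecordEvent(const std::string& name, const EventArgs& args) {
    // The check happens here, but `name` and `args` were already built
    // by the caller before this function was entered.
    if (!enabled_) return;
    (void)name;
    (void)args;
    ++recorded_;
  }

  int recorded() const { return recorded_; }

 private:
  bool enabled_;
  int recorded_ = 0;
};

static int g_maps_built = 0;  // counts how many maps the caller constructs

// Hypothetical helper that builds the event arguments (heap-allocating).
EventArgs MakeArgs() {
  ++g_maps_built;
  return {{"op_name", "Conv"}, {"provider", "CPU"}};
}

// Calls a disabled profiler `calls` times and reports how many maps were
// constructed anyway.
int MapsBuiltWhileDisabled(int calls) {
  g_maps_built = 0;
  EagerProfiler profiler(/*enabled=*/false);
  for (int i = 0; i < calls; ++i) {
    profiler.EndTimeAndRecordEvent("node_kernel_time", MakeArgs());
  }
  assert(profiler.recorded() == 0);  // nothing was recorded...
  return g_maps_built;               // ...yet every map was still built
}
```

In this sketch, every call pays for one map construction even though no event is ever recorded.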

msftclas · 2018-12-13T04:06:32Z

All CLA requirements met.

RyanUnderhill · 2018-12-13T21:32:58Z

Looks good, thanks!

pranavsharma

Why even bother calling the EndTime... method if the profiler is disabled? This way you can tackle the problem at the source.

zwxxx · 2018-12-13T21:55:00Z

It's checked inside EndTime: if the profiler is disabled, it returns immediately. We could move the check outside EndTime, but that looks less nice.

tracysh · 2018-12-13T22:04:48Z

As it is, EndTime still constructs a throwaway string object (node_name + "_kernel_time") that is wasteful. We also hit this in a scenario in WinML where these calls were costing us. It would be nice to not have any heap allocations due to EndTime.

zwxxx · 2018-12-13T22:10:17Z

@tracysh Exactly. std::string has the small string optimization, so if the string is short, likely no allocation happens, although constructing the std::string is still a waste. At least I didn't see allocations from std::string in my CPU profiles. std::unordered_map is an allocation beast. Actually, I'm not even sure why EventRecord (EndTime calls it internally) accepts an unordered_map; it could likely take a std::initializer_list instead.
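
The `std::initializer_list` idea mentioned above could look roughly like the sketch below. It assumes the event arguments can be passed as pairs of string literals; `ListProfiler`, `EventArg`, and `RecordOnce` are hypothetical names, and the real EventRecord signature is not shown in this thread. The backing array of an `std::initializer_list` is a temporary array rather than a heap container, so passing literal pairs this way should not call `operator new`.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <utility>

// Hypothetical argument type: a pair of pointers to string literals.
using EventArg = std::pair<const char*, const char*>;

// Hypothetical profiler sketch; not the real onnxruntime API.
class ListProfiler {
 public:
  explicit ListProfiler(bool enabled) : enabled_(enabled) {}

  void EndTimeAndRecordEvent(const char* name,
                             std::initializer_list<EventArg> args) {
    if (!enabled_) return;  // disabled: only a few pointers were passed
    assert(name != nullptr);
    args_seen_ += args.size();  // a real profiler would copy the data here
  }

  std::size_t args_seen() const { return args_seen_; }

 private:
  bool enabled_;
  std::size_t args_seen_ = 0;
};

// Records one event with two arguments and reports how many were consumed.
std::size_t RecordOnce(bool enabled) {
  ListProfiler profiler(enabled);
  profiler.EndTimeAndRecordEvent(
      "conv1_kernel_time", {{"op_name", "Conv"}, {"provider", "CPU"}});
  return profiler.args_seen();
}
```

With this shape, the map (if one is needed at all) would be built only on the enabled path, inside the profiler.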

tracysh · 2018-12-13T22:23:24Z

I just stepped through my winml binary and I see these string calls hit the heap. I'm all for changing the map code too as it reduces the size of the executor function, but at some point, I also want to see these strings not constructed.
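
Whether `node_name + "_kernel_time"` touches the heap depends on the length of the result and on the standard library's small-string buffer (commonly around 15 to 22 characters on mainstream implementations, which is an implementation detail rather than a guarantee). The suffix alone is 12 characters, so all but the shortest node names would likely overflow that buffer, which may explain why the two observations above differ. A rough way to check on a given toolchain is to count calls to a replaced global `operator new`; `g_alloc_count` and `AllocationsForLabel` are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

static std::size_t g_alloc_count = 0;  // number of operator new calls so far

// Replace the global allocation functions so that every heap allocation made
// through operator new in this program is counted.
void* operator new(std::size_t size) {
  ++g_alloc_count;
  if (void* p = std::malloc(size != 0 ? size : 1)) return p;
  throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Returns how many heap allocations were made while building the label.
std::size_t AllocationsForLabel(const std::string& node_name) {
  const std::size_t before = g_alloc_count;
  std::size_t label_size = 0;
  {
    const std::string label = node_name + "_kernel_time";
    label_size = label.size();
  }
  assert(label_size == node_name.size() + 12);  // "_kernel_time" is 12 chars
  return g_alloc_count - before;
}
```

Calling `AllocationsForLabel` with a realistic node name should report at least one allocation on typical implementations; the result for very short names varies by standard library.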

zwxxx · 2018-12-13T22:36:04Z

@tracysh I'm actually hunting down all the dynamic allocations. operator new takes ~9% in my profile, and end-to-end latency is very bad. This is just one of many. There are more severe allocations in other places: make_unique, TensorShape itself being a vector, etc. I'm trying to see what I can do about these. Will loop you in.

pranavsharma · 2018-12-13T23:38:49Z

> It's checked inside EndTime: if the profiler is disabled, it returns immediately. We could move the check outside EndTime, but that looks less nice.

As you can see, it's too late to check inside EndTime. The API might require some rework for sure. I think for now we should avoid invoking it in the first place if the profiler is disabled.

zwxxx · 2018-12-13T23:54:38Z

@pranavsharma Actually, I thought about reworking the API. There are two options: 1) Expose IsEnabled() and check whether the profiler is enabled before calling EndTime(). This is less nice, as you have to do it before every profiler API call. 2) Create a macro that wraps IsEnabled() and the profiler API call. I'm not a big fan of abusing macros. Which option do you prefer, or do you have a better idea?
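
The two options above can be sketched side by side. This is only an illustration under assumptions: `GuardedProfiler`, `MakeArgs`, and the `PROFILER_RECORD` macro are hypothetical names, not the API that was eventually adopted. In both variants the argument expressions sit inside the guarded branch, so they are not evaluated when the profiler is disabled.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

using EventArgs = std::unordered_map<std::string, std::string>;

// Hypothetical profiler sketch exposing IsEnabled(); not the real class.
class GuardedProfiler {
 public:
  explicit GuardedProfiler(bool enabled) : enabled_(enabled) {}
  bool IsEnabled() const { return enabled_; }
  void EndTimeAndRecordEvent(const std::string& name, const EventArgs& args) {
    (void)name;
    (void)args;
    ++recorded_;
  }
  int recorded() const { return recorded_; }

 private:
  bool enabled_;
  int recorded_ = 0;
};

static int g_args_built = 0;  // counts argument-map constructions

EventArgs MakeArgs() {
  ++g_args_built;
  return {{"op_name", "Conv"}};
}

// Option 2: a macro that wraps the IsEnabled() check around the call.
// The arguments are only evaluated inside the `if` body.
#define PROFILER_RECORD(profiler, ...)               \
  do {                                               \
    if ((profiler).IsEnabled()) {                    \
      (profiler).EndTimeAndRecordEvent(__VA_ARGS__); \
    }                                                \
  } while (0)

// Option 1: the caller checks IsEnabled() explicitly before every call.
int ArgsBuiltWithExplicitCheck(bool enabled) {
  g_args_built = 0;
  GuardedProfiler profiler(enabled);
  const std::string node_name = "conv1";
  if (profiler.IsEnabled()) {
    profiler.EndTimeAndRecordEvent(node_name + "_kernel_time", MakeArgs());
  }
  assert(profiler.recorded() == (enabled ? 1 : 0));
  return g_args_built;
}

// Option 2 in use: same behavior, with the check hidden in the macro.
int ArgsBuiltWithMacro(bool enabled) {
  g_args_built = 0;
  GuardedProfiler profiler(enabled);
  const std::string node_name = "conv1";
  PROFILER_RECORD(profiler, node_name + "_kernel_time", MakeArgs());
  assert(profiler.recorded() == (enabled ? 1 : 0));
  return g_args_built;
}
```

The macro keeps call sites to one line, at the cost of hiding control flow; the explicit check is more verbose but easier to step through in a debugger.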

pranavsharma · 2018-12-14T00:36:10Z

> @pranavsharma Actually, I thought about reworking the API. There are two options: 1) Expose IsEnabled() and check whether the profiler is enabled before calling EndTime(). This is less nice, as you have to do it before every profiler API call. 2) Create a macro that wraps IsEnabled() and the profiler API call. I'm not a big fan of abusing macros. Which option do you prefer, or do you have a better idea?

The macro option is fine. You can do it in a separate PR since this one is already merged.

optimize unordered_map

a6b02d0

zwxxx requested a review from a team as a code owner December 13, 2018 04:06

RyanUnderhill approved these changes Dec 13, 2018


pranavsharma reviewed Dec 13, 2018


yufenglee approved these changes Dec 14, 2018


yufenglee merged commit 2ffaa8a into microsoft:master Dec 14, 2018

jywu-msft mentioned this pull request Dec 22, 2018

Avoid to run profiling code in critical stack completely if there is no need #245

Merged

krushith720 mentioned this pull request Jun 8, 2022

[Snyk] Security upgrade prebuild-install from 6.1.2 to 7.1.1 krushith720/onnxruntime#54

Open

ekmixon mentioned this pull request Jun 8, 2022

[Snyk] Security upgrade prebuild-install from 6.1.2 to 7.1.1 ekmixon/onnxruntime#87

Open

eliasbuchwald mentioned this pull request Jun 8, 2022

[Snyk] Security upgrade prebuild-install from 6.1.2 to 7.1.1 eliasbuchwald/onnxruntime#62

Open

snyk-bot mentioned this pull request Jun 8, 2022

[Snyk] Security upgrade prebuild-install from 6.0.0 to 7.1.1 RedisAI/onnxruntime#32

Open

eliasbuchwald mentioned this pull request May 21, 2024

[Snyk] Security upgrade prebuild-install from 6.1.2 to 7.1.1 eliasbuchwald/onnxruntime#163

Open

DvirDukhan mentioned this pull request May 21, 2024

[Snyk] Security upgrade prebuild-install from 6.0.0 to 7.1.1 RedisAI/onnxruntime#77

Open
