Add enable_profiling API in RunOptions using Thread-Local Storage#26856
Add enable_profiling API in RunOptions using Thread-Local Storage#26856xiaofeihan1 wants to merge 3 commits intomainfrom
Conversation
e8c9879 to
559a8dd
Compare
09efa64 to
d75ee74
Compare
| if (file_path.empty()) { | ||
| run_level_state_.events.clear(); | ||
| return std::string(); | ||
| } |
There was a problem hiding this comment.
Move L201-L204 before collecting EP profiler events?
| @@ -3138,6 +3143,14 @@ Status InferenceSession::Run(const RunOptions& run_options, | |||
| if (session_profiler_.IsEnabled()) { | |||
There was a problem hiding this comment.
Use if (session_profiler_.IsEnabled() || run_options.enable_profiling)?
|
|
||
| // Run-level profiling support | ||
| void StartRunProfiling() override; | ||
| void EndRunProfiling(TimePoint start_time, onnxruntime::profiling::Events& events) override; |
There was a problem hiding this comment.
Can we reuse StartProfiling and EndProfiling no matter it's session level or run level? If it's session-level, just pass session-level's events to EndProfiling. Otherwise, pass run-level events to EndProfiling. Can it satisfy the requirement?
| return true; | ||
| } | ||
|
|
||
| virtual void StartRunProfiling() override {} |
| virtual void StartRunProfiling() {} | ||
| virtual void EndRunProfiling(TimePoint start_time, Events& events) { | ||
| ORT_UNUSED_PARAMETER(start_time); | ||
| ORT_UNUSED_PARAMETER(events); |
There was a problem hiding this comment.
Suggest simply commenting out param names
| out << "\"ts\" :" << rec.ts << ","; | ||
| out << R"("ph" : "X",)"; | ||
| out << R"("name" :")" << rec.name << "\","; | ||
| out << "\"args\" : {"; |
There was a problem hiding this comment.
Could this be a separate operator<<(std::ostream&) for the record or function?
|
|
||
|
|
||
|
|
||
| static constexpr int kMaxSymbolSize = 1024; |
There was a problem hiding this comment.
This file does not seem to have changes.
|
Here is the big question. Can we have things pertaining to RunOptions profiling stored within RunOptions and not TLS? |
Thanks for the suggestion! I’m re-implementing this using your proposed solution. I’ll let you know once it’s done. |
|
Close this PR because we will use another solution. See #26846 |
Description
Support profiling for specific Run.
Similar to session-level profiling, developers can specify
enable_profilingandprofile_file_prefixin RunOptions. After the run completes, the resulting JSON file will be saved with the given profile_file_prefix + timestamp.When profiling is enabled via RunOptions, it should ideally collect two types of events:
Used to calculate the total execution time of each operator.
Used to measure GPU kernel execution time.
Unlike session-level, we need to ensure the collecting events is correct for multiple thread scenario.
For 1, this can be supported easily(sequential_executor.cc). We use a thread-local storage (TLS) variable, RunLevelState (defined in profiler.h), to maintain run-level profiling state for each thread.
For 2, each Execution Provider (EP) has its own profiler implementation, and each EP must ensure correct behavior under run-level profiling. This PR ensures that the WebGPU profiler works correctly with run-level profiling.
sess1.Run({ enable_profiling: true })t2:
sess1.Run({ enable_profiling: false })t3:
sess1.Run({ enable_profiling: true })t1and one fort3.sess1 = OrtSession({ enable_profiling: true })sess1.Run({ enable_profiling: true })Motivation and Context
Previously, profiling only for session level. Sometimes developer want to profile for specfic run . so the PR comes.