Add support for tracing profilers like Nvidia NSight System and Intel VTune #2908
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,5 +1,6 @@ | ||
| using Trixi | ||
| using CUDA | ||
| using NVTX # Load to get tracing support for Trixi | ||
| using TimerOutputs | ||
| using JSON | ||
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
|
|
@@ -288,3 +288,44 @@ requires. It can thus be seen as a proxy for "energy used" and, as an extension, | |||||||
| timing result, you need to set the analysis interval such that the | ||||||||
| `AnalysisCallback` is invoked at least once during the course of the simulation and | ||||||||
| discard the first PID value. | ||||||||
|
|
||||||||
| ## Tracing support for profilers | ||||||||
|
|
||||||||
| Trixi supports tracing profiler integration through [ittapi](https://github.com/intel/ittapi) for Intel VTune and [NVTX](https://github.com/NVIDIA/NVTX) for [NVIDIA Nsight Systems](https://developer.nvidia.com/nsight-systems). | ||||||||
|
|
||||||||
| !!! note "Extensions" | ||||||||
| Tracing support is implemented through extensions and requires trigger packages to be loaded. | ||||||||
|
|
||||||||
| Tracing support is only available for regions that are instrumented with `@trixi_timeit_ext`. | ||||||||
|
Member
Suggested change
If it has a docstring
||||||||
|
|
||||||||
| ### Using Intel VTune | ||||||||
|
|
||||||||
| We can use Intel VTune to profile CPU code. For more information see the [Julia documentation](https://docs.julialang.org/en/v1/manual/profile/#External-Profiling) and the [IntelITT.jl](https://github.com/JuliaPerf/IntelITT.jl) package. | ||||||||
|
|
||||||||
| !!! note "Trigger package" | ||||||||
| ```julia | ||||||||
| using IntelITT | ||||||||
| ``` | ||||||||
|
|
||||||||
| To get the most out of Intel VTune, we recommend setting the environment variable `ENABLE_JITPROFILING=1` when launching Julia, which allows the profiler to symbolize JIT-compiled call frames. | ||||||||
|
Member
Suggested change
For clarity?
||||||||
|
|
||||||||
| !!! note "Usage of `juliaup`" | ||||||||
| Sometimes `juliaup` can make it harder for a profiler to attach to the right process. You can use `Base.julia_cmd()` in the REPL to obtain the path to the actual Julia binary you will be running. | ||||||||
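As a minimal sketch of the two hints above, the following Julia snippet resolves the running Julia binary with `Base.julia_cmd()` and launches a child process with `ENABLE_JITPROFILING=1` set. The child program is purely illustrative and only echoes the variable; in practice the resulting command would be handed to the profiler, whose command line is not shown here. The snippet assumes Julia 1.6 or newer for `addenv`.

```julia
# Command for the Julia binary that is currently running, as recommended in the note above.
julia = Base.julia_cmd()

# Illustrative child program: it only reports whether the variable is visible.
child = `$julia --startup-file=no -e 'print(get(ENV, "ENABLE_JITPROFILING", "unset"))'`

# Attach the environment variable to the command without modifying the current process.
child = addenv(child, "ENABLE_JITPROFILING" => "1")

seen = read(child, String)
println("Child process sees ENABLE_JITPROFILING = ", seen)
```

Using `addenv` keeps the setting local to the launched command, so the parent session's environment stays unchanged.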
|
|
||||||||
|
|
||||||||
| ### Using NVIDIA Nsight Systems | ||||||||
|
|
||||||||
| We can use NVIDIA Nsight Systems to trace GPU code. | ||||||||
|
|
||||||||
| We recommend reading the CUDA.jl documentation on using [Nsight Systems](https://cuda.juliagpu.org/stable/development/profiling/#NVIDIA-Nsight-Systems). | ||||||||
|
|
||||||||
| !!! note "Trigger package" | ||||||||
| ```julia | ||||||||
| using CUDA | ||||||||
| using NVTX | ||||||||
| ``` | ||||||||
|
|
||||||||
| You can also just use `CUDA.@profile` (see [Integrated Profiler](https://cuda.juliagpu.org/stable/development/profiling/#Integrated-profiler)) to obtain profiler results that include the NVTX ranges. | ||||||||
|
|
||||||||
| #### Known limitation | ||||||||
| Nsight Systems can also be used for CPU codes, in particular MPI codes. However, the Trixi extension is only enabled when a GPU backend is used. | ||||||||
|
Member
Suggested change
Member
Also, is there a downside to enabling it also for the CPU backend?
||||||||
|
Member
It would be good to add at least a rudimentary comment on the purpose of this extension.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| module TrixiIntelITTExt | ||
|
|
||
| using Trixi: CPU | ||
| import Trixi: trixi_range_active, trixi_range_start, trixi_range_end | ||
|
|
||
| import IntelITT | ||
|
|
||
| const domain = Ref{IntelITT.Domain}() | ||
| function __init__() | ||
| domain[] = IntelITT.Domain("Trixi") | ||
| end | ||
|
|
||
| function trixi_range_active(::Union{Nothing, CPU}) | ||
| return IntelITT.isactive() | ||
| end | ||
|
|
||
| function trixi_range_start(::Union{Nothing, CPU}, label) | ||
| task = IntelITT.Task(domain[], label) | ||
| IntelITT.start(task) | ||
| return task | ||
| end | ||
|
|
||
| function trixi_range_end(::Union{Nothing, CPU}, id) | ||
| IntelITT.stop(id) | ||
| return nothing | ||
| end | ||
|
|
||
| end # module |
|
Member
It would be good to add at least a rudimentary comment on the purpose of this extension.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,25 @@ | ||
| module TrixiNVTXExt | ||
|
|
||
| using NVTX | ||
| using CUDA: CUDABackend | ||
| import Trixi: trixi_range_active, trixi_range_start, trixi_range_end | ||
|
|
||
| # One can also use Nsight Systems and thus NVTX for CPU code | ||
|
|
||
| const domain = NVTX.Domain("Trixi") | ||
| const color = 0xff40e0d0 # turquoise | ||
|
|
||
| function trixi_range_active(::CUDABackend) | ||
| return NVTX.isactive() | ||
| end | ||
|
|
||
| function trixi_range_start(::CUDABackend, label) | ||
| return NVTX.range_start(NVTX.init!(domain); message = label, color = color) | ||
| end | ||
|
|
||
| function trixi_range_end(::CUDABackend, id) | ||
| NVTX.range_end(id) | ||
| return nothing | ||
| end | ||
|
|
||
| end # module |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -82,6 +82,35 @@ end | |
| return ncalls_first | ||
| end | ||
|
|
||
| # TODO: move to KernelAbstractions | ||
| """ | ||
| trixi_range_active(backend) | ||
|
Member
I'd call these three functions
||
|
|
||
| Returns `true` if the given `backend` supports range annotations and a profiler is active, `false` otherwise. | ||
| """ | ||
| function trixi_range_active(backend::Any) | ||
| return false | ||
| end | ||
|
|
||
| """ | ||
| trixi_range_start(backend, label) | ||
|
|
||
| Starts a range annotation for the given `backend` with the specified `label`. | ||
| Returns a handle to the started range, which should be passed to `trixi_range_end` to end the range annotation. | ||
| """ | ||
| function trixi_range_start(backend::Any, label) | ||
| return nothing | ||
| end | ||
|
|
||
| """ | ||
| trixi_range_end(backend, id) | ||
|
|
||
| Ends a range annotation for the given `backend` with the specified `id`. | ||
| """ | ||
| function trixi_range_end(backend::Any, id) | ||
| return nothing | ||
| end | ||
|
|
||
| """ | ||
| @trixi_timeit_ext backend timer() "some label" expression | ||
|
|
||
|
|
@@ -93,10 +122,17 @@ See also [`@trixi_timeit`](@ref). | |
| """ | ||
| macro trixi_timeit_ext(backend, timer_output, label, expr) | ||
| expr = quote | ||
| local active = $trixi_range_active($(esc(backend))) | ||
| if active | ||
| id = $trixi_range_start($(esc(backend)), $(esc(label))) | ||
| end | ||
| local val = $(esc(expr)) | ||
| if $(esc(backend)) !== nothing && $(TrixiBase).timeit_debug_enabled() | ||
| $(KernelAbstractions.synchronize)($(esc(backend))) | ||
| end | ||
| if active | ||
| $trixi_range_end($(esc(backend)), id) | ||
| end | ||
| val | ||
| end | ||
| return :(@trixi_timeit($(esc(timer_output)), $(esc(label)), $(expr))) | ||
|
|
||
Always 🙃
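The dispatch pattern behind the `trixi_range_*` functions and `@trixi_timeit_ext` can be sketched in plain Julia without any profiler installed. This is an illustration under stated assumptions, not the code in this PR: `MockBackend`, `EVENTS`, `with_range`, and the unprefixed `range_*` names are hypothetical stand-ins, and the timer and `KernelAbstractions.synchronize` steps of the macro are left out.

```julia
# Hypothetical stand-in for a backend type; Trixi's real backends are not used here.
struct MockBackend end

# Records what the mock "profiler" was told, so the behavior can be inspected.
const EVENTS = String[]

# Generic fallbacks: no profiler attached, so every call is a no-op.
range_active(::Any) = false
range_start(::Any, label) = nothing
range_end(::Any, id) = nothing

# Backend-specific methods, playing the role of a package extension.
range_active(::MockBackend) = true
function range_start(::MockBackend, label)
    push!(EVENTS, "start: " * label)
    return length(EVENTS) # handle that identifies the open range
end
function range_end(::MockBackend, id)
    push!(EVENTS, "end: " * string(id))
    return nothing
end

# Function version of the control flow in the macro (timer and synchronization omitted).
function with_range(f, backend, label)
    active = range_active(backend)
    id = active ? range_start(backend, label) : nothing
    val = f()
    active && range_end(backend, id)
    return val
end

traced = with_range(() -> 1 + 1, MockBackend(), "rhs!")
untraced = with_range(() -> 1 + 1, nothing, "rhs!")
```

With the fallbacks in place, a call with an unsupported backend (here `nothing`) costs one `false` check and records nothing, while the mock backend records one start and one end event around the wrapped expression.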