Adding OSS rocprof trace decoder#3723
Conversation
cedd462 to
31e1705
Compare
31e1705 to
0168370
Compare
stellaraccident
left a comment
There was a problem hiding this comment.
This LGTM, but I'll leave final review to Scott since he's got more of this stuff in flight/current.
ScottTodd
left a comment
There was a problem hiding this comment.
Nice, thorough replacement that builds and passes tests as expected.
I spot checked the CI run https://github.com/ROCm/TheRock/actions/runs/23013880271?pr=3723 's configure logs: https://therock-ci-artifacts.s3.amazonaws.com/23013880271-linux/logs/gfx94X-dcgpu/rocprof-trace-decoder_configure.log to see -S/__w/TheRock/TheRock/rocm-systems/projects/rocprof-trace-decoder (in rocm-systems), compared to the baseline of -S/__w/TheRock/TheRock/profiler/rocprof-trace-decoder (in the old submodule)
| # Profiler Trace Decoder Binary (not an artifact, just a config option) | ||
| if(THEROCK_FLAG_INCLUDE_PROFILER) | ||
| therock_add_feature(ROCPROF_TRACE_DECODER_BINARY | ||
| GROUP PROFILER | ||
| DESCRIPTION "Enables the closed-source rocprof trace decoder binary" | ||
| ) | ||
| endif() |
There was a problem hiding this comment.
FYI there are references to this -DTHEROCK_ENABLE_ROCPROF_TRACE_DECODER_BINARY option over in rocm-libraries at https://github.com/ROCm/rocm-libraries/blob/develop/projects/miopen/Dockerfile . Might want to update those the next time the TheRock hash is updated in rocm-libraries, or earlier (that dockerfile defaults to the default branch, so it will get this change immediately)
(I missed this myself, my code review agent spotted it, see the full review at https://github.com/ScottTodd/claude-rocm-workspace/blob/main/reviews/pr_TheRock_3723.md)
There was a problem hiding this comment.
Created an issue to track this: ROCm/rocm-libraries#5431
|
Oops, looks like this broke multi-arch CI: https://github.com/ROCm/TheRock/actions/runs/23055657477/job/66969204535#step:12:189 |
|
Hi, trace-decoder doesn't contain any GPU code/targets. |
|
Still need to set it, see TheRock/iree-libs/CMakeLists.txt Lines 9 to 13 in e2c94d4 |
|
Attempt at fix: #3966 |
## Motivation Fixes breakage by PR: #3723 The build infra requires DIST_AMDGPU_TARGETS even if the list isn't used by the underlying project. ## Technical Details <!-- Explain the changes along with any relevant GitHub links. --> ## Test Plan <!-- Explain any relevant testing done to verify this PR. --> ## Test Result <!-- Briefly summarize test outcomes. --> ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
## Motivation Following up on these PRs: * #3723 * #3966 the build was still broken: ``` [199/260] Configure sub-project composable_kernel (in background) [200/260] Building sub-project composable_kernel (in background) FAILED: ml-libs/composable_kernel/stamp/build.stamp /__w/TheRock/TheRock/build/ml-libs/composable_kernel/stamp/build.stamp cd /__w/TheRock/TheRock/build/ml-libs/composable_kernel/build && /opt/python/cp312-cp312/bin/python3.12 /__w/TheRock/TheRock/build_tools/teatime.py --log-timestamps --label composable_kernel --interactive /__w/TheRock/TheRock/build/logs/composable_kernel_build.log -- /usr/local/therock-tools/bin/cmake -E env --unset=ROCM_PATH --unset=ROCM_DIR --unset=HIP_PATH --unset=HIP_DIR -- /usr/local/therock-tools/bin/cmake --build /__w/TheRock/TheRock/build/ml-libs/composable_kernel/build && /usr/local/therock-tools/bin/cmake -E touch /__w/TheRock/TheRock/build/ml-libs/composable_kernel/stamp/build.stamp [composable_kernel] [1/1686] Building CXX object library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_int8_instance.cpp.o [composable_kernel] FAILED: library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_int8_instance.cpp.o [composable_kernel] ccache /__w/TheRock/TheRock/build/core/clr/dist/lib/llvm/bin/clang++ -DCK_ENABLE_BF16 -DCK_ENABLE_BF8 -DCK_ENABLE_FP16 -DCK_ENABLE_FP32 -DCK_ENABLE_FP64 -DCK_ENABLE_FP8 -DCK_ENABLE_INT8 -DCK_TILE_USE_WMMA=1 -DCK_TIME_KERNEL=1 -DCK_USE_OCP_FP8 -DCK_USE_WMMA -DCK_USE_WMMA_FP8 -DCK_USE_XDL -DDPP_KERNELS -DUSE_PROF_API=1 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -I/__w/TheRock/TheRock/rocm-libraries/projects/composablekernel/library/include -I/__w/TheRock/TheRock/rocm-libraries/projects/composablekernel/include -I/__w/TheRock/TheRock/build/ml-libs/composable_kernel/build/include -I/__w/TheRock/TheRock/build/profiler/rocprofiler-sdk/stage/include -I/__w/TheRock/TheRock/build/profiler/rocprof-trace-decoder/stage/include -I/__w/TheRock/TheRock/build/profiler/roctracer/stage/include -I/__w/TheRock/TheRock/build/base/half/stage/include -I/__w/TheRock/TheRock/build/third-party/sysdeps/linux/libdrm/build/stage/lib/rocm_sysdeps/include -I/__w/TheRock/TheRock/build/third-party/sysdeps/linux/numactl/build/stage/lib/rocm_sysdeps/include -isystem /__w/TheRock/TheRock/build/core/clr/dist/include -Wno-documentation-unknown-command -Wno-documentation-pedantic -Wno-unused-command-line-argument -Wno-explicit-specialization-storage-class --hip-path=/__w/TheRock/TheRock/build/core/clr/dist --hip-device-lib-path=/__w/TheRock/TheRock/build/core/clr/dist/lib/llvm/amdgcn/bitcode -O3 -DNDEBUG -std=c++20 -fPIC -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Wno-missing-field-initializers -Wno-error=deprecated-declarations -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-conversion -Wno-double-promotion -Wno-exit-time-destructors -Wno-extra-semi -Wno-float-conversion -Wno-gnu-anonymous-struct -Wno-gnu-zero-variadic-macro-arguments -Wno-missing-prototypes -Wno-nested-anon-types -Wno-padded -Wno-return-std-move-in-c++11 -Wno-shorten-64-to-32 -Wno-sign-conversion -Wno-unknown-warning-option -Wno-unused-command-line-argument -Wno-weak-vtables -Wno-covered-switch-default -Wno-unsafe-buffer-usage -Wno-unused-lambda-capture -Wno-nvcc-compat -Wno-c++20-compat -Wno-bit-int-extension -Wno-pass-failed -Wno-switch-default -Wno-unique-object-duplication -fbracket-depth=1024 -Wno-nrvo -fno-offload-uniform-block -mllvm --lsr-drop-solution=1 -mllvm -enable-post-misched=0 -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false -Werror -Weverything -fcolor-diagnostics --offload-compress -x hip --offload-arch=gfx1200 --offload-arch=gfx1201 --offload-arch=gfx1200 --offload-arch=gfx1201 -MD -MT library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_int8_instance.cpp.o -MF library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_int8_instance.cpp.o.d -o library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_int8_instance.cpp.o -c /__w/TheRock/TheRock/rocm-libraries/projects/composablekernel/library/src/tensor_operation_instance/gpu/conv1d_bwd_data/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_int8_instance.cpp [composable_kernel] clang++: error: no such include directory: '/__w/TheRock/TheRock/build/profiler/rocprof-trace-decoder/stage/include' [-Werror,-Wmissing-include-dirs] ``` ## Technical Details `INTERFACE_INCLUDE_DIRS`/`INTERFACE_LINK_DIRS` leaked transitively through rocprofiler-sdk to consumers in other stages (e.g. composable_kernel, MIOpen), causing `-Wmissing-include-dirs` errors The trace decoder appears to be a runtime-only dependency (loaded by `rocprofv3 --att`), so it needs no compile-time interface dirs. This decouples it from the rocprofiler-sdk subproject dependency chain and instead lists it directly in the rocprofiler-sdk artifact `SUBPROJECT_DEPS` for packaging. ## Test Plan * Multi-arch * Passed multi-arch CI builds: https://github.com/ROCm/TheRock/actions/runs/23157787783 * This might later fail some tests (including after ROCm itself, in pytorch), I haven't fully tested that * Non-multi-arch * CI on this PR itself (build + test) ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. --------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Marius Brehler <marius.brehler@amd.com>
Motivation
This PR switches the closed source rocprof trace decoder to the OSS version.
Technical Details
Requirements: rocm-systems commit c8f55d9
Test Plan
Test Result
Submission Checklist