Skip to content

Adding OSS rocprof trace decoder#3723

Merged
ApoKalipse-V merged 2 commits into
mainfrom
users/gbaraldi/tracedecoder
Mar 13, 2026
Merged

Adding OSS rocprof trace decoder#3723
ApoKalipse-V merged 2 commits into
mainfrom
users/gbaraldi/tracedecoder

Conversation

@ApoKalipse-V
Copy link
Copy Markdown
Contributor

@ApoKalipse-V ApoKalipse-V commented Mar 3, 2026

Motivation

This PR switches the closed source rocprof trace decoder to the OSS version.

Technical Details

Requirements: rocm-systems commit c8f55d9

Test Plan

Test Result

Submission Checklist

Copy link
Copy Markdown
Collaborator

@stellaraccident stellaraccident left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, but I'll leave final review to Scott since he's got more of this stuff in flight/current.

Copy link
Copy Markdown
Member

@ScottTodd ScottTodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thorough replacement that builds and passes tests as expected.

I spot checked the CI run https://github.com/ROCm/TheRock/actions/runs/23013880271?pr=3723 's configure logs: https://therock-ci-artifacts.s3.amazonaws.com/23013880271-linux/logs/gfx94X-dcgpu/rocprof-trace-decoder_configure.log to see -S/__w/TheRock/TheRock/rocm-systems/projects/rocprof-trace-decoder (in rocm-systems), compared to the baseline of -S/__w/TheRock/TheRock/profiler/rocprof-trace-decoder (in the old submodule)

Comment thread CMakeLists.txt
Comment on lines -267 to -273
# Profiler Trace Decoder Binary (not an artifact, just a config option)
if(THEROCK_FLAG_INCLUDE_PROFILER)
therock_add_feature(ROCPROF_TRACE_DECODER_BINARY
GROUP PROFILER
DESCRIPTION "Enables the closed-source rocprof trace decoder binary"
)
endif()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI there are references to this -DTHEROCK_ENABLE_ROCPROF_TRACE_DECODER_BINARY option over in rocm-libraries at https://github.com/ROCm/rocm-libraries/blob/develop/projects/miopen/Dockerfile . Might want to update those the next time the TheRock hash is updated in rocm-libraries, or earlier (that dockerfile defaults to the default branch, so it will get this change immediately)

(I missed this myself, my code review agent spotted it, see the full review at https://github.com/ScottTodd/claude-rocm-workspace/blob/main/reviews/pr_TheRock_3723.md)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created an issue to track this: ROCm/rocm-libraries#5431

@ApoKalipse-V ApoKalipse-V merged commit 24a1af1 into main Mar 13, 2026
164 of 175 checks passed
@github-project-automation github-project-automation Bot moved this from TODO to Done in TheRock Triage Mar 13, 2026
@ApoKalipse-V ApoKalipse-V deleted the users/gbaraldi/tracedecoder branch March 13, 2026 14:33
@ScottTodd
Copy link
Copy Markdown
Member

Oops, looks like this broke multi-arch CI: https://github.com/ROCm/TheRock/actions/runs/23055657477/job/66969204535#step:12:189

-- Including subproject rocrtst (from /__w/TheRock/TheRock/rocm-systems/projects/rocr-runtime/rocrtst/suites/test_common)
-- Including subproject rocprof-trace-decoder (from /__w/TheRock/TheRock/rocm-systems/projects/rocprof-trace-decoder)
CMake Error at cmake/therock_subproject.cmake:1410 (message):
  Subproject rocprof-trace-decoder requires AMDGPU targets but none were
  selected.  Set THEROCK_AMDGPU_FAMILIES or THEROCK_AMDGPU_TARGETS.
Call Stack (most recent call first):
  cmake/therock_subproject.cmake:726 (_therock_cmake_subproject_setup_toolchain)
  profiler/CMakeLists.txt:27 (therock_cmake_subproject_activate)

@ApoKalipse-V
Copy link
Copy Markdown
Contributor Author

Hi, trace-decoder doesn't contain any GPU code/targets.

@ScottTodd
Copy link
Copy Markdown
Member

Still need to set it, see

# IREE compiler has general support for many GPUs/backends and doesn't
# compile device code. The build infrastructure requires generic stages to
# pass through DIST_AMDGPU_TARGETS even if the list isn't used by the
# underlying project.
USE_DIST_AMDGPU_TARGETS
and this recent discussion: #3611 (comment)

@ApoKalipse-V
Copy link
Copy Markdown
Contributor Author

Attempt at fix: #3966

ApoKalipse-V added a commit that referenced this pull request Mar 13, 2026
## Motivation

Fixes breakage by PR: #3723 
The build infra requires DIST_AMDGPU_TARGETS even if the list isn't used
by the underlying project.

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
marbre added a commit that referenced this pull request Mar 17, 2026
## Motivation

Following up on these PRs:
* #3723
* #3966

the build was still broken:
```
[199/260] Configure sub-project composable_kernel (in background)
[200/260] Building sub-project composable_kernel (in background)
FAILED: ml-libs/composable_kernel/stamp/build.stamp /__w/TheRock/TheRock/build/ml-libs/composable_kernel/stamp/build.stamp 
cd /__w/TheRock/TheRock/build/ml-libs/composable_kernel/build && /opt/python/cp312-cp312/bin/python3.12 /__w/TheRock/TheRock/build_tools/teatime.py --log-timestamps --label composable_kernel --interactive /__w/TheRock/TheRock/build/logs/composable_kernel_build.log -- /usr/local/therock-tools/bin/cmake -E env --unset=ROCM_PATH --unset=ROCM_DIR --unset=HIP_PATH --unset=HIP_DIR -- /usr/local/therock-tools/bin/cmake --build /__w/TheRock/TheRock/build/ml-libs/composable_kernel/build && /usr/local/therock-tools/bin/cmake -E touch /__w/TheRock/TheRock/build/ml-libs/composable_kernel/stamp/build.stamp
[composable_kernel] [1/1686] Building CXX object library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_int8_instance.cpp.o
[composable_kernel] FAILED: library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_int8_instance.cpp.o 
[composable_kernel] ccache /__w/TheRock/TheRock/build/core/clr/dist/lib/llvm/bin/clang++ -DCK_ENABLE_BF16 -DCK_ENABLE_BF8 -DCK_ENABLE_FP16 -DCK_ENABLE_FP32 -DCK_ENABLE_FP64 -DCK_ENABLE_FP8 -DCK_ENABLE_INT8 -DCK_TILE_USE_WMMA=1 -DCK_TIME_KERNEL=1 -DCK_USE_OCP_FP8 -DCK_USE_WMMA -DCK_USE_WMMA_FP8 -DCK_USE_XDL -DDPP_KERNELS -DUSE_PROF_API=1 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -I/__w/TheRock/TheRock/rocm-libraries/projects/composablekernel/library/include -I/__w/TheRock/TheRock/rocm-libraries/projects/composablekernel/include -I/__w/TheRock/TheRock/build/ml-libs/composable_kernel/build/include -I/__w/TheRock/TheRock/build/profiler/rocprofiler-sdk/stage/include -I/__w/TheRock/TheRock/build/profiler/rocprof-trace-decoder/stage/include -I/__w/TheRock/TheRock/build/profiler/roctracer/stage/include -I/__w/TheRock/TheRock/build/base/half/stage/include -I/__w/TheRock/TheRock/build/third-party/sysdeps/linux/libdrm/build/stage/lib/rocm_sysdeps/include -I/__w/TheRock/TheRock/build/third-party/sysdeps/linux/numactl/build/stage/lib/rocm_sysdeps/include -isystem /__w/TheRock/TheRock/build/core/clr/dist/include -Wno-documentation-unknown-command -Wno-documentation-pedantic -Wno-unused-command-line-argument -Wno-explicit-specialization-storage-class --hip-path=/__w/TheRock/TheRock/build/core/clr/dist --hip-device-lib-path=/__w/TheRock/TheRock/build/core/clr/dist/lib/llvm/amdgcn/bitcode -O3 -DNDEBUG -std=c++20 -fPIC   -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Wno-missing-field-initializers -Wno-error=deprecated-declarations -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-conversion -Wno-double-promotion -Wno-exit-time-destructors -Wno-extra-semi -Wno-float-conversion -Wno-gnu-anonymous-struct -Wno-gnu-zero-variadic-macro-arguments -Wno-missing-prototypes -Wno-nested-anon-types -Wno-padded -Wno-return-std-move-in-c++11 -Wno-shorten-64-to-32 -Wno-sign-conversion -Wno-unknown-warning-option -Wno-unused-command-line-argument -Wno-weak-vtables -Wno-covered-switch-default -Wno-unsafe-buffer-usage -Wno-unused-lambda-capture -Wno-nvcc-compat -Wno-c++20-compat -Wno-bit-int-extension -Wno-pass-failed -Wno-switch-default -Wno-unique-object-duplication -fbracket-depth=1024 -Wno-nrvo -fno-offload-uniform-block -mllvm --lsr-drop-solution=1 -mllvm -enable-post-misched=0 -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false -Werror -Weverything -fcolor-diagnostics --offload-compress -x hip --offload-arch=gfx1200 --offload-arch=gfx1201 --offload-arch=gfx1200 --offload-arch=gfx1201 -MD -MT library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_int8_instance.cpp.o -MF library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_int8_instance.cpp.o.d -o library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_int8_instance.cpp.o -c /__w/TheRock/TheRock/rocm-libraries/projects/composablekernel/library/src/tensor_operation_instance/gpu/conv1d_bwd_data/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_int8_instance.cpp
[composable_kernel] clang++: error: no such include directory: '/__w/TheRock/TheRock/build/profiler/rocprof-trace-decoder/stage/include' [-Werror,-Wmissing-include-dirs]
```

## Technical Details

`INTERFACE_INCLUDE_DIRS`/`INTERFACE_LINK_DIRS` leaked transitively
through rocprofiler-sdk to consumers in other stages (e.g.
composable_kernel, MIOpen), causing `-Wmissing-include-dirs` errors

The trace decoder appears to be a runtime-only dependency (loaded by
`rocprofv3 --att`), so it needs no compile-time interface dirs. This
decouples it from the rocprofiler-sdk subproject dependency chain and
instead lists it directly in the rocprofiler-sdk artifact
`SUBPROJECT_DEPS` for packaging.

## Test Plan

* Multi-arch
* Passed multi-arch CI builds:
https://github.com/ROCm/TheRock/actions/runs/23157787783
* This might later fail some tests (including after ROCm itself, in
pytorch), I haven't fully tested that
* Non-multi-arch
  * CI on this PR itself (build + test)

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Marius Brehler <marius.brehler@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request platform: Linux

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants