Fix CI break for rocprof trace decoder#3966
Conversation
|
Did you trigger https://github.com/ROCm/TheRock/actions/workflows/multi_arch_ci.yml to test this? |
ScottTodd
left a comment
There was a problem hiding this comment.
Test run at https://github.com/ROCm/TheRock/actions/runs/23063324268/job/66996069227 got past the previously failing stage (https://github.com/ROCm/TheRock/actions/runs/23055657477/job/66969204535#step:12:189). Looks fine to merge this fix-forward (given that multi-arch CI is not yet running on PRs, if it was then I'd prefer a revert instead of a fix-forward)
|
Ah, this didn't fully fix the build: https://github.com/ROCm/TheRock/actions/runs/23067115978/job/67013062466#step:13:215 We may need to revert this and #3723 |
|
I'm working with @marbre to test some potential fix-forwards. If those don't fully resolve this we may revert instead. |
## Motivation Following up on these PRs: * #3723 * #3966 the build was still broken: ``` [199/260] Configure sub-project composable_kernel (in background) [200/260] Building sub-project composable_kernel (in background) FAILED: ml-libs/composable_kernel/stamp/build.stamp /__w/TheRock/TheRock/build/ml-libs/composable_kernel/stamp/build.stamp cd /__w/TheRock/TheRock/build/ml-libs/composable_kernel/build && /opt/python/cp312-cp312/bin/python3.12 /__w/TheRock/TheRock/build_tools/teatime.py --log-timestamps --label composable_kernel --interactive /__w/TheRock/TheRock/build/logs/composable_kernel_build.log -- /usr/local/therock-tools/bin/cmake -E env --unset=ROCM_PATH --unset=ROCM_DIR --unset=HIP_PATH --unset=HIP_DIR -- /usr/local/therock-tools/bin/cmake --build /__w/TheRock/TheRock/build/ml-libs/composable_kernel/build && /usr/local/therock-tools/bin/cmake -E touch /__w/TheRock/TheRock/build/ml-libs/composable_kernel/stamp/build.stamp [composable_kernel] [1/1686] Building CXX object library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_int8_instance.cpp.o [composable_kernel] FAILED: library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_int8_instance.cpp.o [composable_kernel] ccache /__w/TheRock/TheRock/build/core/clr/dist/lib/llvm/bin/clang++ -DCK_ENABLE_BF16 -DCK_ENABLE_BF8 -DCK_ENABLE_FP16 -DCK_ENABLE_FP32 -DCK_ENABLE_FP64 -DCK_ENABLE_FP8 -DCK_ENABLE_INT8 -DCK_TILE_USE_WMMA=1 -DCK_TIME_KERNEL=1 -DCK_USE_OCP_FP8 -DCK_USE_WMMA -DCK_USE_WMMA_FP8 -DCK_USE_XDL -DDPP_KERNELS -DUSE_PROF_API=1 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -I/__w/TheRock/TheRock/rocm-libraries/projects/composablekernel/library/include -I/__w/TheRock/TheRock/rocm-libraries/projects/composablekernel/include -I/__w/TheRock/TheRock/build/ml-libs/composable_kernel/build/include -I/__w/TheRock/TheRock/build/profiler/rocprofiler-sdk/stage/include -I/__w/TheRock/TheRock/build/profiler/rocprof-trace-decoder/stage/include -I/__w/TheRock/TheRock/build/profiler/roctracer/stage/include -I/__w/TheRock/TheRock/build/base/half/stage/include -I/__w/TheRock/TheRock/build/third-party/sysdeps/linux/libdrm/build/stage/lib/rocm_sysdeps/include -I/__w/TheRock/TheRock/build/third-party/sysdeps/linux/numactl/build/stage/lib/rocm_sysdeps/include -isystem /__w/TheRock/TheRock/build/core/clr/dist/include -Wno-documentation-unknown-command -Wno-documentation-pedantic -Wno-unused-command-line-argument -Wno-explicit-specialization-storage-class --hip-path=/__w/TheRock/TheRock/build/core/clr/dist --hip-device-lib-path=/__w/TheRock/TheRock/build/core/clr/dist/lib/llvm/amdgcn/bitcode -O3 -DNDEBUG -std=c++20 -fPIC -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Wno-missing-field-initializers -Wno-error=deprecated-declarations -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-conversion -Wno-double-promotion -Wno-exit-time-destructors -Wno-extra-semi -Wno-float-conversion -Wno-gnu-anonymous-struct -Wno-gnu-zero-variadic-macro-arguments -Wno-missing-prototypes -Wno-nested-anon-types -Wno-padded -Wno-return-std-move-in-c++11 -Wno-shorten-64-to-32 -Wno-sign-conversion -Wno-unknown-warning-option -Wno-unused-command-line-argument -Wno-weak-vtables -Wno-covered-switch-default -Wno-unsafe-buffer-usage -Wno-unused-lambda-capture -Wno-nvcc-compat -Wno-c++20-compat -Wno-bit-int-extension -Wno-pass-failed -Wno-switch-default -Wno-unique-object-duplication -fbracket-depth=1024 -Wno-nrvo -fno-offload-uniform-block -mllvm --lsr-drop-solution=1 -mllvm -enable-post-misched=0 -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false -Werror -Weverything -fcolor-diagnostics --offload-compress -x hip --offload-arch=gfx1200 --offload-arch=gfx1201 --offload-arch=gfx1200 --offload-arch=gfx1201 -MD -MT library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_int8_instance.cpp.o -MF library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_int8_instance.cpp.o.d -o library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_int8_instance.cpp.o -c /__w/TheRock/TheRock/rocm-libraries/projects/composablekernel/library/src/tensor_operation_instance/gpu/conv1d_bwd_data/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_int8_instance.cpp [composable_kernel] clang++: error: no such include directory: '/__w/TheRock/TheRock/build/profiler/rocprof-trace-decoder/stage/include' [-Werror,-Wmissing-include-dirs] ``` ## Technical Details `INTERFACE_INCLUDE_DIRS`/`INTERFACE_LINK_DIRS` leaked transitively through rocprofiler-sdk to consumers in other stages (e.g. composable_kernel, MIOpen), causing `-Wmissing-include-dirs` errors The trace decoder appears to be a runtime-only dependency (loaded by `rocprofv3 --att`), so it needs no compile-time interface dirs. This decouples it from the rocprofiler-sdk subproject dependency chain and instead lists it directly in the rocprofiler-sdk artifact `SUBPROJECT_DEPS` for packaging. ## Test Plan * Multi-arch * Passed multi-arch CI builds: https://github.com/ROCm/TheRock/actions/runs/23157787783 * This might later fail some tests (including after ROCm itself, in pytorch), I haven't fully tested that * Non-multi-arch * CI on this PR itself (build + test) ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. --------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Marius Brehler <marius.brehler@amd.com>
Motivation
Fixes breakage by PR: #3723
The build infra requires DIST_AMDGPU_TARGETS even if the list isn't used by the underlying project.
Technical Details
Test Plan
Test Result
Submission Checklist