Take into account thread-local mode in checks for valid capture#2177
Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors the StreamCaptureOngoing function to properly handle thread-local capture mode when validating stream captures, addressing hip-issue-3876. The changes aim to respect capture mode boundaries (Global, ThreadLocal, Relaxed) when determining if operations on a stream should be allowed or if captures should be invalidated.
- Restructured the capture validation logic to check both stream capture status and current thread's capture mode
- Removed a
getStreamPerThreadcall fromhipStreamBeginCapture_commonthat may have been redundant - Added clearer code structure with early returns and a helper lambda for stream invalidation
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| projects/clr/hipamd/src/hip_stream.cpp | Refactored StreamCaptureOngoing to properly handle thread-local vs global capture modes, with improved logic flow and clearer handling of different capture status/mode combinations |
| projects/clr/hipamd/src/hip_graph.cpp | Removed getStreamPerThread call from hipStreamBeginCapture_common, potentially for consistency with the spt/non-spt API pattern |
Comments suppressed due to low confidence (1)
projects/clr/hipamd/src/hip_graph.cpp:1116
- Potential inconsistency: The
getStreamPerThread(stream)call was removed, but other similar_commonfunctions likehipStreamIsCapturing_common(line 1068) still have this call.
Without this call, if hipStreamBeginCapture (non-spt variant) is called with hipStreamPerThread as the stream parameter, the stream won't be converted to the actual per-thread stream. It will pass the nullptr/legacy check (line 1110) and then be incorrectly cast to hip::Stream* at line 1116.
Consider either:
- Adding back
getStreamPerThread(stream)at the beginning of this function for consistency and safety - Adding an explicit check for
hipStreamPerThreadand returning an appropriate error - Documenting why this function should never receive
hipStreamPerThreadand ensuring all call sites respect this
hipError_t hipStreamBeginCapture_common(hipStream_t stream, hipStreamCaptureMode mode,
hipGraph_t graph = nullptr) {
// capture cannot be initiated on legacy stream
if (stream == nullptr || stream == hipStreamLegacy) {
return hipErrorStreamCaptureUnsupported;
}
if (mode < hipStreamCaptureModeGlobal || mode > hipStreamCaptureModeRelaxed) {
return hipErrorInvalidValue;
}
hip::Stream* s = reinterpret_cast<hip::Stream*>(stream);
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
95d20a3 to
454f018
Compare
087dafe to
9a8fb67
Compare
1a1bf74 to
071d075
Compare
d945129 to
fe7582d
Compare
fe7582d to
356fc59
Compare
…s for valid capture
356fc59 to
d74e9c5
Compare
in checks for valid capture (#2177) [rocm-systems] ROCm/rocm-systems#2177 (commit 59aa56a)
…nal-merge-new-729 (#459) * SWDEV-549518 - Enable logging dynamically through HIP APIS. (#1079) * SWDEV-549518 - Enable logging dynamically through HIP APIS. * SWDEV-549518 - Adding ROCProfiler related new API changes. * rocprofiler-sdk changes for hip api additions. --------- Co-authored-by: Venkateshwar Reddy Kandula <venkateshwar.kandula1306@gmail.com> Co-authored-by: jainprad <92369414+jainprad@users.noreply.github.com> * [rocprof-compute] Pin ruff version for consistent formatting (#2680) * pin ruff versions each to current latest * Update rocprofiler-compute-formatting.yml * Downgrade .pre-commit-config.yaml to match develop * Fix Python Formatting (#2679) Updated version of black to 26.1.0 updated some formatting rules Signed-off-by: David Galiffi <David.Galiffi@amd.com> * Add automatic PyTorch library discovery for Python applications (#2623) * Add automatic PyTorch library discovery for Python applications (#2623) * Add notice for the newly deprecated env variables (#2690) * [rocprofiler-compute] Use TheRock nightly builds in testing container (#2661) * Use TheRock nightly builds in testing container * Add HIP_DEVICE_LIB_PATH env var for hipcc to work * Add HIP_PLATFORM env var for cmake hip package * Add tarball placeholder * Add -f to curl command to fail on HTTP error * [rocprofiler-compute] Fix kernel/dispatch filtering (#2479) * Fix kernel/dispatch fitlering in GUI * Disallow --kernel and --dispatch filtering in analyze --gui mode since GUI frontend offers dropdown menu for kernel and dispatch filtering * Update CHANGELOG and documentation * Gracefully handle N/A values * Ensure workload path is valid before using it in GUI * Ignore kernel filters if dispatch filters provided * Add documentation for dispatch filtering overriding kernel filtering * Fix typo * Fix documentation * remove unnecessary whitespace * Address review comments * Allow kernel/dispatch filtering with --gui * Address review comments * Address review comments * Update CHANGELOG * Fix formatting * [rocprofiler-systems] Add build option for "examples" to specify gfx-arch (#2626) ## Motivation - Added `check_rocminfo` function that returns true if the provided regex was found, false otherwise. Can also use `GET_OUTPUT` to get the raw output filtered with or without a regex. - Moved `rocprofiler_systems_get_gfx_archs()` to `MacroUtilities.cmake` - Added `rocprofiler_systems_lookup_gfx()`, which detects whether a given `gfx` is from the `instinct`, `radeon` or `apu` family. - Added `ROCPROFSYS_GFX_TARGETS` as a build argument. Used to specify the offloading architectures that GPU examples should compile for. If empty, defaults to whatever your system has. - GPU examples now check if the given `gfx` targets (from `ROCPROFSYS_GFX_TARGETS`) are supported. - OMPVV offload tests now only compile if `amdflang` version is `>= 20` - Improve link time by reducing the number of GFX targets that binaries need to support. - RCCL is now passed a `GPU_TARGETS` var specifying the architectures to build/link against. * Reset HIP_VERSION_PATCH to 0 (#2590) * [SWDEV-559965] Update Changelog for power cap type (#2647) * [SWDEV-559965] Update Changelog for amd-smi set --power-cap Updated Changelog to mention flexible argument ordering for power cap type in amdsmi power cap set. Corrected Changelog documentation on PPT1 reset power_cap command. Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com> Signed-off-by: Maisam Arif <Maisam.Arif@amd.com> Co-authored-by: Maisam Arif <Maisam.Arif@amd.com> * [rocprofiler-system]: Enable UCX Communication API tracing (#2306) ## Motivation Enable UCX communication tracing and communication metadata ## Technical Details Implement UCX API wrappers to trace transport-layer communication. This adds communication data tracking and exposes “UCX Comm Send/Recv” timelines, enabling detailed analysis of MPI, OpenSHMEM, and other UCX-based runtime communication patterns. - Implements function interception for UCX functions across multiple categories using gotcha component. - Extended comm_data component to track UCX send/recv operations - Added ucx_send and ucx_recv labels for Perfetto counter tracks. Integrated UCX data tracking with existing MPI/RCCL tracking infrastructure. - Added ROCPROFSYS_USE_UCX configuration option (enabled by default). - Created FindUCX.cmake module for UCX header detection. Falls back to internal UCX headers if system headers not found. - Updated all Dockerfiles to include UCX dependencies. * hip-issue-3876 : Take into account thread-local capture mode in checks for valid capture (#2177) * Revert "rocr: Switch back to legacy IPC (#1744)" (#2676) This reverts commit 7e4b622. * SWDEV-558849 - Add support for static linking with ROCR (#2659) * [rocprofiler-systems] Fix MPI recv_data calculation (#2694) Fix incorrect `mpi_recv` calculation. It was using `_send_size` instead of `_recv_size` for `mpi_recv`. * [ROCmInfo] docs: mono-repo changes and style edits (#2584) * initial edits * mono repo related updates * standardize component name * style edits * more edits * Update CODEOWNERS (#2705): add /project/amdsmi owner * Use size_t datatype for global dimensions. (#2604) * rocrtst: set HSA_ENABLE_INTERRUPT after TestExample creation (#2687) Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Co-authored-by: cfreeamd <166262151+cfreeamd@users.noreply.github.com> * Rework clock based unit tests (#2646) * Update hip_api_trace.cpp for HIP_RUNTIME_API_TABLE_STEP_VERSION 22 * Update hip_api_trace.hpp for HIP_RUNTIME_API_TABLE_STEP_VERSION 22 --------- Signed-off-by: David Galiffi <David.Galiffi@amd.com> Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com> Signed-off-by: Maisam Arif <Maisam.Arif@amd.com> Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Co-authored-by: Karthik Jayaprakash <54370791+kjayapra-amd@users.noreply.github.com> Co-authored-by: Venkateshwar Reddy Kandula <venkateshwar.kandula1306@gmail.com> Co-authored-by: jainprad <92369414+jainprad@users.noreply.github.com> Co-authored-by: jamessiddeley-amd <James.Siddeley@amd.com> Co-authored-by: David Galiffi <David.Galiffi@amd.com> Co-authored-by: Milan Radosavljevic <milan.radosavljevic@amd.com> Co-authored-by: marantic-amd <marantic@amd.com> Co-authored-by: vedithal-amd <Vignesh.Edithal@amd.com> Co-authored-by: Kian Cossettini <Kian.Cossettini@amd.com> Co-authored-by: Rakesh Roy <137397847+rakesroy@users.noreply.github.com> Co-authored-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com> Co-authored-by: Maisam Arif <Maisam.Arif@amd.com> Co-authored-by: Sajina PK <sputhala@amd.com> Co-authored-by: Ioannis Assiouras <38722728+iassiour@users.noreply.github.com> Co-authored-by: Alysa Liu <Alysa.Liu@amd.com> Co-authored-by: German Andryeyev <56892148+gandryey@users.noreply.github.com> Co-authored-by: yugang-amd <yugang.wang@amd.com> Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com> Co-authored-by: hongkzha-amd <Hongkun.Zhang@amd.com> Co-authored-by: cfreeamd <166262151+cfreeamd@users.noreply.github.com> Co-authored-by: Jatin Chaudhary <51944368+cjatin@users.noreply.github.com> Co-authored-by: kjayapra-amd <karthik.jayaprakash@amd.com> Co-authored-by: hsivasun_amdeng <haresh.sivasuntharampillai@amd.com> Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
Motivation
This PR addresses ROCm/hip#3876
Technical Details
In the case where one thread captures a stream in global capture mode, an action like hipEventQuery on a different thread for a different stream under thread-local capture mode currently returns hipErrorStreamCaptureUnsupported.
However In thread-local capture mode sequences in other threads should be ignored.
JIRA ID
Test Plan
Test Result
Submission Checklist