-
Notifications
You must be signed in to change notification settings - Fork 250
rocrtst: Disabled NUMA async copy build #3044
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -371,6 +371,14 @@ aux_source_directory(${ROCRTST_ROOT}/suites/negative negativeSources) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| aux_source_directory(${ROCRTST_ROOT}/suites/stress stressSources) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| aux_source_directory(${ROCRTST_ROOT}/suites/test_common testCommonSources) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Optional: enable NUMA async copy test (memory_async_copy_numa.cc) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Force disabled by default to avoid building the NUMA test source. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| set(ENABLE_COPY_NUMA OFF CACHE BOOL "Build Memory_Async_Copy_NUMA test" FORCE) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| list(FILTER performanceSources EXCLUDE REGEX ".*memory_async_copy_numa\\.cc$") | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Optional: enable hwloc membind nodeset call in NUMA test | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| set(ENABLE_COPY_NUMA_MEMBIND_NODESET OFF CACHE BOOL "Enable hwloc membind nodeset in NUMA test") | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+375
to
+381
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Force disabled by default to avoid building the NUMA test source. | |
| set(ENABLE_COPY_NUMA OFF CACHE BOOL "Build Memory_Async_Copy_NUMA test" FORCE) | |
| list(FILTER performanceSources EXCLUDE REGEX ".*memory_async_copy_numa\\.cc$") | |
| # Optional: enable hwloc membind nodeset call in NUMA test | |
| set(ENABLE_COPY_NUMA_MEMBIND_NODESET OFF CACHE BOOL "Enable hwloc membind nodeset in NUMA test") | |
| option(ENABLE_COPY_NUMA "Build Memory_Async_Copy_NUMA test" OFF) | |
| if(NOT ENABLE_COPY_NUMA) | |
| # Exclude NUMA test source when NUMA copy tests are disabled. | |
| list(FILTER performanceSources EXCLUDE REGEX ".*memory_async_copy_numa\\.cc$") | |
| endif() | |
| # Optional: enable hwloc membind nodeset call in NUMA test | |
| option(ENABLE_COPY_NUMA_MEMBIND_NODESET "Enable hwloc membind nodeset in NUMA test" OFF) | |
| # Propagate NUMA options to C++ sources. | |
| if(ENABLE_COPY_NUMA) | |
| add_definitions(-DENABLE_COPY_NUMA=1) | |
| else() | |
| add_definitions(-DENABLE_COPY_NUMA=0) | |
| endif() | |
| if(ENABLE_COPY_NUMA_MEMBIND_NODESET) | |
| add_definitions(-DENABLE_COPY_NUMA_MEMBIND_NODESET=1) | |
| else() | |
| add_definitions(-DENABLE_COPY_NUMA_MEMBIND_NODESET=0) | |
| endif() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to change the rpath settings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With CMAKE_BUILD_WITH_INSTALL_RPATH ON, TheRock build system was complaining about runtime linker failures when testing from the build directory:
./build/rocrtst64: error while loading shared libraries: libnuma.so.1: cannot open shared object file: No such file or directory
- The binary was looking for libs at build/../lib/rocm_sysdeps/lib (install paths)
- But in TheRock's build tree, numa/hwloc are actually at dist/lib/rocm_sysdeps/lib
- Same for bundled thirdparty libs - not at lib/rocrtst yet, they are only copied there during install
With OFF -BUILD_RPATH is pointing to ../../dist/lib/rocm_sysdeps/lib and finds TheRock's libs
Also when they switch to INSTALL_RPATH after make install its able to finds installed libs at /opt/rocm/lib/rocrtst
- Seemed to have worked in both scenarios without needing to install first.
Let me know if you see any more concerns.. We did a lot of iterations on this PR.. And this seems to be the Goldilocks Cmakefile!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Around 2 years ago, there was a task group that spent a few weeks to research and refine the RPATH settings for all the components in ROCm. I was not part of that discussion, but there were people who were very experienced in packaging/deployment involved. The current RPATH settings were based on the recommendations from that group, so I am hesitant with making RPATH changes.
Given that these changes only apply to rocrtst, which is a executable and not a library, we are probably ok, so we can ignore get back to it if we start seeing problems from this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment (and the one below) correct, and the message()? It looks like it's optional.
Does the message() print out whether ENABLE_COPY_NUMA is on or off?
Copilot
AI
Feb 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unconditionally excludes memory_async_copy_numa.cc regardless of ENABLE_COPY_NUMA, which makes the option ineffective and can lead to link/compile mismatches if someone tries to enable the test. Recommend: only list(FILTER ...) when NOT ENABLE_COPY_NUMA (and only print the STATUS message in that case).
| # Always exclude memory_async_copy_numa.cc since ENABLE_COPY_NUMA is forced OFF | |
| message(STATUS "ENABLE_COPY_NUMA is OFF - excluding memory_async_copy_numa.cc from build") | |
| list(FILTER performanceSources EXCLUDE REGEX ".*memory_async_copy_numa\\.cc$") | |
| # Exclude memory_async_copy_numa.cc when ENABLE_COPY_NUMA is OFF | |
| if(NOT ENABLE_COPY_NUMA) | |
| message(STATUS "ENABLE_COPY_NUMA is OFF - excluding memory_async_copy_numa.cc from build") | |
| list(FILTER performanceSources EXCLUDE REGEX ".*memory_async_copy_numa\\.cc$") | |
| endif() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I concur with copilot's comment--it looks like the source file is unconditionally excluded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to exclude the .cc file. I would just have
ENABLE_COPY_NUMA_MEMBIND_NODESET and ENABLE_COPY_NUMA set to 0,
But I do not feel strongly about it. Up to you..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you are right.. its already a disabled test. I will work on these review comments in the next PR. I will also clean and eliminate some redundant code in this CMakeFile file as well.
Copilot
AI
Feb 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When NUMA is not found you only emit a warning, but the build still compiles suites/performance/memory_async_copy*.cc, which include <numa.h> and call libnuma APIs. On systems without libnuma headers this will fail at compile time. Either make NUMA a required dependency (FATAL_ERROR with install hint) or keep the previous behavior of filtering out the memory_async_copy*.cc sources when NUMA isn’t available.
Copilot
AI
Feb 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hwloc pkg-config path only succeeds if find_library(... NO_DEFAULT_PATH HINTS ${HWLOC_PC_LIBRARY_DIRS}) finds a full .so file. If HWLOC_PC_LIBRARY_DIRS is empty (common when the lib is in a default system path) or the .so symlink isn’t present, HAVE_HWLOC stays FALSE and the later fallback searches only ${ROCRTST_ROOT}/thirdparty/lib (also NO_DEFAULT_PATH), leading to a hard build failure even when system hwloc is installed. Suggest adding an INTERFACE imported target fallback using ${HWLOC_PC_LIBRARIES}/${HWLOC_PC_LIBRARY_DIRS} (like the NUMA logic) and/or dropping NO_DEFAULT_PATH so system paths are searched.
Copilot
AI
Feb 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
INSTALL_RPATH/BUILD_RPATH paths look inconsistent with what this CMakeLists installs: install(DIRECTORY .../thirdparty/lib DESTINATION lib/rocrtst) places libhwloc.so.5 under $prefix/lib/rocrtst/, but the new RPATH points to $ORIGIN/../lib/rocrtst/lib (extra /lib). This will prevent the installed binary from finding the bundled hwloc when the fallback is used. Recommend adjusting the RPATH to include $ORIGIN/../lib/rocrtst (and, if needed, $ORIGIN/../lib/rocrtst/thirdparty/lib), or aligning the install destination with the RPATH.
| INSTALL_RPATH "$ORIGIN/../lib/rocrtst/lib;$ORIGIN/../lib/rocm_sysdeps/lib" | |
| BUILD_RPATH "$ORIGIN/../../dist/lib/rocm_sysdeps/lib;$ORIGIN/../lib/rocrtst/lib" | |
| INSTALL_RPATH "$ORIGIN/../lib/rocrtst;$ORIGIN/../lib/rocm_sysdeps/lib" | |
| BUILD_RPATH "$ORIGIN/../../dist/lib/rocm_sysdeps/lib;$ORIGIN/../lib/rocrtst" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The log message in the
#elsebranch is misleading: this branch is selected whenENABLE_COPY_NUMA_MEMBIND_NODESETis disabled at compile time, not necessarily because the linked hwloc lacks nodeset membind support. Recommend rewording to reflect the build-time option (or detect support at runtime and print a message based on the detection result).