
Integrate AOCL-BLAS as CPU reference BLAS for TheRock CI (blocked on AOCL team) #3314

Closed
tony-davis wants to merge 33 commits into main from users/todavis/aocl-host-blas

tony-davis (Contributor) commented Feb 9, 2026

Motivation

Related: AIROCBLAS-44

Integrate AOCL-BLAS (the BLAS component of AOCL, the AMD Optimizing CPU Libraries) into TheRock as the preferred CPU reference BLAS for rocBLAS testing. AOCL-BLAS is tuned for AMD CPUs and aligns better with AMD's ecosystem over the long term than OpenBLAS does.

This PR builds AOCL 5.2 from source and provides it via CMake package config for consumption by rocBLAS clients.

Status: Ready to merge once AOCL team adopts our CMake modernization fixes (AIROCBLAS-43). Works with local AOCL forks that include the fixes.

Technical Details

1. AOCL-BLAS Build Integration

third-party/aocl/CMakeLists.txt (new):

  • Builds AOCL 5.2 BLAS component from source
  • Configures ILP64 support (64-bit integers for large matrices)
  • Builds static library (libaocl.a / aocl.lib)
  • Provides CMake package config via therock_cmake_subproject_provide_package()
  • Exports AOCL::AOCL target with library and include paths

third-party/aocl/artifact-aocl.toml (new):

  • Artifact configuration for TheRock build system
  • Fetches AOCL 5.2 source from AMD GitLab

2. Build Topology Changes

BUILD_TOPOLOGY.toml:

  • Added host-aocl-blas artifact (target-neutral, feature_group: HOST_MATH)
  • Added dependency: blas artifact depends on host-aocl-blas
  • AOCL-BLAS is built when THEROCK_ENABLE_HOST_MATH=ON (auto-enabled with THEROCK_BUILD_TESTING=ON)

CMakeLists.txt:

  • Auto-enable THEROCK_ENABLE_HOST_MATH when THEROCK_BUILD_TESTING=ON
  • Ensures CPU reference BLAS libraries are built for testing

3. rocBLAS Integration

math-libs/BLAS/CMakeLists.txt:

  • Added therock-aocl-blas as optional runtime dependency for rocBLAS (when testing enabled)
  • Set LINK_BLIS=ON when testing enabled (links AOCL-BLAS static library)
  • Pass -DBUILD_DIR to help rocBLAS clients locate dependencies

4. Test Infrastructure

build_tools/github_actions/test_executable_scripts/test_rocblas.py:

  • Set OMP_NUM_THREADS=48 to prevent AOCL-BLAS oversubscription (60-100x slowdown)
  • Pass environment variables to test subprocess
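A minimal sketch of the thread cap described above, assuming the test script launches the rocBLAS test binary through `subprocess`. The helper names `build_test_env` and `run_test` are hypothetical; only the value `OMP_NUM_THREADS=48` comes from this PR.

```python
import os
import subprocess

AOCL_OMP_THREADS = 48  # cap from this PR; avoids oversubscribing CI containers


def build_test_env(base_env=None, omp_threads=AOCL_OMP_THREADS):
    """Return a copy of the environment with OMP_NUM_THREADS capped."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(omp_threads)
    return env


def run_test(cmd, base_env=None):
    """Run a test command with the capped environment (hypothetical wrapper)."""
    return subprocess.run(cmd, env=build_test_env(base_env), check=True)
```

Copying the environment rather than mutating `os.environ` keeps the cap scoped to the test subprocess.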

build_tools/validate_static_library.py (new):

  • Validation script for static libraries (uses ar t to verify archive)
  • Prevents packaging empty/corrupt archives
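The validation idea can be sketched as follows. This is not the contents of validate_static_library.py; the function names are hypothetical, and the only details taken from this PR are the use of `ar t` and the rejection of empty archives.

```python
import subprocess
from pathlib import Path


def count_members(ar_listing: str) -> int:
    """Count non-empty member names in `ar t` output."""
    return sum(1 for line in ar_listing.splitlines() if line.strip())


def validate_static_library(path: Path) -> int:
    """Return the member count; raise ValueError for empty or unreadable archives."""
    if not path.is_file() or path.stat().st_size == 0:
        raise ValueError(f"{path}: missing or zero-byte archive")
    # `ar t` lists one archive member per line.
    result = subprocess.run(["ar", "t", str(path)], capture_output=True, text=True)
    if result.returncode != 0:
        raise ValueError(f"{path}: ar failed: {result.stderr.strip()}")
    members = count_members(result.stdout)
    if members == 0:
        raise ValueError(f"{path}: archive contains no members")
    return members
```

Filtering blank lines before counting avoids reporting one member for an archive whose listing is empty.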

cmake/therock_testing.cmake:

  • Added therock_test_validate_static_lib() function
  • Validates static libraries during CTest

5. Windows Build Support

cmake/therock_subproject.cmake:

  • Preserve critical MSVC and Windows SDK environment variables in nested CMake builds
  • Required for AOCL's CMake to find compiler and Windows SDK
  • Handles variables: WindowsSdkDir, VCToolsInstallDir, PATH, INCLUDE, LIB, etc.
  • Escapes semicolons and strips trailing backslashes to prevent quote-escaping bugs
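The escaping rules in the last bullet can be illustrated outside CMake. The following is only a Python sketch of the string transformation, assuming a semicolon is escaped as `\;` (CMake treats an unescaped `;` as a list separator) and that a trailing backslash would otherwise escape the closing quote of a quoted argument. The function name is hypothetical; the actual logic lives in cmake/therock_subproject.cmake.

```python
def sanitize_env_value(value: str) -> str:
    """Make a Windows environment value safe to embed in a quoted CMake string."""
    # A trailing backslash would escape the closing quote of "...".
    value = value.rstrip("\\")
    # Unescaped semicolons would split the value into a CMake list.
    return value.replace(";", "\\;")
```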

6. Third-Party Integration

third-party/CMakeLists.txt:

  • Add AOCL subdirectory when THEROCK_ENABLE_HOST_AOCL_BLAS=ON
  • Auto-enabled by THEROCK_ENABLE_HOST_MATH

Dependencies

Blocked on:

  • AIROCBLAS-43: AOCL team must merge CMake modernization fixes to upstream repo
  • rocm-libraries#4439: rocBLAS clients need AOCL CMake package detection (ready, same blocker)

Works with:

  • Local AOCL forks with CMake fixes applied
  • Ready to switch to upstream AOCL once team adopts changes

Test Plan

Tested with local AOCL builds (CMake fixes applied in fork):

  • AOCL-BLAS builds successfully on Linux
  • rocBLAS clients find AOCL via CMake package config
  • Static library validation passes
  • TheRock CI test infrastructure ready

Windows build support present but back-burnered (Linux priority).

Test Result

Local validation complete:

  • AOCL static library builds correctly (libaocl.a contains BLAS/CBLAS symbols)
  • CMake package config exports AOCL::AOCL target with correct paths
  • rocBLAS clients link against AOCL successfully
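One way to reproduce the symbol check in the first bullet is to parse `nm -g` output. This is a sketch under assumptions: the symbol names `cblas_dgemm` and `dgemm_` are conventional CBLAS/BLAS entry points chosen for illustration, not names confirmed by this PR, and the helper names are hypothetical.

```python
import subprocess

# Assumed symbol names: a CBLAS entry point and its Fortran-style BLAS counterpart.
EXPECTED_SYMBOLS = ("cblas_dgemm", "dgemm_")


def defined_symbols(nm_output: str) -> set:
    """Extract defined global text symbols ('T') from `nm -g` output."""
    symbols = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Defined symbols look like: "<address> T <name>"
        if len(parts) == 3 and parts[1] == "T":
            symbols.add(parts[2])
    return symbols


def missing_symbols(archive: str, expected=EXPECTED_SYMBOLS) -> list:
    """Return the expected symbols that `nm` does not report as defined."""
    out = subprocess.run(
        ["nm", "-g", archive], capture_output=True, text=True, check=True
    ).stdout
    found = defined_symbols(out)
    return [name for name in expected if name not in found]
```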

CI testing blocked until AOCL team merges upstream changes.

Submission Checklist

tony-davis and others added 26 commits January 22, 2026 18:02
…ting

Adds AOCL-BLAS 5.2 (BLIS/BLAS/CBLAS) as an alternative to OpenBLAS for
rocBLAS client testing. AOCL-BLAS provides ILP64 support needed for
large-scale stress tests and serves as a complementary CPU BLAS provider.

- Add host-aocl-blas artifact (Linux-only, enabled with THEROCK_BUILD_TESTING)
- Build AOCL 5.2 BLAS component from source with ILP64 and multithreading
- Install to lib/host-math/ alongside OpenBLAS
- Enable LINK_BLIS in rocBLAS when testing is enabled
- Add therock_test_validate_static_lib() for static library validation

AOCL's build system ignores CMAKE_INSTALL_INCLUDEDIR for headers,
causing them to install to dist/include/ instead of the expected
dist/lib/host-math/include/ location. This commit adds a custom
CMake command to copy headers to the correct location after staging.

Also fixes dependency ordering by moving therock-aocl-blas from
RUNTIME_DEPS to BUILD_DEPS for rocBLAS, since it's a static library
that must be built before rocBLAS links against it.

Changes:
- Add custom command to copy AOCL headers after staging
- Set CMAKE_INSTALL_LIBDIR/BINDIR/INCLUDEDIR for AOCL build
- Move therock-aocl-blas to BUILD_DEPS for proper build ordering
- Update library validation path to match new location

During rocBLAS builds, pip installs Tensile from the source directory,
creating egg-info and build artifacts that can cause permission errors
on subsequent builds. Add automatic cleanup of these artifacts during
rocBLAS+expunge to prevent build failures.

Cleans:
- Tensile.egg-info/ - Package metadata
- build/ - In-tree build directory
- dist/ - Distribution directory
- .eggs/ - Egg cache directory
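
The cleanup listed above amounts to removing four directories from the Tensile source tree. A minimal Python sketch, assuming the caller knows the Tensile source directory; the function name is hypothetical, and in the PR this step is wired into the rocBLAS+expunge target rather than run as a script.

```python
import shutil
from pathlib import Path

# Directory names listed in the commit message above.
TENSILE_ARTIFACTS = ("Tensile.egg-info", "build", "dist", ".eggs")


def clean_tensile_artifacts(tensile_source_dir: Path) -> list:
    """Remove pip build leftovers; return the names that were removed."""
    removed = []
    for name in TENSILE_ARTIFACTS:
        target = tensile_source_dir / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(name)
    return removed
```
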
…ting

Adds AOCL-BLAS 5.2 (BLIS/BLAS/CBLAS) as an alternative to OpenBLAS for
rocBLAS client testing. AOCL-BLAS provides ILP64 support needed for
large-scale stress tests and serves as a complementary CPU BLAS provider.

- Add host-aocl-blas artifact (Linux-only, enabled with THEROCK_BUILD_TESTING)
- Build AOCL 5.2 BLAS component from source with ILP64 and multithreading
- Install to lib/host-math/ alongside OpenBLAS
- Enable LINK_BLIS in rocBLAS when testing is enabled
- Add therock_test_validate_static_lib() for static library validation

AOCL headers now install to lib/host-math/include/aocl/ subdirectory,
matching the OpenBLAS pattern (lib/host-math/include/openblas/).

Changes:
- Replace broken custom_command/custom_target with install(CODE) script
- Headers copied during install phase, not as separate build step
- Updated CMake package config to point to include/aocl/ subdirectory
- Preserves directory structure for Au/, Capi/, alci/ subdirectories

Verified:
- rocBLAS configure finds headers successfully
- rocblas-test and rocblas-bench link against AOCL
- Binaries show "Using reference library .../libaocl.a"

Pre-commit hook fixes:
- Remove trailing whitespace from aocl-config.cmake.in
- Apply black formatting to validate_static_library.py

Fixes all 8 issues raised by Copilot PR review:

1. validate_static_library.py: Filter empty strings when counting object files
2. validate_static_library.py: Fail validation if archive is empty (0 objects) or 0 MB
3. therock_testing.cmake: Skip static lib validation in sanitizer builds (matches shared lib)
4. aocl-config.cmake.in: Clarify OpenMP is needed for multithreaded AOCL-BLAS
5. aocl/CMakeLists.txt: Replace local path with public GitHub URL reference
6. BLAS/CMakeLists.txt: Add Tensile cleanup to clean target (not just expunge)
7. aocl/CMakeLists.txt: Clarify OpenMP comment - AOCL manages discovery internally
8. aocl/CMakeLists.txt: Add verification for install-time GLOB, fail if no headers found

The +clean target isn't available at this point in CMakeLists.txt.
Only +expunge is explicitly created and can be used as a dependency.

Fixes CMake configuration error:
  Cannot add target-level dependencies to non-existent target "rocBLAS+clean"

- Add validation for critical AOCL headers (blis.h, cblas.h, blis64.h) in install script
- Remove unused @PACKAGE_INIT@ from aocl-config.cmake.in to match OpenBLAS pattern
- Note: rocBLAS+expunge-tensile already only depends on rocBLAS+expunge (not +clean)

Create a symlink from libaocl.a to libcblas.a so that rocBLAS's
find_library(NAMES cblas ...) can discover TheRock's AOCL through
the hinted library paths. This allows the develop branch of
rocm-libraries to work without modification.

This is a temporary workaround until rocBLAS is updated to use
find_package(aocl) for CMake package discovery.

AOCL-BLAS is disabled on Windows (disable_platforms = ["windows"])
so we should not add it as a dependency for rocBLAS on Windows.

Windows builds will continue to use only OpenBLAS as the CPU
reference BLAS for rocBLAS clients/tests.

Instead of symlinks, copy AOCL library and headers to the location
that rocBLAS's develop branch searches:
  ${BUILD_DIR}/deps/aocl/install_package/

This allows rocBLAS to find TheRock's AOCL 5.2 without any changes
to rocm-libraries, matching the existing search logic in
rocblas/clients/CMakeLists.txt.

Benefits:
- No symlinks (better Windows compatibility)
- No changes needed to rocm-libraries develop branch
- Works with existing rocBLAS AOCL discovery logic

Removes the previous libcblas.a symlink approach which didn't
handle headers.

Move the copy logic from rocBLAS CMakeLists (configure-time) to
AOCL CMakeLists (install-time). This ensures the copy happens
after AOCL builds but before rocBLAS configures, making it work
in a single build pass.

Timeline now:
1. Top-level configure
2. Build phase:
   - AOCL builds and installs (copies to rocBLAS deps/ location)
   - rocBLAS configures (finds AOCL in deps/ location) ✓
   - rocBLAS builds

This matches rocBLAS's existing search logic in
rocblas/clients/CMakeLists.txt without requiring any changes
to rocm-libraries.

Automatically set THEROCK_ENABLE_HOST_MATH=ON when THEROCK_BUILD_TESTING=ON
to ensure host math libraries (OpenBLAS, AOCL-BLAS, SuiteSparse) are built
for rocBLAS clients and tests.

Without this, AOCL-BLAS was declared but never built on CI because its
feature group (HOST_MATH) was disabled by default, causing rocBLAS
configure to fail with "Could not find any BLAS library".

Fixes CI failure where rocBLAS clients couldn't find BLAS library.

Pass -DBUILD_DIR to rocBLAS CMake configuration, pointing to its build
directory. This custom variable is used by rocBLAS clients/CMakeLists.txt
to locate bundled dependencies in ${BUILD_DIR}/deps/.

Without this, rocBLAS couldn't find TheRock's AOCL at
${BUILD_DIR}/deps/aocl/install_package/ and would fall back to system
AOCL installations, even though the files were correctly copied there
during AOCL's install phase.

Now rocBLAS will find: build/math-libs/BLAS/rocBLAS/build/deps/aocl/...

AOCL's CMakeLists.txt defaults CMAKE_CONFIGURATION_TYPES to Debug if
not explicitly set. Even though we pass CMAKE_BUILD_TYPE=Release,
the Debug configuration type was causing BLIS to build without
optimizations, resulting in 100-450x slowdowns on triangular
operations (trsm, trmm) and test timeouts.

Add -DCMAKE_CONFIGURATION_TYPES=Release to explicitly force Release
mode for all AOCL components.

This fixes CI test timeouts where rocBLAS smoke tests failed to
complete in 15 minutes due to Debug-mode AOCL performance.
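
For illustration, the two flags from this commit can be shown in a standalone configure command. This is a sketch only: TheRock passes these arguments through its own CMake subproject machinery, and the helper name and directory arguments here are hypothetical.

```python
def aocl_configure_command(source_dir: str, build_dir: str) -> list:
    """Build a cmake configure command that pins AOCL to Release."""
    return [
        "cmake",
        "-S", source_dir,
        "-B", build_dir,
        "-DCMAKE_BUILD_TYPE=Release",
        # Without this, AOCL's CMakeLists.txt falls back to Debug.
        "-DCMAKE_CONFIGURATION_TYPES=Release",
    ]
```
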
CI diagnostics revealed we cannot detect actual CPU allocation:
- multiprocessing.cpu_count() returns 256 (all system cores)
- SLURM variables not available
- cgroup limits not detectable via standard paths

Without this fix, rocBLAS sets 254 OpenMP threads on containers
with only ~48-64 allocated cores, causing 60-100x AOCL performance
degradation due to thread oversubscription.

Conservative value of 48 threads assumes typical CI allocation of
50-64 cores, leaving headroom for system threads as recommended
by AOCL team (use allocated_cores - 4).

This should reduce rocBLAS test time from ~8.9 min to ~2.5 min.
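
The `allocated_cores - 4` guidance quoted above can be written as a small helper. The PR itself hardcodes 48 because the real allocation could not be detected on CI, so this sketch only shows the rule for the case where the allocation is known; the function name is hypothetical.

```python
def pick_omp_threads(allocated_cores: int, headroom: int = 4) -> int:
    """Leave `headroom` cores for system threads, never going below one thread."""
    return max(1, allocated_cores - headroom)
```

With an allocation of 52 cores this rule yields the 48 threads used in this PR.
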
…vention

Follow CMake best practices for package naming and target namespaces:
- Change package name from 'aocl' to 'AOCL' (matches BLAS/LAPACK pattern)
- Change target from 'AOCL::aocl' to 'AOCL::AOCL' (namespace matches package)
- Rename config file to 'AOCLConfig.cmake' (standard for uppercase packages)
- Update install path to 'cmake/AOCL'

This follows Kitware's recommendation that package names and target
namespaces should match exactly, which will be enforced in CMake 3.31+
via Common Package Specification (CPS).

References:
- https://www.kitware.com/psa-your-package-name-and-target-namespace-should-match/
- Standard CMake modules: BLAS::BLAS, LAPACK::LAPACK, ZLIB::ZLIB

This commit enables AOCL-BLAS to build on Windows alongside Linux,
supporting the AOCL team's deliverable improvements.

Key changes:
- BUILD_TOPOLOGY: Remove Windows platform restriction for AOCL
- therock_subproject: Preserve Windows SDK environment for nested CMake
- BLAS/CMakeLists: Enable AOCL-BLAS for rocBLAS testing on Windows
- aocl/CMakeLists: Major refactoring for cross-platform support
  - Use tony-davis fork with Windows/CMake fixes (temporary)
  - Add VS Clang-CL toolchain support for Windows
  - Cross-platform OpenMP configuration (MSVC vs GCC flags)
  - Disable AOCL_UTILS on Windows (MSVC incompatibility)
  - Modern CMake paths with GNUInstallDirs
  - Platform-specific library naming (aocl.lib vs libaocl.a)
  - Remove custom package config (AOCL now provides its own)

Result: AOCL-BLAS builds successfully on both Windows and Linux,
enabling rocBLAS testing with CPU reference BLAS on Windows.

Co-authored-by: Cursor <cursoragent@cursor.com>

[RIPE FOR OWN PR]

After amdsmi was moved from base/ to core/ (d3bb45a), it is only built
when THEROCK_ENABLE_CORE_AMDSMI=ON. math-libs/BLAS/CMakeLists.txt still
added the amdsmi target to optional deps on non-Windows unconditionally,
so configure failed with "non-existent target amdsmi" when building
rocBLAS/hipBLAS/hipBLASLt without CORE_AMDSMI.

rocBLAS and hipBLASLt treat amdsmi as optional (GPU monitoring); they
can build and run without it. Only add amdsmi to the optional deps when
THEROCK_ENABLE_CORE_AMDSMI is ON. rocm_smi_lib remains in the list when
not Windows (that target is always present).

Co-authored-by: Cursor <cursoragent@cursor.com>

tony-davis and others added 2 commits February 9, 2026 18:46
…LAS dep

- third-party/aocl: Use SOURCE_DIR for fetch layout; install to lib/host-math/
  with ENABLE_AOCL_UTILS=OFF; provide_package at lib/host-math/lib/cmake/AOCL.
  Remove legacy install(CODE) workaround; downstream uses find_package(AOCL CONFIG).
- therock_subproject: Resolve build deps from THEROCK_STAGE_DIR (not DIST_DIR)
  so subprojects find package configs from each dep's stage install tree (fixes
  AOCL and other stage-installed packages).
- math-libs/BLAS: Treat AOCL-BLAS as runtime-only optional dep for rocBLAS
  (mirror OpenBLAS); remove rocBLAS build dep on therock-aocl-blas.

Co-authored-by: Cursor <cursoragent@cursor.com>

tony-davis and others added 5 commits February 9, 2026 19:43
Reduce test_rocblas.py to minimal AOCL change (cap OMP_NUM_THREADS=48) and single log line; remove CPU allocation diagnostics block.


skip-checks: true
Co-authored-by: Cursor <cursoragent@cursor.com>

Keep Windows env preservation block only; resolve build deps from dist again to avoid global behavior change.


skip-checks: true
Co-authored-by: Cursor <cursoragent@cursor.com>

Drop rocBLAS+expunge-tensile; unrelated to AOCL integration.


skip-checks: true
Co-authored-by: Cursor <cursoragent@cursor.com>

- base: point amdsmi package to lib/cmake/amd_smi to match install layout
- core: add THEROCK_ENABLE_CORE_AMDSMI block (amdsmi from rocm-systems, core-amdsmi artifact)
- core: make amdsmi an optional build dep for rocrtst when TARGET amdsmi exists

Co-authored-by: Cursor <cursoragent@cursor.com>

Resolved conflicts:
- BUILD_TOPOLOGY.toml: keep core-amdsmi from main, add host-aocl-blas
- base/CMakeLists.txt: take main (amdsmi in core only)
- math-libs/BLAS/CMakeLists.txt: main's optional rocm_smi_lib + our AOCL/host-blas testing deps

Co-authored-by: Cursor <cursoragent@cursor.com>
tony-davis changed the title from "Users/todavis/aocl host blas" to "Integrate AOCL-BLAS as CPU reference BLAS for TheRock CI (blocked on AOCL team)" on Feb 19, 2026
jayhawk-commits pushed a commit that referenced this pull request Mar 11, 2026
## Motivation

Bump rocm-systems from 93bc019 to 093b66c (includes fix for hip-tests
issue and revert for mathlib hiprtc issues and revert for rccl-test,
added revert for miopen failures due to PR 653):

Commits:
093b66c (HEAD, origin/develop, origin/HEAD) Revert "SWDEV-546177 -
hipModuleGetLoadingMode API impl (#653)" (#3858)
d8a0adb [AMD-SMI] Hide libamd_smi.so internal symbols (#3777)
d4da458 [rocprofiler-sdk] [Documentation ] Updating changelog (#3827)
19fadeb (origin/users/abchoudh/fix_dispatch_count) [RCCL][Tuner
Plugin] Enable tuning of RCCL tuning constants (#3757)
b4f5f8a rocr: Fix IPC dmabuf hang with large allocations (#3211)
64efea0 RCCL: allow users to override max and per job memory & fix
defaults. (#3797)
9b3dd10 Removing ready_for_review (#3849)
7e43880 [rocprofiler-systems] Update ROCm version to 7.2.0 in CI
workflows for Debian, RedHat, and Ubuntu (#3431)
1fdb6b9 [rocshmem] add gda/topology unit tests (#3715)
be1ea24 Move hipMipmappedArrayGetMemoryRequirements test to common
tests
e4513f0 Update amdgpu-windows-interop with latest changes, pal
58aa0bab2ced0cc9ebe8d2d0932db6774feb4e49 2026-03-04(#3773)
b1f964d [rocprofiler-compute] Ensure long kernel name fully shows in
compute analyze (#3665)
4dcf1e3 SWDEV-567112 - Replace test names (#3787)
33f5f30 ROCM-2428 - fixes hipStreamBatchMemOp invalid operation
checks (#3099)
139f4bf [SWDEV-556456] Align HIP_UUID with rocminfo (#3614)
8e89285 Reduce buffers alignment to 4 bytes (#3821)
51be29a AIRUNTIME-125: Consolidate Windows optimization and debug
flags (#3825)
1407392 [AMD-SMI] CI: Fix root workflow to use ASIC-specific test
filters (#3807)
63f78a9 (origin/users/mcao/fix_rocpdsummary) [ROCM-SMI] Fix DRM
include dirs leaking absolute build paths to consumers (#3808)
caf2f7e [ROCM-186] amd-smi: Add support for a VRAM and GTT tuning
interface (#3636)
a0712d4 [TheRock CI] Update projects_to_test lists (#3749)
02090c4 rocrtst: install gfx .hsaco files to share/rocrtst (#3744)
4a0a1cb Merge other simd table (#3696)
0d07657 Add missing kwargs from
rocprofiler_add_integration_validate_test in .cmake-format.yaml (#2336)
3a3df30 Optimize device counting service GPU interactions (#1583)
95d9da0 Add SPM Enable flag in build infrastructure (#3677)
12bb943 [rocprofiler-sdk] On-demand GPU profile queue
creation/destruction (#3586)
941057c  Navi4 tuning table iter 1 (#3052)
dbf2b73 [AMD-SMI] Display N/A for cu_occupancy when file is
unavailable (#3589)
b0efc7c [RCCL] [UT] Add ROCTX test (#3625)
ba7a20e Reducing the p2pnChannels for half-subscription A2A on
multi-node MI350 (#3381)
75238c9 [clr] Fix memory leak in getOrCreateHostcallBuffer (#3699)
af2ee0e [hip-tests] ASAN Check for image support before we create
context (#3834)
ad44966 Update windows ci subtree in include amdgpu-windows-interop
(#3814)
c8ad252 [rocprofiler-register] Fix compilation with system fmt/glog
(#1243)
7818815 Update README to include dbgapi and debug agent components
(#3731)
88e4a78 ROCProfiler and ROCTracer: Modifying deprecation note (#3831)
b5918a5 [ROCM-3124-3125-3126] CUID file generation hangs on MI350
systems/CUID test failures/Segmentation fault in CUID example code
(#3548)
97a5dd9 Update copyright to use SPDX IDs (#3805)
511730a [rocshmem]: add flood-amo tester (#3653)
2d650a0 [clr] Fix heap use after free error in device allocations
(#3789)
b6b179a Disable hipHostRegister_Negative test for ASAN (#3832)
39ec318 [RCCL] Add GDA alltoallv via rocshmem integration (#3613)
fb0f4d5 [RCCL] [CUMEM] Fix cuMem multi-process runs (#3811)
c3de7d4 SWDEV-526201 - Fix and enable disabled HIP tests from warp
group (#3089)
8d9a8ca roofline: code cleanup and refactor vector types (#3813)
8957e49 Don't wait on command completion if worker thread is
destroyed (#3790)
9e7586a [rocshmem] Add barrier APIs and expose `ROCSHMEM_TEAM_WORLD`
on device (#3651)
91b0923 Revert "fix local gpu release static build failure (#3667)"
(#3799)
0fda754 libhsakmt: Add secondary KFD context creation support
ee43db9 Revert "Update TheRock reference to 20260303 commit (#3709)"
(#3826)
86e28b9 Added fix to update GL2C counters instance count for GFX11.5
(#3100)
93f69f7 Adjust includes to match use (#3742)
e9fbc3f (develop) Update TheRock reference to 20260303 commit (#3709)
be0675a (HEAD) Revert "Support fp8 types in hiprtc (#2605)" (#3792)
3e3a94a [rocprofiler-systems] Add trace_cache support for
std::optional<T> serialization (#3490)
0b42a7f clr: Eliminate unnecessary kernel name string copies (#3774)
b6b0d77 rocr: Add hsa_amd_memory_async_batch_copy API for batched
memory copies (#3259)
486e6d1 Resolve staircase RS regression with 48 max channels (#3684)
eb59c85 [gfx942][gfx950] Leverage new cache bypass builtins for
simple protocol where available (#2847)
4d74d27 (origin/users/raramakr/rocm-smi-target) Revert "Auto Labeler:
Add ci:regression-detection label to rccl PRs (#3543)" (#3769)
8f07955 [AMD-SMI] CI: Use ASIC-specific test blacklists in workflows
(#3775)
7cef5b6 Fix MFMA total FLOPS calculation (#3371)
aea3751 Remove duplicated tests (#3235)
b6c656f Remove duplicated tests in memory module (#3087)
ca3137d [rocprofiler-sdk] Install integration tests without building
for therock & Misc. fixes (#3047)
0ab5c41 [rdc] Enable on-demand queue mode in rocprofiler-sdk to
prevent inference degradation (#3629)
a1eb2a1 rocr/wsl: a library should not output to std::out by default
(#3718)
b7da296 Reenable flood_put/get testers on mlx5 since they should work
after pr2732 (#3748)
000e24d [rocprofiler-sdk] Add automatic late-start support to
rocprofiler_force_configure (#2168)
64ea87f [hip-tests] Fix memory leaks in hipMemPoolTrimTo tests
(#3643)
543a7d7 rocr: Include code object allocs in lightweight coredump
a58da37 [rocdecode] - update rocdecode ctest (#3768)
f88e4ee [rocprofiler-systems] Make CDash submit non-fatal and add
GitHub Actions logging (#3525)
cb14deb [rocprofiler-systems] Update nlohmann-json submodule (#3391)
4492530 SWDEV-567112 - Introduce new mechanism for tagging and
disabling tests - Part 2 (#3707)
8ca9913 disabling rccl from full build (linux), covered in RCCL CI
(#3770)
c4fdb20 [ATT] Re-enable tests. Add option to specify perf to target
CU only (#2819)
615aab9 ROCM-3816 Out of Memory fix (#3588)
8ffad41 Fix rocm_smi64 exporting invalid absolute paths to consumers
(#3717)
042d76a rocr: Remove dependency on KFD in Runtime::VMemoryHandleMap
(#2515)
555db59 [AMD-SMI] CPU: Added support for family 1A Models 50h-57h
(#3206)
3affa2c [SWDEV-555935] Fix shared mutex and self-heal (#3729)
ba0bf0f Replace hipMemGetInfo with ihipMemGetInfo and use it for
internal calls. (#2845)
c5cef9b Fix HIP_RETURN on all HIP API calls. (#2838)
241ce7b Revert "memory: fix "contiguous_bytes" calculation in generic
conversion (#3285)" (#3755)
8a690f4 [kpack/clr] Windows PE/COFF support for kpack artifact
splitting and runtime loading (#3728)
863bdf8 MFMA pre-processor guards for ipc.hip (#3724)
90bb9b1 Release queue outside of vgpusAccess lock (#3705)
de45239 clr: Add build support of ROCR and PAL backends together
(#3722)
dfb7abc [rocprofiler-sdk] RCCL API changes for
RCCL_API_TRACE_VERSION_PATCH = 3 (#3477)
d69d4f2 [AICOMRCCL-633] - Fixed warnings in tests (#3402)
067d86d rocr/wsl: Disable AQL Queue usage with flag ROCR_USE_PM4
(#3663)
594eb60 [TheRock CI] rocm-systems build full ROCm stack (#3182)
27d17e8 [ROCProfiler-SDK] Fix SWDEV-556922: Handle comments before
checking for pmc: (#1723)
c80d904 memory: fix "contiguous_bytes" calculation in generic
conversion (#3285)
669987c [hip-tests] ASAN - add missing release handles (#3735)
a24bbd7 fix local gpu release static build failure (#3667)
259b2ff Speed up DeviceId (#2803)
65d9264 Simplify MPI trace merge logic and remove legacy guards
(#3562)
1076c08 use system to look for zcat path instead (#3720)
22f1d19 [AICOMRCCL-355] Enable threshold-based p2p-batching (#3000)
a2e4c79 Partially flatten template tests cases (#2597)
e242abe Pass space separated gfx target list to RCCL build command
(#3701)
4f78aea SWDEV-570074 - Refactor Memset memory object handling.
(#2228)
b3ad12d Support Nvidia build on theRock for HIP-tests (#3335)
a1cf15e Support fp8 types in hiprtc (#2605)
8ef84b0 [rocprofiler-systems] Add HPC examples to automated testing
(#3437)
db3a70d Free memory which was allocated in tests (#3710)
27e6809 [rocprofiler-systems]: Fix rhel CI failure on for MPI and UCX
tests (#3700)
0d9aaf5 rccl/topo_expl: fix build issue. (#3719)
be04d75 Fix zcat path used for checking kernel configs (#3423)
cab60a7 rocr/thunk/win: Add CU mask support (#3518)
5b3d826 [CUMEM] Initial support for cuMem APIs (#2763)
0606ff4 [HIP] [PLAT-194496] Improve Stress_hipMalloc_HighSizeAlloc
reliability (#3550)
05750a7 fix hip-test name in config (#3716)
33f777f hsakmt: Remove --high functionality from run_kfdtest.sh
(#2486)
e4c46e3 Hide the retain under direct dispatch check (#3698)
bfe0ca0 Add rocprof trace decoder to CI tests (#3690)
a769b6f [rocSHMEM] Edgar/abstract allocator ipc part1 (#3411)
659fb52 [AMD-SMI] Fix bugs, improve error handling, and clean up
NIC/switch code (#3654)
0eb26ea hsakmt: Fix Import/Export of dmabuf_fd for WSL/Windows
(#3348)
a122936 [SWDEV-567812] Add UBB power and power_limit fields to
npm_info (#3262)
c3bec09 [rocprofiler-sdk][rocprofv3][rocpd] Updates for KFD data
(#340)
7c44d47 SWDEV-547659 - Remove HIP_VERSION_GITHASH in logs (#448)
74b6487 SWDEV-547008 - Documentation fix for function return values
(#463)
af21cd4 SWDEV-545553 - Improve clarity and robustness of CALLBACK
unit tests (#546)
180d639 SWDEV-544900 - Change hip-test test case name (#547)
feeca99 Doc improvements (#3688)
c1822b6 ROCprofiler-SDK: deprecation of legacy tools (#3609)
5d7aff8 Fix rocprof-compute-viewer link (#3459)
0b0b484 AIRUNTIME-129 - Fix Ocl test failures of 2D image with
pitches. (#3584)
ac569b8 Fix memory tests config (#3687)
603fe7a [hip-tests] Enable hipMipmappedArrayGetMemoryRequirements
test via cmake
4fad445 [hip] Docs: Updates to some memory management pages
8cc5955 AICOMRCCL-656 fix memory leak in ncclCommInitRankFunc (#3628)
94a4595 Fix missing amd_comgr linkage in pc-sampling integration test
(#3453)
2a68565 rocrtst: CMAke file: strip xnack/feature suffixes from gfxNum
in build_kernel (#3652)
c3542bf [rocprofv3] Deprecating input text files for counter
collection (#1562)
ff122e7 SWDEV-573073 - Cleanup hipHostAlloc/Malloc/Register tests
(#3017)
5b1deaf SWDEV-567112 - Introduce new mechanism for tagging and
disabling tests - Part 1 - Core (#2351)
6e0cc30 rocrtst: MaxSingleAllocationTest: skip CPU NUMA nodes >0
(#3208)
d65f601 [AICOMRCCL-667] rccl: Change GDR selection logic. (#3607)
f1c44ab Patch Back to Old Repo: fixes from manual runs (#3621)
fe53bcd [AMD-SMI] Allow amdsmi init to succeed when no NIC hardware
is present (#3403)
b25600e [ROCM SMI] Fix fw pldm version not displayed in default
amd-smi (#3594)
169d2ef root to module wiring, remove legacy source collection
(#3482)
7469781 [LRT][clr] SWDEV-512963-Fix CTS test failures for 1D buffer
copy (#3520)
c8f55d9 Adding rocprof trace decoder (#3576)
425e983 Trace decoder codeowners (#3600)
a176efd [hip-tests] Add return statements to HIP_SKIP_TEST (#3647)
32687cf rocrtst: CPUAccessToGPUMemoryTest: Cap host allocation to 512
MB under ASAN (#3407)
97c0206 Update codeowners for thunk DXG (#3334)
be44b28 [rocdecode][rocjpeg] - ctest CMakeLists cleanup (#3632)
80ff0b8 Various memory leak fixes in hip-tests (#3605)
0988f67 fix typo in help text (#3314)
9f823c5 Fix CUID file lookup by loading files before searching
entries (#3436)
064c892 SWDEV-546177 - hipModuleGetLoadingMode API impl (#653)
006213e ROCM-2696: Ignare size and base if null ptr (#3336)
6060b99 Improve atomic min max test perf (#2580)
3fbcc13 Change printf capture impl (#1127)
93bc019 (tag: hip-version_7.12.60610,
origin/users/mradosav-amd/rocprofsys-selective-region) [ROCM-CORE]
Update rdhc script to support rocm install prefix
(ROCm/rocm-systems#3596)

[AICOMRCCL-355]:
https://amd-hub.atlassian.net/browse/AICOMRCCL-355?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
@tony-davis tony-davis closed this May 13, 2026
@github-project-automation github-project-automation Bot moved this from TODO to Done in TheRock Triage May 13, 2026