bstefanuk (Contributor) commented Oct 17, 2025

Motivation

Performance regressions appear when --no-enumerate is added to the TensileCreateLibrary build. This PR re-implements the kernel ISA reliance from #2094 without requiring changes to the logic files.

Technical Details

  • Rely only on the kernel's ISA during the build phase.
  • Enforce ISA consistency using the architecture details extracted from logic files.
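One way to read the two bullets above is sketched below. This is an illustration only, not the code in this PR: the helper names (`gfx_name`, `group_by_isa`, `enforce_isa`) are hypothetical, and the sketch assumes each kernel is a dict carrying an `"ISA"` tuple such as `(9, 4, 2)` and that the logic file yields a gfx target name.

```python
from typing import Dict, Iterable, List, Tuple

Isa = Tuple[int, int, int]  # assumed form, e.g. (9, 4, 2) for gfx942


def gfx_name(isa: Isa) -> str:
    """Format an ISA tuple as a gfx target name; the stepping is written in hex."""
    major, minor, step = isa
    return f"gfx{major}{minor}{step:x}"


def group_by_isa(kernels: Iterable[dict]) -> Dict[str, List[dict]]:
    """Bucket kernels by their own ISA, so the build relies only on that field."""
    groups: Dict[str, List[dict]] = {}
    for kernel in kernels:
        groups.setdefault(gfx_name(kernel["ISA"]), []).append(kernel)
    return groups


def enforce_isa(kernels: Iterable[dict], logic_arch: str) -> None:
    """Raise if a kernel's ISA disagrees with the architecture from its logic file."""
    for kernel in kernels:
        name = gfx_name(kernel["ISA"])
        if name != logic_arch:
            # Hypothetical message; the real tool's wording may differ.
            raise ValueError(
                f"Kernel ISA {name} does not match logic file architecture {logic_arch}"
            )
```

Under these assumptions, `group_by_isa` decides which kernels are built for which target, and `enforce_isa` fails early when a logic file and its kernels disagree rather than silently building for the wrong target.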

Test Plan

  • Local performance testing for specific sizes
  • Comprehensive performance testing through gemmaiperf
  • Standard CI testing

Test Result

  • See CI results in this PR for standard pipeline checks.
  • Performance: tested on 6665 sizes using rocblas-bench on gfx950 (results below)
  • Performance: select sizes were evaluated on gfx942 and confirmed no performance change beyond +/-1%

Single precision NN

Stat | Result
-- | --
Average (% speed up) | 0.50
Median (% speed up) | 0.01
Count Faster | 3482
Count Slower | 3161

Single precision TN

Stat | Result
-- | --
Average (% speed up) | 4.17
Median (% speed up) | -0.02
Count Faster | 3042
Count Slower | 3579

Complex double precision TN

Stat | Result
-- | --
Average (% speed up) | 0.18
Median (% speed up) | 0.04
Count Faster | 4452
Count Slower | 2207
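
The PR does not state how the statistics in these tables were derived. A minimal sketch of one plausible computation follows. It assumes two equal-length lists of per-size throughput (for example GFLOP/s reported by `rocblas-bench`) for the baseline and candidate builds, and that sizes with no change count as neither faster nor slower. The function name `summarize_speedup` is hypothetical.

```python
from statistics import mean, median
from typing import Dict, Sequence


def summarize_speedup(baseline: Sequence[float], candidate: Sequence[float]) -> Dict[str, float]:
    """Summarize per-size percent speed-up of candidate over baseline throughput."""
    if len(baseline) != len(candidate):
        raise ValueError("baseline and candidate must cover the same sizes")
    # Positive values mean the candidate build is faster for that size.
    speedups = [(new - old) / old * 100.0 for old, new in zip(baseline, candidate)]
    return {
        "average": mean(speedups),
        "median": median(speedups),
        "faster": sum(s > 0 for s in speedups),
        "slower": sum(s < 0 for s in speedups),
    }
```

With a definition like this, an average well above the median (as in the single precision TN table, 4.17 against -0.02) is the pattern that a small number of sizes with large gains would produce, while most sizes stay roughly unchanged.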

Submission Checklist

codecov-commenter commented Oct 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

❗ The number of coverage reports uploaded differs between BASE (5f34bda) and HEAD (ebb8f13).

HEAD has one fewer upload than BASE.

Flag | BASE (5f34bda) | HEAD (ebb8f13)
-- | -- | --
hipSPARSE | 1 | 0
Additional details and impacted files
@@             Coverage Diff              @@
##           develop    #2162       +/-   ##
============================================
- Coverage    88.75%   67.19%   -21.55%     
============================================
  Files          301      362       +61     
  Lines        25607    50705    +25098     
  Branches         0     5708     +5708     
============================================
+ Hits         22725    34069    +11344     
- Misses        2882    13052    +10170     
- Partials         0     3584     +3584     
Flag | Coverage Δ
-- | --
hipSPARSE | ?
rocBLAS | 67.19% <ø> (?)

Flags with carried forward coverage won't be shown.
see 663 files with indirect coverage changes


@bstefanuk bstefanuk requested a review from a team as a code owner October 30, 2025 17:13
bstefanuk (Contributor, Author) commented Nov 6, 2025

Testing failure assessment:

  • The failed tensile precheckin jobs for gfx908 and gfx942 also fail on develop build 50.

Using gardener override to merge.

@bstefanuk bstefanuk merged commit b75ec66 into ROCm:develop Nov 6, 2025
61 of 65 checks passed
@bstefanuk bstefanuk deleted the bug/tensile-build-with-no-enumerate2 branch November 6, 2025 18:42
assistant-librarian bot pushed a commit to ROCm/Tensile that referenced this pull request Nov 6, 2025
[rocblas][tensile] Use kernel ISA during build with enforcement (#2162)

## Motivation

Performance regressions appear when --no-enumerate is added to the
TensileCreateLibrary build. This PR re-implements the kernel ISA
reliance from #2094 without requiring changes to the logic files.

## Technical Details

- Rely only on the kernel's ISA during the build phase.
- Enforce ISA consistency using the architecture details extracted
from logic files.

## Test Plan

- Local performance testing for specific sizes
- Comprehensive performance testing through gemmaiperf
- Standard CI testing

## Test Result

- See CI results in this PR for standard pipeline checks.
- Performance: tested on 6665 sizes using `rocblas-bench` on gfx950
(results below)
- Performance: select sizes were evaluated on gfx942 and confirmed no
performance change beyond +/-1%

`Single precision NN`

Stat | Result
-- | --
Average (% speed up) | 0.50
Median (% speed up) | 0.01
Count Faster | 3482
Count Slower | 3161

`Single precision TN`

Stat | Result
-- | --
Average (% speed up) | 4.17
Median (% speed up) | -0.02
Count Faster | 3042
Count Slower | 3579

`Complex double precision TN`

Stat | Result
-- | --
Average (% speed up) | 0.18
Median (% speed up) | 0.04
Count Faster | 4452
Count Slower | 2207

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
bstefanuk added a commit that referenced this pull request Nov 7, 2025
## Motivation

The patch
[0008-Revert-remove-options-no-enumerate-966.patch](https://github.com/ROCm/TheRock/blob/77e4a8304c0544a7ee5779fcabcee265a00f38ba/patches/amd-mainline/rocm-libraries/0008-Revert-remove-options-no-enumerate-966.patch)
can be removed now that `--no-enumerate` is no longer needed in tensile
(PR #2162). This PR allows the patch to be removed without breaking CI.

## Technical Details

Use `rm -f` to allow the pipeline to continue even if the file is
missing.

## Test Plan

Low risk, standard CI testing is sufficient.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
bsyrowik added a commit to ROCm/TheRock that referenced this pull request Nov 7, 2025
## Motivation

Pick up rocWMMA changes for compatibility with TheRock build.

## Technical Details

Deleted a patch that is no longer required due to:
ROCm/rocm-libraries#2162

## Test Plan

## Test Result

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
rponnuru5 pushed a commit to ROCm/TheRock that referenced this pull request Nov 7, 2025
bstefanuk added a commit that referenced this pull request Nov 10, 2025
## Motivation

The `gfx1103` logic files used an incorrect reference to `navi33`, which
was exposed by the new logic file consistency checks added in #2162.

## Technical Details

Update the schedule name in the rocblas logic files to properly map to
`gfx1103`, which is consistent with the `architectureMap` in `Common.py`.

## Test Plan

Local testing before and after this change produces the following output.

Command:
```
Tensile/bin/TensileCreateLibrary [...] --architecture=gfx1103 /path/to/Tensile/Logic/asm_full build_gfx1103 HIP
```

Result (before; develop)
```
ValueError: Architecture mismatch: gfx1103 does not match  navi33. Review the library logic file
```

Result (after)
```
# Reading logic files: 32 thread(s), 144 tasks .............................. 100.0% (took 4.0 secs)
# Generating kernels: 32 thread(s), 2690 tasks .............................. 100.0% (took 23.1 secs)
# Compiling source kernels .............................. 100.0% (took 0.0 secs)
```
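
The kind of consistency check that produces the "before" error can be sketched as follows. This is a hedged illustration, not Tensile's implementation: `ARCHITECTURE_MAP` is a tiny stand-in and does not reproduce the contents of `architectureMap` in `Common.py`, and `check_schedule_architecture` is a hypothetical name.

```python
# Illustrative stand-in only; NOT the real architectureMap from Common.py.
ARCHITECTURE_MAP = {
    "gfx1102": "navi33",
    "gfx1103": "gfx1103",
}


def check_schedule_architecture(requested: str, schedule_name: str) -> None:
    """Raise if a logic file's schedule name is not the one mapped to the requested target."""
    expected = ARCHITECTURE_MAP.get(requested)
    if expected is None:
        raise ValueError(f"Unknown architecture: {requested}")
    if schedule_name != expected:
        # Message modelled on the error quoted above; exact wording is an assumption.
        raise ValueError(
            f"Architecture mismatch: {requested} does not match {schedule_name}. "
            "Review the library logic file"
        )


# A logic file that names gfx1103 passes; one that names navi33 is rejected.
check_schedule_architecture("gfx1103", "gfx1103")
```

In this sketch a logic file built for `--architecture=gfx1103` but carrying the `navi33` schedule name raises, which mirrors why the logic files had to be corrected once the check was in place.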

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.