@marbre
Member

@marbre marbre commented Sep 11, 2025

This reverts commit 68a380c.

#966 (comment)

The root cause makes no sense and the lack of any reproduction information makes it just lore that we will carry forward indefinitely (and if true, could be hiding a serious problem).

@marbre marbre requested a review from a team as a code owner September 11, 2025 23:54
@TorreZuk
Contributor

Don't reintroduce this now unless you first run some perf regression checks. This was recently reverted, so the ticket and @NaveenElumalaiAMD can be consulted.

@stellaraccident
Contributor

We will revert in three business days as the justification makes no sense, and we can't be carrying things like this indefinitely that lack a root cause or any theory of why there is an impact. If the project maintainers disagree, then I would suggest that more analysis is done and/or some automated test put in place which verifies the expected behavior. If there are details in an internal ticket, please include them on the public record if being used as a justification.

@TorreZuk TorreZuk self-assigned this Sep 12, 2025
@TorreZuk
Contributor

It looks like this revert of the revert would again introduce serious performance drops. Naveen had reproduced our internal QA team's regression analysis before his revert in #966. Details are listed in internal regression ticket SWDEV-546097, but I have just reproduced a similar 12% drop on a larger GEMM on MI250X with this PR's change:
./rocblas-bench -f gemm_ex -r h -m 7744 -n 7744 -k 7744 --lda 7744 --ldb 7744 --ldc 7744 --ldd 7744 --compute_type s --transposeB T
Double precision shows a similar drop, and many GEMMs are listed in the ticket.
You can read the ticket, but for the community reader: performance drops were reported for MI200, MI300 and MI300X, many around a 25% drop in GFLOPs, on larger GEMMs (roughly 2k and up). I will continue to review the Tensile project history on Monday to try to analyze where things went wrong with this option. The original regression happened after the 7.0 branch and was fixed before any point release.

@stellaraccident
Contributor

Thank you - we really need to root cause this situation. None of the devs can see a rational reason for such an action-at-a-distance impact, and it could be a serious/nuanced issue.

@stellaraccident
Contributor

Given that @TorreZuk has reproduced the performance drop, we need to hold and focus on root cause.

The reason I'm picking on this: we build on CI systems that don't have GPUs and there would seem to be no link possible between this flag and performance. If there is, that would be troubling indeed.

We have to root cause what the connection here is, not just so that this revert can land but to ensure that we aren't building software in an already compromised state.

Contributor

@TorreZuk TorreZuk left a comment

With a few changes I can avoid the regressions this causes, but I am still trying to analyze the design flaws rather than just allowing this to proceed. Build-and-bench was the original design, so it looks like that crept into places where it shouldn't have, with a default ISA being used even outside benchmarking. Hopefully by tomorrow I can open a PR with changes for review.

Comment on lines +218 to +220
# We do not need to do device enumeration at library build time.
set(Options ${Options} "--no-enumerate")

Contributor

This doesn't follow the code pattern used for the other options, so it is probably better to wrap it in a control option, e.g.
if (NOT Tensile_ENUMERATE).
"We do not..." is too ambiguous; state your use case, which I presume is a build-only run on a possibly CPU-only node. This function may be used by other community members with a build-and-benchmark pattern.

Contributor

The library build path does build on a CPU-only node without any AMD software installed (drivers, ROCm or otherwise). If there are other use cases hitting this path, then it needs better isolation. It would seem to not just be "community" paths, though, since we failed something in one of our own flows.

Contributor

Yes, I just want the comment in the code improved and a CMake control variable for the older default build used for benchmarking. Working on revisions that will allow this PR to merge #1636
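
The control variable and improved comment discussed in this thread could look roughly like the sketch below. It is only an illustration under stated assumptions: the name Tensile_ENUMERATE comes from the review suggestion above, while declaring it with option(), its OFF default, and its help text are assumptions that do not appear in the PR. The surrounding function may well take this as a parsed argument instead, like its other options.

```cmake
# Hypothetical control option (name taken from the review suggestion; the
# option() declaration and the OFF default are assumptions for illustration).
option(Tensile_ENUMERATE
       "Enumerate GPU devices while generating the library" OFF)

if(NOT Tensile_ENUMERATE)
  # The library build path may run on a CPU-only node with no AMD drivers or
  # ROCm installed, so device enumeration is skipped at build time.
  # `Options` is assumed to be the argument list built up by the enclosing
  # function, as in the lines under review.
  set(Options ${Options} "--no-enumerate")
endif()
```

With a variable like this, a build-and-benchmark user could pass -DTensile_ENUMERATE=ON at configure time to keep the older behavior, while CPU-only library builds would need no extra flags.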

@bstefanuk bstefanuk self-requested a review September 18, 2025 17:39
Contributor

@TorreZuk TorreZuk left a comment

Work is still underway in Tensile to unblock this, but for now it can't go in as is.
We will probably still want a CMake variable to control this option, the same as all the others, for backward compatibility. I can push that commit to this PR when it is unblocked.

@TorreZuk
Contributor

TorreZuk commented Nov 6, 2025

This change to not enumerate was included in what @bstefanuk did in #2162, which is now merged, so I am closing this PR.

@TorreZuk TorreZuk closed this Nov 6, 2025