
Cherry-pick 2 amd-llvm reverts for performance regressions #4013

Merged: ronlieb merged 1 commit into main from amd/dev/rlieberm/SMPbumpWW06-7.13-2prs on Mar 18, 2026
ronlieb commented Mar 17, 2026

Compiler Submodule Update

Cherry-picks 2 revert commits from amd-llvm to address performance regressions.

Submodule Changes

| Submodule | Old | New |
| --- | --- | --- |
| amd-llvm | 376decc81 | 910937524 |

Cherry-picked Commits

| Commit | Description |
| --- | --- |
| 28700f4d2228 | Revert "AMDGPU: Fix runtime unrolling when cascaded GEPs present" - llama.cpp regression |
| 910937524277 | Revert SLP vectorizer external uses estimation - tree throttling issue |

ronlieb commented Mar 18, 2026

please review, we wish to land this sometime Wednesday.

lamb-j commented Mar 18, 2026

@ronlieb, @searlmc1, @kzhuravl, do you think adding something like this to the GitHub PR description for our submodule updates would be useful, or just noise? I know this one is out of date after we rebase this PR (or do we even still need this PR after #3834?), but I'm just using it as an example.


Branch base date: 2026-02-17

Submodule updates

| Component | From | To |
| --- | --- | --- |
| amd-llvm | c849bc16b0 | 91093752427 (amd-compiler-2026-06) |
| hipify | 05290949a8 | 86c76dc618 |
| spirv-llvm-translator | 3bceafa607 | d575617fd4 |

llvm-project changes (5708 commits)

AMDGPU-specific:

  • Add gfx12-5-generic subtarget support
  • Add gfx1250 revision kernel note and B0-specific option
  • GFX1250 A0-specific patches
  • Asynchronous loads from global/buffer to LDS on pre-GFX12
  • Introduce asyncmark/wait intrinsics
  • GlobalISel RegBankLegalize rules for buffer load LDS and atomics

Compiler infrastructure:

  • Multiple merges from LLVM main into amd-staging
  • SLP vectorizer fixes and improvements
  • DAGCombiner crash fix
  • Loop vectorization: early exit loops with multiple exits
  • New llvm.looptrap intrinsic

hipify changes (1 commit)

  • Remove unlicensed files

spirv-llvm-translator changes (24 commits)

  • Upstream merges and LLVM API updates
  • Bug fixes: FMA, MergeBlock, LoopMerge, vload/vstore

Cherry-picked commits (after branch base date)

| Commit | Date | Description |
| --- | --- | --- |
| 910937524277 | 2026-03-17 | revert [SLP] external uses estimations |
| 9868d54e96fb | 2026-03-08 | regen lit test for cluster.load.async.to.lds |
| 9760a5ffbf77 | 2026-03-05 | [AMDGPU] Add .gfx1250_revision kernel note |
| 080153b39b5b | 2026-03-04 | [AMDGPU] Add -amdgpu-gfx1250-b0-specific option |
| fcc41f00ef54 | 2026-02-27 | [SLP] Reject duplicate shift amounts |
| 28700f4d2228 | 2026-02-26 | Revert AMDGPU runtime unrolling fix |
| 739776a9841d | 2026-02-25 | [AMDGPU] Add gfx12-5-generic subtarget |
| 65b91fabafc5 | 2026-02-19 | [GFX1250] A0-specific patches |
| 376decc81273 | 2026-02-17 | Revert [IndVarsSimplify] sinkUnusedInvariants |

Patches removed (now upstreamed)

  • 0001-Ensure-to-use-libamdhip64-with-major-version.patch
  • 0009-Add-gcc-toolset-13-prefix-detection.patch
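
A "Submodule updates" table like the one in this example could plausibly be produced from the superproject's own diff. The following is a minimal sketch, not existing TheRock tooling: it parses the `Subproject commit` lines that `git diff` prints for a changed submodule pointer and renders them as a markdown table. The function names, the `compiler/amd-llvm` path, and the sample diff text are assumptions for illustration; real `git diff` output carries full 40-character hashes, while the sample uses only the 12-character prefixes that appear in this PR.

```python
import re

# Header line that names the changed path, e.g.
#   diff --git a/compiler/amd-llvm b/compiler/amd-llvm
_PATH_RE = re.compile(r"^diff --git a/(\S+) b/\S+$")
# Old/new submodule pointer lines, e.g.
#   -Subproject commit <old sha>
#   +Subproject commit <new sha>
_SHA_RE = re.compile(r"^([-+])Subproject commit ([0-9a-f]+)")


def parse_submodule_diff(diff_text):
    """Return {path: (old_sha, new_sha)} for submodules changed in a diff."""
    updates = {}
    path = None
    for line in diff_text.splitlines():
        m = _PATH_RE.match(line)
        if m:
            path = m.group(1)
            updates[path] = [None, None]
            continue
        m = _SHA_RE.match(line)
        if m and path is not None:
            index = 0 if m.group(1) == "-" else 1
            updates[path][index] = m.group(2)
    # Keep only entries where both the old and the new pointer were found.
    return {p: tuple(v) for p, v in updates.items() if all(v)}


def render_table(updates, abbrev=10):
    """Render the parsed updates as a markdown table with abbreviated hashes."""
    rows = ["| Component | From | To |", "| --- | --- | --- |"]
    for path, (old, new) in sorted(updates.items()):
        name = path.rsplit("/", 1)[-1]  # last path component as display name
        rows.append(f"| {name} | {old[:abbrev]} | {new[:abbrev]} |")
    return "\n".join(rows)


# Hypothetical sample input; the path is assumed and the hashes are truncated.
SAMPLE_DIFF = """\
diff --git a/compiler/amd-llvm b/compiler/amd-llvm
--- a/compiler/amd-llvm
+++ b/compiler/amd-llvm
@@ -1 +1 @@
-Subproject commit 376decc81273
+Subproject commit 910937524277
"""

if __name__ == "__main__":
    print(render_table(parse_submodule_diff(SAMPLE_DIFF), abbrev=9))
```

In practice the diff text would come from running `git diff` between the old and new superproject commits; the commit counts and change summaries in the example above would need separate queries inside each submodule.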


marbre left a comment

There is a merge conflict that needs to be resolved first.

ronlieb commented Mar 18, 2026

this looks awesome, thanks. hoping we land this one today

marbre commented Mar 18, 2026

> this looks awesome, thanks. hoping we land this one today

We can't without resolving the conflict first :(

ronlieb commented Mar 18, 2026

> > this looks awesome, thanks. hoping we land this one today
>
> We can't without resolving the conflict first :(

Ack

lamb-j force-pushed the amd/dev/rlieberm/SMPbumpWW06-7.13-2prs branch from 6a154ff to 4f92953 on March 18, 2026 15:27
amd-llvm: 376decc81 -> 910937524

Cherry-picked commits:

1. Revert AMDGPU runtime unrolling fix for cascaded GEPs (#183641)
   Addresses llama.cpp performance regression

2. Revert SLP vectorizer external uses estimation (fc648683cd75)
   Reverts tree throttling estimation changes

Co-Authored-By: Claude <noreply@anthropic.com>
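
The first line of this commit message follows a `name: old -> new` pattern, so it can be checked mechanically. As a rough sketch (the helper names `parse_bump_line` and `is_abbreviation_of` are illustrative assumptions, not an existing script), one could parse that line and confirm that each abbreviated hash is a prefix of the corresponding longer hash listed elsewhere in this PR:

```python
import re

# Matches lines such as "amd-llvm: 376decc81 -> 910937524".
_BUMP_RE = re.compile(
    r"^(?P<name>[\w.-]+):\s*(?P<old>[0-9a-f]+)\s*->\s*(?P<new>[0-9a-f]+)$"
)


def parse_bump_line(line):
    """Return (name, old_sha, new_sha), or None if the line is not a bump line."""
    m = _BUMP_RE.match(line.strip())
    if m is None:
        return None
    return m.group("name"), m.group("old"), m.group("new")


def is_abbreviation_of(short_sha, long_sha):
    """True if short_sha is a leading prefix of long_sha."""
    return long_sha.startswith(short_sha)


if __name__ == "__main__":
    name, old, new = parse_bump_line("amd-llvm: 376decc81 -> 910937524")
    # The 12-character hashes below are the ones quoted in this PR's tables.
    print(name, is_abbreviation_of(old, "376decc81273"),
          is_abbreviation_of(new, "910937524277"))
```

A prefix match only shows that the hashes are consistent with each other; it does not prove that the submodule pointer in the tree actually moved to that commit.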
lamb-j force-pushed the amd/dev/rlieberm/SMPbumpWW06-7.13-2prs branch from 4f92953 to ac68de6 on March 18, 2026 15:34
lamb-j changed the title from "update submodule pointer for amd-llvm:amd-compiler-2026-06 91093752427" to "Cherry-pick 2 amd-llvm reverts for performance regressions" on Mar 18, 2026
ronlieb merged commit 376a534 into main on Mar 18, 2026
173 of 181 checks passed
github-project-automation (bot) moved this from TODO to Done in TheRock Triage on Mar 18, 2026
ronlieb deleted the amd/dev/rlieberm/SMPbumpWW06-7.13-2prs branch on March 18, 2026 20:21