Conversation

arsenm (Contributor) commented Jan 13, 2026

This fixes two cases that surface when the AMDGPU ABI is changed to pass <2 x i16>
values as packed on gfx6/gfx7. The ABI does not currently pack these values;
this is a preparatory fix for that change.

Insert a bitcast if there is a single part with a different size.
Previously this would miscompile by going through the scalarization
and extend path, dropping the high element.
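
A minimal MIR sketch of the single-part case (hypothetical register names; assuming a <2 x s16> source assigned to one packed s32 part, as on gfx6/gfx7 after the ABI change):

    ; Before: the scalarize-and-extend path kept only element 0.
    %e0:_(s16), %e1:_(s16) = G_UNMERGE_VALUES %val(<2 x s16>)
    %part:_(s32) = G_ANYEXT %e0(s16)   ; high element %e1 is dropped
    ; After: a single bitcast preserves both packed elements.
    %part:_(s32) = G_BITCAST %val(<2 x s16>)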

Also fix assertion failures in odd cases, like <3 x i16> -> i32; this needs
to unmerge with excess elements from the widened source vector.

All of this code is in need of a cleanup; this should look more
like the DAG version using getVectorTypeBreakdown.

llvmbot (Member) commented Jan 13, 2026

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-globalisel

Author: Matt Arsenault (arsenm)

Full diff: https://github.com/llvm/llvm-project/pull/175780.diff

1 file affected:

  • (modified) llvm/lib/CodeGen/GlobalISel/CallLowering.cpp (+24-2)
diff --git a/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp b/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp
index e2ed45eec0ecd..0da360d8038b6 100644
--- a/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp
@@ -568,6 +568,13 @@ static void buildCopyToRegs(MachineIRBuilder &B, ArrayRef<Register> DstRegs,
 
   const TypeSize PartSize = PartTy.getSizeInBits();
 
+  if (PartSize == SrcTy.getSizeInBits() && DstRegs.size() == 1) {
+    // TODO: Handle int<->ptr casts. It just happens the ABI lowering
+    // assignments are not pointer aware.
+    B.buildBitcast(DstRegs[0], SrcReg);
+    return;
+  }
+
   if (PartTy.isVector() == SrcTy.isVector() &&
       PartTy.getScalarSizeInBits() > SrcTy.getScalarSizeInBits()) {
     assert(DstRegs.size() == 1);
@@ -576,9 +583,11 @@ static void buildCopyToRegs(MachineIRBuilder &B, ArrayRef<Register> DstRegs,
   }
 
   if (SrcTy.isVector() && !PartTy.isVector() &&
-      TypeSize::isKnownGT(PartSize, SrcTy.getElementType().getSizeInBits())) {
+      TypeSize::isKnownGT(PartSize, SrcTy.getElementType().getSizeInBits()) &&
+      SrcTy.getElementCount() == ElementCount::getFixed(DstRegs.size())) {
     // Vector was scalarized, and the elements extended.
     auto UnmergeToEltTy = B.buildUnmerge(SrcTy.getElementType(), SrcReg);
+
     for (int i = 0, e = DstRegs.size(); i != e; ++i)
       B.buildAnyExt(DstRegs[i], UnmergeToEltTy.getReg(i));
     return;
@@ -645,9 +654,22 @@ static void buildCopyToRegs(MachineIRBuilder &B, ArrayRef<Register> DstRegs,
     }
   }
 
-  if (LCMTy.isVector() && CoveringSize != SrcSize)
+  if (LCMTy.isVector() && CoveringSize != SrcSize) {
     UnmergeSrc = B.buildPadVectorWithUndefElements(LCMTy, SrcReg).getReg(0);
 
+    unsigned ExcessBits = CoveringSize - DstSize * DstRegs.size();
+    if (ExcessBits != 0) {
+      SmallVector<Register, 8> PaddedDstRegs(DstRegs.begin(), DstRegs.end());
+
+      MachineRegisterInfo &MRI = *B.getMRI();
+      for (unsigned I = 0; I != ExcessBits; I += PartSize)
+        PaddedDstRegs.push_back(MRI.createGenericVirtualRegister(PartTy));
+
+      B.buildUnmerge(PaddedDstRegs, UnmergeSrc);
+      return;
+    }
+  }
+
   B.buildUnmerge(DstRegs, UnmergeSrc);
 }
 

arsenm marked this pull request as ready for review January 13, 2026 15:30
Base automatically changed from users/arsenm/amdgpu/fix-bf16-register-type-for-calling-conv to main January 13, 2026 16:48
arsenm force-pushed the users/arsenm/globalisel/fix-mishandling-bitcast-vector-to-scalar-return branch from 186d520 to aa92839 on January 13, 2026 16:51
arsenm force-pushed the users/arsenm/globalisel/fix-mishandling-bitcast-vector-to-scalar-return branch from aa92839 to 29cc6a8 on January 13, 2026 20:09
aemerson (Contributor) commented:

Do we have a test for this or will it be exercised later?

arsenm (Author) commented Jan 17, 2026

> Do we have a test for this or will it be exercised later?

Later; this combination isn't used until #175781.

arsenm (Author) commented Jan 22, 2026

ping

Comment on lines 657 to 671
if (LCMTy.isVector() && CoveringSize != SrcSize) {
UnmergeSrc = B.buildPadVectorWithUndefElements(LCMTy, SrcReg).getReg(0);

unsigned ExcessBits = CoveringSize - DstSize * DstRegs.size();
if (ExcessBits != 0) {
SmallVector<Register, 8> PaddedDstRegs(DstRegs.begin(), DstRegs.end());

MachineRegisterInfo &MRI = *B.getMRI();
for (unsigned I = 0; I != ExcessBits; I += PartSize)
PaddedDstRegs.push_back(MRI.createGenericVirtualRegister(PartTy));

B.buildUnmerge(PaddedDstRegs, UnmergeSrc);
return;
}
}
petar-avramovic (Collaborator) commented Jan 22, 2026

    %23:_(s16), %24:_(s16), %25:_(s16) = G_UNMERGE_VALUES %20(<3 x s16>)
    %26:_(s16) = G_IMPLICIT_DEF
    %27:_(<6 x s16>) = G_BUILD_VECTOR %23(s16), %24(s16), %25(s16), %26(s16), %26(s16), %26(s16)
    %21:_(s32), %22:_(s32), %28:_(s32) = G_UNMERGE_VALUES %27(<6 x s16>)
    $vgpr0 = COPY %21(s32)
    $vgpr1 = COPY %22(s32)

Not sure how this part is meant to work, but padding DstRegs looks wrong. It is asked to return <3 x s16> in 2 s32 registers (DstRegs); LCMTy seems to produce a larger type than needed, and we return in more registers than the target asked for:

SmallVector<Register, 8> PaddedDstRegs(DstRegs.begin(), DstRegs.end());

This might be getting into target-specific territory. Instead of LCMTy, we need to calculate how many elements to pad SrcTy with so that its size equals DstRegs.size() * PartTy.getSizeInBits().

petar-avramovic (Collaborator):

For example, when LCMTy is calculated like this, there is no need to pad DstRegs:

  LLT LCMTy = getCoverTy(SrcTy, PartTy);
  if (SrcTy.isVector() && DstRegs.size() > 1) {
    // Total size covered by the destination registers.
    unsigned CoverSize = DstRegs.size() * PartTy.getSizeInBits();
    LLT EltTy = SrcTy.getElementType();
    unsigned EltSize = EltTy.getSizeInBits();
    if (CoverSize % EltSize == 0)
      LCMTy = LLT::fixed_vector(CoverSize / EltSize, EltTy);
  }
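
With the numbers from this thread (returning <3 x s16> in two s32 registers), this gives CoverSize = 2 * 32 = 64 and EltSize = 16, so LCMTy = <4 x s16>, which unmerges into exactly two s32 parts with no padding registers.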

arsenm (Author):

This returns exactly as many registers as the target asked for; the padding is just to conform to the requirements of the unmerge.

e.g. for this:

define <3 x i16> @ret_v3i16() {
  ret <3 x i16> poison
}

We get:

    %8:_(<3 x s16>) = G_IMPLICIT_DEF
    %11:_(s16), %12:_(s16), %13:_(s16) = G_UNMERGE_VALUES %8(<3 x s16>)
    %14:_(s16) = G_IMPLICIT_DEF
    %15:_(<6 x s16>) = G_BUILD_VECTOR %11(s16), %12(s16), %13(s16), %14(s16), %14(s16), %14(s16)
    %9:_(s32), %10:_(s32), %16:_(s32) = G_UNMERGE_VALUES %15(<6 x s16>)
    $vgpr0 = COPY %9(s32)
    $vgpr1 = COPY %10(s32)
    SI_RETURN implicit $vgpr0, implicit $vgpr1

The padding is the unused %16.

petar-avramovic (Collaborator):

What are the requirements of the unmerge? If LCMTy is calculated in a different way (see the comment above), there is no need for the if (ExcessBits != 0) path.

arsenm (Author):

The result pieces need to cover the input piece, which is the widened LCMTy, so you need the extra unused result register.
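
Concretely: with LCMTy = <6 x s16> the unmerge input is 96 bits, while the two s32 destination registers cover only 64, so a third, unused s32 result (the %16 above) is needed to consume the remaining 32 bits.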

petar-avramovic (Collaborator):

If LCMTy is calculated in a better way (giving <4 x s16> instead of <6 x s16>), we can get:

    %8:_(<3 x s16>) = G_IMPLICIT_DEF
    %11:_(s16), %12:_(s16), %13:_(s16) = G_UNMERGE_VALUES %8(<3 x s16>)
    %14:_(s16) = G_IMPLICIT_DEF
    %15:_(<4 x s16>) = G_BUILD_VECTOR %11(s16), %12(s16), %13(s16), %14(s16)
    %9:_(s32), %10:_(s32) = G_UNMERGE_VALUES %15(<4 x s16>)
    $vgpr0 = COPY %9(s32)
    $vgpr1 = COPY %10(s32)
    SI_RETURN implicit $vgpr0, implicit $vgpr1

which does not require padding DstRegs; we can just unmerge.

arsenm (Author):

OK, I have your suggestion working. It happens to give worse codegen in two tests, but that's a downstream issue.

petar-avramovic (Collaborator):

Will look into the regressions; it is something in the legalizer/artifact combiner.

arsenm force-pushed the users/arsenm/globalisel/fix-mishandling-bitcast-vector-to-scalar-return branch from 29cc6a8 to 85d881b on January 22, 2026 14:23
petar-avramovic (Collaborator) left a comment

LGTM, thanks

arsenm merged commit 4a3f33d into main Jan 22, 2026
8 of 11 checks passed
arsenm deleted the users/arsenm/globalisel/fix-mishandling-bitcast-vector-to-scalar-return branch January 22, 2026 16:24
Harrish92 pushed a commit to Harrish92/llvm-project that referenced this pull request Jan 23, 2026
Harrish92 pushed a commit to Harrish92/llvm-project that referenced this pull request Jan 24, 2026