Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 10 additions & 6 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1385,12 +1385,16 @@ static void simplifyRecipe(VPSingleDefRecipe *Def, VPTypeAnalysis &TypeInfo) {
return;
}

// Look through ExtractLastLane (BuildVector ....).
if (match(Def, m_ExtractLastLane(m_BuildVector()))) {
auto *BuildVector = cast<VPInstruction>(Def->getOperand(0));
Def->replaceAllUsesWith(
BuildVector->getOperand(BuildVector->getNumOperands() - 1));
return;
// Look through ExtractLastLane.
if (match(Def, m_ExtractLastLane(m_VPValue(A)))) {
if (match(A, m_BuildVector())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such BuildVector operands were visited and processed earlier by the code below folding them into Broadcast if all their operands are equal. Does that folding hold for VF=1 too; if so should such redundant Broadcasts be eliminated too? Worth looking for extracts from broadcast too/instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like currently there are no extracts of broadcasts that could be folded in the tests. For VF=1, the broadcasts itself should be removed, possibly before hitting this pattern, but I might be missing something.

auto *BuildVector = cast<VPInstruction>(A);
Def->replaceAllUsesWith(
BuildVector->getOperand(BuildVector->getNumOperands() - 1));
return;
}
if (Plan->hasScalarVFOnly())
return Def->replaceAllUsesWith(A);
}

// Look through ExtractPenultimateElement (BuildVector ....).
Expand Down
12 changes: 7 additions & 5 deletions llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -371,10 +371,9 @@ void UnrollState::unrollBlock(VPBlockBase *VPB) {
continue;
}

if (match(&R, m_ExtractLastLaneOfLastPart(m_VPValue(Op0))) ||
match(&R, m_ExtractPenultimateElement(m_VPValue(Op0)))) {
addUniformForAllParts(cast<VPSingleDefRecipe>(&R));
if (Plan.hasScalarVFOnly()) {
if (Plan.hasScalarVFOnly()) {
if (match(&R, m_ExtractLastPart(m_VPValue(Op0))) ||
match(&R, m_ExtractPenultimateElement(m_VPValue(Op0)))) {
Comment on lines +375 to +376
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be simpler handle each match separately?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure, the only difference is the index to extract from?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but folding that difference seems to complicate two trivial cases, the first which should hold for any VF, scalar or not:

    if (match(&R, m_ExtractLastPart(m_VPValue(Op0)))) {
      cast<VPInstruction>(&R)->replaceAllUsesWith(getValueForPart(Op0, UF - 1));
      continue;
    }

and the second which deals with scalar VF only:

    if (Plan.hasScalarVFOnly() && match(&R, m_ExtractPenultimateElement(m_VPValue(Op0)))) {
      // For scalar VF, use the penultimate scalar part w/o extracting a lane.
      cast<VPInstruction>(&R)->replaceAllUsesWith(getValueForPart(Op0, UF - 2));
      continue;
    }

?

Are both missing addUniformForAllParts(cast<VPSingleDefRecipe>(&R));?

Can also hoist from below the auto *SingleDef = dyn_cast<VPSingleDefRecipe>(&R) to be used for the RAUW's here and addUniformForAllParts() elsewhere, instead of having to cast<>(&R) after various matches. Perhaps also handle the !SingleDef cases first, to freely access SingleDef afterwards.

auto *I = cast<VPInstruction>(&R);
bool IsPenultimatePart =
I->getOpcode() == VPInstruction::ExtractPenultimateElement;
Expand All @@ -383,7 +382,10 @@ void UnrollState::unrollBlock(VPBlockBase *VPB) {
I->replaceAllUsesWith(getValueForPart(Op0, PartIdx));
continue;
}
// For vector VF, always extract from the last part.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth retaining the comment, perhaps emphasizing that
// For vector VF, the penultimate element is always extracted from the last part.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, re-added in e6e3f94

}
if (match(&R, m_ExtractLastLaneOfLastPart(m_VPValue(Op0))) ||
match(&R, m_ExtractPenultimateElement(m_VPValue(Op0)))) {
addUniformForAllParts(cast<VPSingleDefRecipe>(&R));
R.setOperand(0, getValueForPart(Op0, UF - 1));
continue;
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -220,7 +220,6 @@ exit:
; DBG-EMPTY:
; DBG-NEXT: middle.block:
; DBG-NEXT: EMIT vp<[[RESUME_1_PART:%.+]]> = extract-last-part vp<[[SCALAR_STEPS]]>
; DBG-NEXT: EMIT vp<[[RESUME_1:%.+]]> = extract-last-lane vp<[[RESUME_1_PART]]>
; DBG-NEXT: EMIT vp<[[CMP:%.+]]> = icmp eq vp<[[TC]]>, vp<[[VEC_TC]]>
; DBG-NEXT: EMIT branch-on-cond vp<[[CMP]]>
; DBG-NEXT: Successor(s): ir-bb<exit>, scalar.ph
Expand All @@ -230,7 +229,7 @@ exit:
; DBG-EMPTY:
; DBG-NEXT: scalar.ph:
; DBG-NEXT: EMIT-SCALAR vp<[[RESUME_IV:%.+]]> = phi [ vp<[[VTC]]>, middle.block ], [ ir<0>, ir-bb<entry> ]
; DBG-NEXT: EMIT-SCALAR vp<[[RESUME_P:%.*]]> = phi [ vp<[[RESUME_1]]>, middle.block ], [ ir<0>, ir-bb<entry> ]
; DBG-NEXT: EMIT-SCALAR vp<[[RESUME_P:%.*]]> = phi [ vp<[[RESUME_1_PART]]>, middle.block ], [ ir<0>, ir-bb<entry> ]
; DBG-NEXT: Successor(s): ir-bb<loop>
; DBG-EMPTY:
; DBG-NEXT: ir-bb<loop>:
Expand Down
Loading