-
Notifications
You must be signed in to change notification settings - Fork 12k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CodeExtractor] Allow to use 0 addr space for aggregate arg #66998
[CodeExtractor] Allow to use 0 addr space for aggregate arg #66998
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-openmp ChangesThe user of CodeExtractor should be able to specify the alloca address space for aggregate arguments object. The information about alloca address space for aggregate arguments object is used to define type of outlined function arguments. CodeExtractor is used to generate outlined functions required by OpenMP runtime. The arguments of the outlined functions for OpenMP GPU code are in 0 address space. 0 address space does not need to be the default address space for GPU device. That's why there is a need to allow user of CodeExtractor to specify the address space for aggregate args structure. Full diff: https://github.com/llvm/llvm-project/pull/66998.diff 2 Files Affected:
diff --git a/llvm/include/llvm/Transforms/Utils/CodeExtractor.h b/llvm/include/llvm/Transforms/Utils/CodeExtractor.h
index bb23cf4a9a3cbbb..c610a4c0bc93749 100644
--- a/llvm/include/llvm/Transforms/Utils/CodeExtractor.h
+++ b/llvm/include/llvm/Transforms/Utils/CodeExtractor.h
@@ -114,6 +114,11 @@ class CodeExtractorAnalysisCache {
// label, if non-empty, otherwise "extracted".
std::string Suffix;
+ // Optional value of alloca address space which should be used for
+ // allocation of structure which aggregates arguments. If empty, default
+ // alloca address space for given data layout is used.
+ std::optional<unsigned> AggregateArgsAllocaAddrSpace;
+
public:
/// Create a code extractor for a sequence of blocks.
///
@@ -128,13 +133,17 @@ class CodeExtractorAnalysisCache {
/// Any new allocations will be placed in the AllocationBlock, unless
/// it is null, in which case it will be placed in the entry block of
/// the function from which the code is being extracted.
+ /// AggregateArgsAllocaAddrSpace param specifies the alloca address space
+ /// where aggregate argument structure is allocated. If it's not specified,
+ /// then the default alloca address space is used.
CodeExtractor(ArrayRef<BasicBlock *> BBs, DominatorTree *DT = nullptr,
bool AggregateArgs = false, BlockFrequencyInfo *BFI = nullptr,
BranchProbabilityInfo *BPI = nullptr,
AssumptionCache *AC = nullptr, bool AllowVarArgs = false,
bool AllowAlloca = false,
BasicBlock *AllocationBlock = nullptr,
- std::string Suffix = "");
+ std::string Suffix = "",
+ std::optional<unsigned> AggregateArgsAllocaAddrSpace = {});
/// Create a code extractor for a loop body.
///
diff --git a/llvm/lib/Transforms/Utils/CodeExtractor.cpp b/llvm/lib/Transforms/Utils/CodeExtractor.cpp
index cafa99491f5b5f6..e03f31d16526910 100644
--- a/llvm/lib/Transforms/Utils/CodeExtractor.cpp
+++ b/llvm/lib/Transforms/Utils/CodeExtractor.cpp
@@ -241,16 +241,17 @@ buildExtractionBlockSet(ArrayRef<BasicBlock *> BBs, DominatorTree *DT,
return Result;
}
-CodeExtractor::CodeExtractor(ArrayRef<BasicBlock *> BBs, DominatorTree *DT,
- bool AggregateArgs, BlockFrequencyInfo *BFI,
- BranchProbabilityInfo *BPI, AssumptionCache *AC,
- bool AllowVarArgs, bool AllowAlloca,
- BasicBlock *AllocationBlock, std::string Suffix)
+CodeExtractor::CodeExtractor(
+ ArrayRef<BasicBlock *> BBs, DominatorTree *DT, bool AggregateArgs,
+ BlockFrequencyInfo *BFI, BranchProbabilityInfo *BPI, AssumptionCache *AC,
+ bool AllowVarArgs, bool AllowAlloca, BasicBlock *AllocationBlock,
+ std::string Suffix, std::optional<unsigned> AggregateArgsAllocaAddrSpace)
: DT(DT), AggregateArgs(AggregateArgs || AggregateArgsOpt), BFI(BFI),
BPI(BPI), AC(AC), AllocationBlock(AllocationBlock),
AllowVarArgs(AllowVarArgs),
Blocks(buildExtractionBlockSet(BBs, DT, AllowVarArgs, AllowAlloca)),
- Suffix(Suffix) {}
+ Suffix(Suffix),
+ AggregateArgsAllocaAddrSpace(AggregateArgsAllocaAddrSpace) {}
CodeExtractor::CodeExtractor(DominatorTree &DT, Loop &L, bool AggregateArgs,
BlockFrequencyInfo *BFI,
@@ -866,7 +867,9 @@ Function *CodeExtractor::constructFunction(const ValueSet &inputs,
StructType *StructTy = nullptr;
if (AggregateArgs && !AggParamTy.empty()) {
StructTy = StructType::get(M->getContext(), AggParamTy);
- ParamTy.push_back(PointerType::get(StructTy, DL.getAllocaAddrSpace()));
+ ParamTy.push_back(PointerType::get(
+ StructTy,
+ AggregateArgsAllocaAddrSpace.value_or(DL.getAllocaAddrSpace())));
}
LLVM_DEBUG({
@@ -1183,7 +1186,9 @@ CallInst *CodeExtractor::emitCallAndSwitchStatement(Function *newFunction,
// Allocate a struct at the beginning of this function
StructArgTy = StructType::get(newFunction->getContext(), ArgTypes);
Struct = new AllocaInst(
- StructArgTy, DL.getAllocaAddrSpace(), nullptr, "structArg",
+ StructArgTy,
+ AggregateArgsAllocaAddrSpace.value_or(DL.getAllocaAddrSpace()), nullptr,
+ "structArg",
AllocationBlock ? &*AllocationBlock->getFirstInsertionPt()
: &codeReplacer->getParent()->front().front());
params.push_back(Struct);
|
Why can’t we use the DL AllocaAS always? |
Without this patch the outlined function is as follows: |
@@ -114,6 +114,11 @@ class CodeExtractorAnalysisCache { | |||
// label, if non-empty, otherwise "extracted". | |||
std::string Suffix; | |||
|
|||
// Optional value of alloca address space which should be used for | |||
// allocation of structure which aggregates arguments. If empty, default | |||
// alloca address space for given data layout is used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is confusing. This is not used for the allocation, just for the pointer argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Extractor allocates struct for aggregate arguments. The alloca instruction is created in line 1186:
Struct = new AllocaInst(
StructArgTy, DL.getAllocaAddrSpace(), nullptr, "structArg",
StructArgTy,
/* My change - specify address space */
AggregateArgsAllocaAddrSpace.value_or(DL.getAllocaAddrSpace()), nullptr,
"structArg",
The type of alloca is used to determine the type of pointer argument of the outlined function. Changes in CodeExtractor line 870:
ParamTy.push_back(PointerType::get(
StructTy,
AggregateArgsAllocaAddrSpace.value_or(DL.getAllocaAddrSpace())));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The recent changes removed alloca in 0 address space. There is address space cast instruction to ensure, that the aggregate struct is passed as 0 address space pointer.
8756f39
to
702f7ce
Compare
702f7ce
to
c24e319
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG, but only commit it with a test, so the follow up PR. Or add a standalone test.
|
||
if (ArgsInZeroAddressSpace && DL.getAllocaAddrSpace() != 0) { | ||
AddrSpaceCastInst *StructSpaceCast = new AddrSpaceCastInst( | ||
Struct, Struct->getType()->getPointerTo(), "structArg.ascast"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto, and just PointerTy, let's embrace opaque ptrs.
The user of CodeExtractor should be able to specify that the aggregate argument should be passed as a pointer in zero address space. CodeExtractor is used to generate outlined functions required by OpenMP runtime. The arguments of the outlined functions for OpenMP GPU code are in 0 address space. 0 address space does not need to be the default address space for GPU device. That's why there is a need to allow the user of CodeExtractor to specify, that the allocated aggregate param is passed as pointer in zero address space.
c24e319
to
5e8c9ee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG
The user of CodeExtractor should be able to specify that
the aggregate argument should be passed as a pointer in zero address
space.
CodeExtractor is used to generate outlined functions required by OpenMP
runtime. The arguments of the outlined functions for OpenMP GPU code
are in 0 address space. 0 address space does not need to be the default
address space for GPU device. That's why there is a need to allow
the user of CodeExtractor to specify, that the allocated aggregate param
is passed as pointer in zero address space.
The proposed changes are used in PR: #67000