Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CodeExtractor] Allow to use 0 addr space for aggregate arg #66998

Merged

Conversation

DominikAdamski
Copy link
Contributor

@DominikAdamski DominikAdamski commented Sep 21, 2023

The user of CodeExtractor should be able to specify that
the aggregate argument should be passed as a pointer in zero address
space.

CodeExtractor is used to generate outlined functions required by OpenMP
runtime. The arguments of the outlined functions for OpenMP GPU code
are in 0 address space. 0 address space does not need to be the default
address space for GPU device. That's why there is a need to allow
the user of CodeExtractor to specify, that the allocated aggregate param
is passed as pointer in zero address space.

The proposed changes are used in PR: #67000

@llvmbot
Copy link

llvmbot commented Sep 21, 2023

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-openmp

Changes

The user of CodeExtractor should be able to specify the alloca address space for aggregate arguments object.

The information about alloca address space for aggregate arguments object is used to define type of outlined function arguments.

CodeExtractor is used to generate outlined functions required by OpenMP runtime. The arguments of the outlined functions for OpenMP GPU code are in 0 address space. 0 address space does not need to be the default address space for GPU device. That's why there is a need to allow user of CodeExtractor to specify the address space for aggregate args structure.


Full diff: https://github.com/llvm/llvm-project/pull/66998.diff

2 Files Affected:

  • (modified) llvm/include/llvm/Transforms/Utils/CodeExtractor.h (+10-1)
  • (modified) llvm/lib/Transforms/Utils/CodeExtractor.cpp (+13-8)
diff --git a/llvm/include/llvm/Transforms/Utils/CodeExtractor.h b/llvm/include/llvm/Transforms/Utils/CodeExtractor.h
index bb23cf4a9a3cbbb..c610a4c0bc93749 100644
--- a/llvm/include/llvm/Transforms/Utils/CodeExtractor.h
+++ b/llvm/include/llvm/Transforms/Utils/CodeExtractor.h
@@ -114,6 +114,11 @@ class CodeExtractorAnalysisCache {
     // label, if non-empty, otherwise "extracted".
     std::string Suffix;
 
+    // Optional value of alloca address space which should be used for
+    // allocation of structure which aggregates arguments. If empty, default
+    // alloca address space for given data layout is used.
+    std::optional<unsigned> AggregateArgsAllocaAddrSpace;
+
   public:
     /// Create a code extractor for a sequence of blocks.
     ///
@@ -128,13 +133,17 @@ class CodeExtractorAnalysisCache {
     /// Any new allocations will be placed in the AllocationBlock, unless
     /// it is null, in which case it will be placed in the entry block of
     /// the function from which the code is being extracted.
+    /// AggregateArgsAllocaAddrSpace param specifies the alloca address space
+    /// where aggregate argument structure is allocated. If it's not specified,
+    /// then the default alloca address space is used.
     CodeExtractor(ArrayRef<BasicBlock *> BBs, DominatorTree *DT = nullptr,
                   bool AggregateArgs = false, BlockFrequencyInfo *BFI = nullptr,
                   BranchProbabilityInfo *BPI = nullptr,
                   AssumptionCache *AC = nullptr, bool AllowVarArgs = false,
                   bool AllowAlloca = false,
                   BasicBlock *AllocationBlock = nullptr,
-                  std::string Suffix = "");
+                  std::string Suffix = "",
+                  std::optional<unsigned> AggregateArgsAllocaAddrSpace = {});
 
     /// Create a code extractor for a loop body.
     ///
diff --git a/llvm/lib/Transforms/Utils/CodeExtractor.cpp b/llvm/lib/Transforms/Utils/CodeExtractor.cpp
index cafa99491f5b5f6..e03f31d16526910 100644
--- a/llvm/lib/Transforms/Utils/CodeExtractor.cpp
+++ b/llvm/lib/Transforms/Utils/CodeExtractor.cpp
@@ -241,16 +241,17 @@ buildExtractionBlockSet(ArrayRef<BasicBlock *> BBs, DominatorTree *DT,
   return Result;
 }
 
-CodeExtractor::CodeExtractor(ArrayRef<BasicBlock *> BBs, DominatorTree *DT,
-                             bool AggregateArgs, BlockFrequencyInfo *BFI,
-                             BranchProbabilityInfo *BPI, AssumptionCache *AC,
-                             bool AllowVarArgs, bool AllowAlloca,
-                             BasicBlock *AllocationBlock, std::string Suffix)
+CodeExtractor::CodeExtractor(
+    ArrayRef<BasicBlock *> BBs, DominatorTree *DT, bool AggregateArgs,
+    BlockFrequencyInfo *BFI, BranchProbabilityInfo *BPI, AssumptionCache *AC,
+    bool AllowVarArgs, bool AllowAlloca, BasicBlock *AllocationBlock,
+    std::string Suffix, std::optional<unsigned> AggregateArgsAllocaAddrSpace)
     : DT(DT), AggregateArgs(AggregateArgs || AggregateArgsOpt), BFI(BFI),
       BPI(BPI), AC(AC), AllocationBlock(AllocationBlock),
       AllowVarArgs(AllowVarArgs),
       Blocks(buildExtractionBlockSet(BBs, DT, AllowVarArgs, AllowAlloca)),
-      Suffix(Suffix) {}
+      Suffix(Suffix),
+      AggregateArgsAllocaAddrSpace(AggregateArgsAllocaAddrSpace) {}
 
 CodeExtractor::CodeExtractor(DominatorTree &DT, Loop &L, bool AggregateArgs,
                              BlockFrequencyInfo *BFI,
@@ -866,7 +867,9 @@ Function *CodeExtractor::constructFunction(const ValueSet &inputs,
   StructType *StructTy = nullptr;
   if (AggregateArgs && !AggParamTy.empty()) {
     StructTy = StructType::get(M->getContext(), AggParamTy);
-    ParamTy.push_back(PointerType::get(StructTy, DL.getAllocaAddrSpace()));
+    ParamTy.push_back(PointerType::get(
+        StructTy,
+        AggregateArgsAllocaAddrSpace.value_or(DL.getAllocaAddrSpace())));
   }
 
   LLVM_DEBUG({
@@ -1183,7 +1186,9 @@ CallInst *CodeExtractor::emitCallAndSwitchStatement(Function *newFunction,
     // Allocate a struct at the beginning of this function
     StructArgTy = StructType::get(newFunction->getContext(), ArgTypes);
     Struct = new AllocaInst(
-        StructArgTy, DL.getAllocaAddrSpace(), nullptr, "structArg",
+        StructArgTy,
+        AggregateArgsAllocaAddrSpace.value_or(DL.getAllocaAddrSpace()), nullptr,
+        "structArg",
         AllocationBlock ? &*AllocationBlock->getFirstInsertionPt()
                         : &codeReplacer->getParent()->front().front());
     params.push_back(Struct);

@jdoerfert
Copy link
Member

Why can’t we use the DL AllocaAS always?

@DominikAdamski
Copy link
Contributor Author

Why can’t we use the DL AllocaAS always?

Without this patch the outlined function is as follows:
define internal void @__omp_offloading_fd00_2b03043__QQmain_l30..omp_par(ptr noalias noundef %tid.addr, ptr noalias noundef %zero.addr, ptr addrspace(5) %0)
Whereas Clang generates the outlined function in the following way:
define internal void @__omp_offloading_fd00_2b03043__QQmain_l30..omp_par(ptr noalias noundef %tid.addr, ptr noalias noundef %zero.addr, ptr %0)

@@ -114,6 +114,11 @@ class CodeExtractorAnalysisCache {
// label, if non-empty, otherwise "extracted".
std::string Suffix;

// Optional value of alloca address space which should be used for
// allocation of structure which aggregates arguments. If empty, default
// alloca address space for given data layout is used.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is confusing. This is not used for the allocation, just for the pointer argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Extractor allocates struct for aggregate arguments. The alloca instruction is created in line 1186:

 Struct = new AllocaInst(
        StructArgTy, DL.getAllocaAddrSpace(), nullptr, "structArg",
        StructArgTy,
        /* My change - specify address space */
        AggregateArgsAllocaAddrSpace.value_or(DL.getAllocaAddrSpace()), nullptr,
        "structArg",

The type of alloca is used to determine the type of pointer argument of the outlined function. Changes in CodeExtractor line 870:

   ParamTy.push_back(PointerType::get(
       StructTy,
       AggregateArgsAllocaAddrSpace.value_or(DL.getAllocaAddrSpace())));
 }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recent changes removed alloca in 0 address space. There is address space cast instruction to ensure, that the aggregate struct is passed as 0 address space pointer.

@DominikAdamski DominikAdamski force-pushed the code_extractor_alloca_address_space branch from 8756f39 to 702f7ce Compare September 28, 2023 08:10
@DominikAdamski DominikAdamski changed the title [CodeExtractor][OpenMP] Allow to specify address space for aggregate arguments structure [CodeExtractor] Allow to use 0 addr space for aggregate arg Sep 28, 2023
@DominikAdamski DominikAdamski force-pushed the code_extractor_alloca_address_space branch from 702f7ce to c24e319 Compare September 28, 2023 09:29
Copy link
Member

@jdoerfert jdoerfert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG, but only commit it with a test, so the follow up PR. Or add a standalone test.


if (ArgsInZeroAddressSpace && DL.getAllocaAddrSpace() != 0) {
AddrSpaceCastInst *StructSpaceCast = new AddrSpaceCastInst(
Struct, Struct->getType()->getPointerTo(), "structArg.ascast");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto, and just PointerTy, let's embrace opaque ptrs.

The user of CodeExtractor should be able to specify that
the aggregate argument should be passed as a pointer in zero address
space.

CodeExtractor is used to generate outlined functions required by OpenMP
runtime. The arguments of the outlined functions for OpenMP GPU code
are in 0 address space. 0 address space does not need to be the default
address space for GPU device. That's why there is a need to allow
the user of CodeExtractor to specify, that the allocated aggregate param
is passed as pointer in zero address space.
@DominikAdamski DominikAdamski force-pushed the code_extractor_alloca_address_space branch from c24e319 to 5e8c9ee Compare October 6, 2023 10:42
Copy link
Member

@jdoerfert jdoerfert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants