[flang][fir] Add `fir.local` op for locality specifiers #138505
@llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)

Changes

Adds a new `fir.local` op to model `local` and `local_init` locality specifiers.

Full diff: https://github.com/llvm/llvm-project/pull/138505.diff

4 Files Affected:
diff --git a/flang/include/flang/Optimizer/Dialect/FIRAttr.td b/flang/include/flang/Optimizer/Dialect/FIRAttr.td
index 3ebc24951cfff..2845080030b92 100644
--- a/flang/include/flang/Optimizer/Dialect/FIRAttr.td
+++ b/flang/include/flang/Optimizer/Dialect/FIRAttr.td
@@ -200,4 +200,23 @@ def fir_OpenMPSafeTempArrayCopyAttr : fir_Attr<"OpenMPSafeTempArrayCopy"> {
}];
}
+def LocalitySpecTypeLocal : I32EnumAttrCase<"Local", 0, "local">;
+def LocalitySpecTypeLocalInit
+ : I32EnumAttrCase<"LocalInit", 1, "local_init">;
+
+def LocalitySpecifierType : I32EnumAttr<
+ "LocalitySpecifierType",
+ "Type of a locality specifier", [
+ LocalitySpecTypeLocal,
+ LocalitySpecTypeLocalInit
+ ]> {
+ let genSpecializedAttr = 0;
+ let cppNamespace = "::fir";
+}
+
+def LocalitySpecifierTypeAttr : EnumAttr<FIROpsDialect, LocalitySpecifierType,
+ "locality_specifier_type"> {
+ let assemblyFormat = "`{` `type` `=` $value `}`";
+}
+
#endif // FIR_DIALECT_FIR_ATTRS
diff --git a/flang/include/flang/Optimizer/Dialect/FIROps.td b/flang/include/flang/Optimizer/Dialect/FIROps.td
index 0ba985641069b..aea57d2e8dd71 100644
--- a/flang/include/flang/Optimizer/Dialect/FIROps.td
+++ b/flang/include/flang/Optimizer/Dialect/FIROps.td
@@ -3485,6 +3485,137 @@ def fir_BoxTotalElementsOp
let hasCanonicalizer = 1;
}
+def YieldOp : fir_Op<"yield",
+ [Pure, ReturnLike, Terminator,
+ ParentOneOf<["LocalitySpecifierOp"]>]> {
+ let summary = "loop yield and termination operation";
+ let description = [{
+ "fir.yield" yields SSA values from the fir dialect op region and
+ terminates the region. The semantics of how the values are yielded is
+ defined by the parent operation.
+ }];
+
+ let arguments = (ins Variadic<AnyType>:$results);
+
+ let builders = [
+ OpBuilder<(ins), [{ build($_builder, $_state, {}); }]>
+ ];
+
+ let assemblyFormat = "( `(` $results^ `:` type($results) `)` )? attr-dict";
+}
+
+def fir_LocalitySpecifierOp : fir_Op<"local", [IsolatedFromAbove]> {
+ let summary = "Provides declaration of [first]private logic.";
+ let description = [{
+ This operation provides a declaration of how to implement the
+ localization of a variable. The dialect users should provide
+ which type should be allocated for this variable. The allocated (usually by
+ alloca) variable is passed to the initialization region which does everything
+ else (e.g. initialization of Fortran runtime descriptors). Information about
+ how to initialize the copy from the original item should be given in the
+ copy region, and if needed, how to deallocate memory (allocated by the
+ initialization region) in the dealloc region.
+
+ Examples:
+
+ * `local(x)` would not need any regions because no initialization is
+ required by the standard for i32 variables and this is not firstprivate.
+ ```mlir
+ fir.local {type = local} @x.localizer : i32
+ ```
+
+ * `local_init(x)` would be emitted as:
+ ```mlir
+ fir.local {type = local_init} @x.localizer : i32 copy {
+ ^bb0(%arg0: !fir.ref<i32>, %arg1: !fir.ref<i32>):
+ // %arg0 is the original host variable.
+ // %arg1 represents the memory allocated for this private variable.
+ ... copy from host to the localized clone ....
+ fir.yield(%arg1 : !fir.ref<i32>)
+ }
+ ```
+
+ * `local(x)` for "allocatables" would be emitted as:
+ ```mlir
+ fir.local {type = local} @x.privatizer : !some.type init {
+ ^bb0(%arg0: !some.pointer<!some.type>, %arg1: !some.pointer<!some.type>):
+ // initialize %arg1, using %arg0 as a mold for allocations.
+ // For example if %arg0 is a heap allocated array with a runtime determined
+ // length and !some.type is a runtime type descriptor, the init region
+ // will read the array length from %arg0, and heap allocate an array of the
+ // right length and initialize %arg1 to contain the array allocation and
+ // length.
+ fir.yield(%arg1 : !some.pointer<!some.type>)
+ } dealloc {
+ ^bb0(%arg0: !some.pointer<!some.type>):
+ // ... deallocate memory allocated by the init region...
+ // In the example above, this will free the heap allocated array data.
+ fir.yield
+ }
+ ```
+
+ There are no restrictions on the body except for:
+ - The `dealloc` region has a single argument.
+ - The `init` & `copy` regions have 2 arguments.
+ - All three regions are terminated by `fir.yield` ops.
+ The above restrictions and other obvious restrictions (e.g. verifying the
+ type of yielded values) are verified by the custom op verifier. The actual
+ contents of the blocks inside all regions are not verified.
+
+ Instances of this op would then be used by ops that model directives that
+ accept data-sharing attribute clauses.
+
+ The `sym_name` attribute provides a symbol by which the privatizer op can be
+ referenced by other dialect ops.
+
+ The `type` attribute is the type of the value being localized. This type
+ will be implicitly allocated in MLIR->LLVMIR conversion and passed as the
+ second argument to the init region. Therefore the type of arguments to
+ the regions should be a type which represents a pointer to `type`.
+
+ The `locality_specifier_type` attribute specifies whether the localized
+ variable corresponds to a `local` or a `local_init` specifier.
+ }];
+
+ let arguments = (ins SymbolNameAttr:$sym_name,
+ TypeAttrOf<AnyType>:$type,
+ LocalitySpecifierTypeAttr:$locality_specifier_type);
+
+ let regions = (region AnyRegion:$init_region,
+ AnyRegion:$copy_region,
+ AnyRegion:$dealloc_region);
+
+ let assemblyFormat = [{
+ $locality_specifier_type $sym_name `:` $type
+ (`init` $init_region^)?
+ (`copy` $copy_region^)?
+ (`dealloc` $dealloc_region^)?
+ attr-dict
+ }];
+
+ let builders = [
+ OpBuilder<(ins CArg<"mlir::TypeRange">:$result,
+ CArg<"mlir::StringAttr">:$sym_name,
+ CArg<"mlir::TypeAttr">:$type)>
+ ];
+
+ let extraClassDeclaration = [{
+ /// Get the type for arguments to nested regions. This should
+ /// generally be either the same as getType() or some pointer
+ /// type (pointing to the type allocated by this op).
+ /// This method will return Type{nullptr} if there are no nested
+ /// regions.
+ mlir::Type getArgType() {
+ for (mlir::Region *region : getRegions())
+ for (mlir::Type ty : region->getArgumentTypes())
+ return ty;
+ return nullptr;
+ }
+ }];
+
+ let hasRegionVerifier = 1;
+}
+
def fir_DoConcurrentOp : fir_Op<"do_concurrent",
[SingleBlock, AutomaticAllocationScope]> {
let summary = "do concurrent loop wrapper";
diff --git a/flang/lib/Optimizer/Dialect/FIROps.cpp b/flang/lib/Optimizer/Dialect/FIROps.cpp
index 05ef69169bae5..65ec730e134c2 100644
--- a/flang/lib/Optimizer/Dialect/FIROps.cpp
+++ b/flang/lib/Optimizer/Dialect/FIROps.cpp
@@ -4909,6 +4909,105 @@ void fir::BoxTotalElementsOp::getCanonicalizationPatterns(
patterns.add<SimplifyBoxTotalElementsOp>(context);
}
+//===----------------------------------------------------------------------===//
+// LocalitySpecifierOp
+//===----------------------------------------------------------------------===//
+
+llvm::LogicalResult fir::LocalitySpecifierOp::verifyRegions() {
+ mlir::Type argType = getArgType();
+ auto verifyTerminator = [&](mlir::Operation *terminator,
+ bool yieldsValue) -> llvm::LogicalResult {
+ if (!terminator->getBlock()->getSuccessors().empty())
+ return llvm::success();
+
+ if (!llvm::isa<fir::YieldOp>(terminator))
+ return mlir::emitError(terminator->getLoc())
+ << "expected exit block terminator to be an `fir.yield` op.";
+
+ YieldOp yieldOp = llvm::cast<YieldOp>(terminator);
+ mlir::TypeRange yieldedTypes = yieldOp.getResults().getTypes();
+
+ if (!yieldsValue) {
+ if (yieldedTypes.empty())
+ return llvm::success();
+
+ return mlir::emitError(terminator->getLoc())
+ << "Did not expect any values to be yielded.";
+ }
+
+ if (yieldedTypes.size() == 1 && yieldedTypes.front() == argType)
+ return llvm::success();
+
+ auto error = mlir::emitError(yieldOp.getLoc())
+ << "Invalid yielded value. Expected type: " << argType
+ << ", got: ";
+
+ if (yieldedTypes.empty())
+ error << "None";
+ else
+ error << yieldedTypes;
+
+ return error;
+ };
+
+ auto verifyRegion = [&](mlir::Region &region, unsigned expectedNumArgs,
+ llvm::StringRef regionName,
+ bool yieldsValue) -> llvm::LogicalResult {
+ assert(!region.empty());
+
+ if (region.getNumArguments() != expectedNumArgs)
+ return mlir::emitError(region.getLoc())
+ << "`" << regionName << "`: "
+ << "expected " << expectedNumArgs
+ << " region arguments, got: " << region.getNumArguments();
+
+ for (mlir::Block &block : region) {
+ // MLIR will verify the absence of the terminator for us.
+ if (!block.mightHaveTerminator())
+ continue;
+
+ if (failed(verifyTerminator(block.getTerminator(), yieldsValue)))
+ return llvm::failure();
+ }
+
+ return llvm::success();
+ };
+
+ // Ensure all of the region arguments have the same type
+ for (mlir::Region *region : getRegions())
+ for (mlir::Type ty : region->getArgumentTypes())
+ if (ty != argType)
+ return emitError() << "Region argument type mismatch: got " << ty
+ << " expected " << argType << ".";
+
+ mlir::Region &initRegion = getInitRegion();
+ if (!initRegion.empty() &&
+ failed(verifyRegion(getInitRegion(), /*expectedNumArgs=*/2, "init",
+ /*yieldsValue=*/true)))
+ return llvm::failure();
+
+ LocalitySpecifierType dsType = getLocalitySpecifierType();
+
+ if (dsType == LocalitySpecifierType::Local && !getCopyRegion().empty())
+ return emitError("`local` specifiers do not require a `copy` region.");
+
+ if (dsType == LocalitySpecifierType::LocalInit && getCopyRegion().empty())
+ return emitError(
+ "`local_init` specifiers require at least a `copy` region.");
+
+ if (dsType == LocalitySpecifierType::LocalInit &&
+ failed(verifyRegion(getCopyRegion(), /*expectedNumArgs=*/2, "copy",
+ /*yieldsValue=*/true)))
+ return llvm::failure();
+
+ if (!getDeallocRegion().empty() &&
+ failed(verifyRegion(getDeallocRegion(), /*expectedNumArgs=*/1, "dealloc",
+ /*yieldsValue=*/false)))
+ return llvm::failure();
+
+ return llvm::success();
+}
+
//===----------------------------------------------------------------------===//
// DoConcurrentOp
//===----------------------------------------------------------------------===//
diff --git a/flang/test/Fir/do_concurrent.fir b/flang/test/Fir/do_concurrent.fir
index 8e80ffb9c7b0b..4e55777402428 100644
--- a/flang/test/Fir/do_concurrent.fir
+++ b/flang/test/Fir/do_concurrent.fir
@@ -90,3 +90,22 @@ func.func @dc_2d_reduction(%i_lb: index, %i_ub: index, %i_st: index,
// CHECK: fir.store %[[J_IV_CVT]] to %[[J]] : !fir.ref<i32>
// CHECK: }
// CHECK: }
+
+
+fir.local {type = local} @local_privatizer : i32
+
+// CHECK: fir.local {type = local} @[[LOCAL_PRIV_SYM:local_privatizer]] : i32
+
+fir.local {type = local_init} @local_init_privatizer : i32 copy {
+^bb0(%arg0: !fir.ref<i32>, %arg1: !fir.ref<i32>):
+ %0 = fir.load %arg0 : !fir.ref<i32>
+ fir.store %0 to %arg1 : !fir.ref<i32>
+ fir.yield(%arg1 : !fir.ref<i32>)
+}
+
+// CHECK: fir.local {type = local_init} @[[LOCAL_INIT_PRIV_SYM:local_init_privatizer]] : i32
+// CHECK: ^bb0(%[[ORIG_VAL:.*]]: !fir.ref<i32>, %[[LOCAL_VAL:.*]]: !fir.ref<i32>):
+// CHECK: %[[ORIG_VAL_LD:.*]] = fir.load %[[ORIG_VAL]]
+// CHECK: fir.store %[[ORIG_VAL_LD]] to %[[LOCAL_VAL]] : !fir.ref<i32>
+// CHECK: fir.yield(%[[LOCAL_VAL]] : !fir.ref<i32>)
+// CHECK: }
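The region rules enforced by `fir::LocalitySpecifierOp::verifyRegions()` above can be summarized in a small standalone model. The C++ sketch below is not flang code and does not use MLIR: `RegionModel`, `LocalizerModel`, `argType` and `verify` are hypothetical names, types are plain strings, each region is assumed to be a single block terminated by `fir.yield`, and the error messages are abbreviated. It checks the same things in the same order as the diff: one common region argument type; two arguments and one yielded value of that type for `init` and `copy`; one argument and no yielded values for `dealloc`; no `copy` region for `local`; and a mandatory `copy` region for `local_init`.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, MLIR-free model of fir::LocalitySpecifierOp::verifyRegions().
// Types are plain strings; each region is assumed to be a single block
// terminated by `fir.yield`.
enum class LocalitySpecifierType { Local = 0, LocalInit = 1 };

struct RegionModel {
  bool present = false;                  // false models an empty region
  std::vector<std::string> argTypes;     // block argument types
  std::vector<std::string> yieldedTypes; // operand types of `fir.yield`
};

struct LocalizerModel {
  LocalitySpecifierType kind;
  RegionModel init, copy, dealloc;
};

// Mirrors getArgType(): the first argument type found in any region.
std::optional<std::string> argType(const LocalizerModel &op) {
  for (const RegionModel *r : {&op.init, &op.copy, &op.dealloc})
    if (!r->argTypes.empty())
      return r->argTypes.front();
  return std::nullopt;
}

// Returns an empty string on success, otherwise an abbreviated error message.
std::string verify(const LocalizerModel &op) {
  const std::optional<std::string> ty = argType(op);

  // All region arguments must share the same type.
  for (const RegionModel *r : {&op.init, &op.copy, &op.dealloc})
    for (const std::string &t : r->argTypes)
      if (t != *ty)
        return "region argument type mismatch";

  auto checkRegion = [&](const RegionModel &r, std::size_t numArgs,
                         bool yieldsValue,
                         const std::string &name) -> std::string {
    if (r.argTypes.size() != numArgs)
      return "`" + name + "`: unexpected number of region arguments";
    if (!yieldsValue)
      return r.yieldedTypes.empty() ? "" : "did not expect any values to be yielded";
    assert(ty.has_value() && "a region with arguments implies an arg type");
    if (r.yieldedTypes.size() == 1 && r.yieldedTypes.front() == *ty)
      return "";
    return "invalid yielded value";
  };

  if (op.init.present) {
    std::string err = checkRegion(op.init, 2, /*yieldsValue=*/true, "init");
    if (!err.empty())
      return err;
  }
  if (op.kind == LocalitySpecifierType::Local && op.copy.present)
    return "`local` specifiers do not require a `copy` region";
  if (op.kind == LocalitySpecifierType::LocalInit) {
    if (!op.copy.present)
      return "`local_init` specifiers require a `copy` region";
    std::string err = checkRegion(op.copy, 2, /*yieldsValue=*/true, "copy");
    if (!err.empty())
      return err;
  }
  if (op.dealloc.present) {
    std::string err =
        checkRegion(op.dealloc, 1, /*yieldsValue=*/false, "dealloc");
    if (!err.empty())
      return err;
  }
  return "";
}
```

Modeling the `@local_init_privatizer` test case (a `copy` region with two `!fir.ref<i32>` arguments that yields `!fir.ref<i32>`) verifies cleanly in this model, while dropping its `copy` region, or switching its kind to `local`, yields an error. These are also natural inputs for verifier-failure tests.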
Adds support for lowering `do concurrent` nests from PFT to the new
`fir.do_concurrent` MLIR op as well as its special terminator
`fir.do_concurrent.loop` which models the actual loop nest.
To that end, this PR emits the allocations for the iteration variables
within the block of the `fir.do_concurrent` op and creates a region for
the `fir.do_concurrent.loop` op that takes one argument per iteration
range of the input `do concurrent` loop.
For example, given the following input:
```fortran
do concurrent(i=1:10, j=11:20)
end do
```
the changes in this PR emit the following MLIR:
```mlir
fir.do_concurrent {
%22 = fir.alloca i32 {bindc_name = "i"}
%23:2 = hlfir.declare %22 {uniq_name = "_QFsub1Ei"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
%24 = fir.alloca i32 {bindc_name = "j"}
%25:2 = hlfir.declare %24 {uniq_name = "_QFsub1Ej"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
fir.do_concurrent.loop (%arg1, %arg2) = (%18, %20) to (%19, %21) step (%c1, %c1_0) {
%26 = fir.convert %arg1 : (index) -> i32
fir.store %26 to %23#0 : !fir.ref<i32>
%27 = fir.convert %arg2 : (index) -> i32
fir.store %27 to %25#0 : !fir.ref<i32>
}
}
```
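To make the iteration space of this example concrete: each iteration range contributes one induction variable, and therefore one argument of the `fir.do_concurrent.loop` region (`%arg1` for `i`, `%arg2` for `j`). The standalone C++ sketch below is not part of flang; `IterationRange`, `tripCount` and `totalIterations` are hypothetical names. It applies the Fortran DO trip-count rule, `max(int((ub - lb + step) / step), 0)`, to each range and multiplies the results.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One `do concurrent` iteration range, e.g. i=1:10 (step defaults to 1).
struct IterationRange {
  std::int64_t lb, ub, step;
};

// Fortran DO trip count: max(int((ub - lb + step) / step), 0).
// C++ integer division truncates toward zero, matching Fortran INT().
std::int64_t tripCount(const IterationRange &r) {
  assert(r.step != 0 && "zero step is not allowed");
  std::int64_t n = (r.ub - r.lb + r.step) / r.step;
  return n > 0 ? n : 0;
}

// Each range has its own induction variable, so the iteration space of the
// whole nest is the product of the per-range trip counts.
std::int64_t totalIterations(const std::vector<IterationRange> &ranges) {
  std::int64_t total = 1;
  for (const IterationRange &r : ranges)
    total *= tripCount(r);
  return total;
}
```

For `i=1:10, j=11:20`, each range has a trip count of 10, so the loop body runs 100 times.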
Adds a new `fir.local` op to model `local` and `local_init` locality specifiers. This op is a clone of `omp.private`. In particular, this new op also models the privatization/localization logic of an SSA value in the `fir` dialect just like `omp.private` does for OpenMP.
tblah left a comment:
Please could you add a test for the verifier failures.
On it...
bhandarkar-pranav left a comment:
LGTM aside from the two nits and, more importantly, the tests that @tblah has requested.

    ParentOneOf<["LocalitySpecifierOp"]>]> {
    let summary = "loop yield and termination operation";
    let description = [{
    "fir.yield" yields SSA values from the fir dialect op region and
NIT: sed s/from the fir dialect op region/from a fir dialect op region/

    * `local(x)` for "allocatables" would be emitted as:
    ```
    fir.local {type = local} @x.privatizer : !some.type init {
Ultra Nit: sed s/@x.privatizer/@x.localizer/
Done.
…138505)

Adds a new `fir.local` op to model `local` and `local_init` locality specifiers. This op is a clone of `omp.private`. In particular, this new op also models the privatization/localization logic of an SSA value in the `fir` dialect just like `omp.private` does for OpenMP.

PR stack:
- llvm/llvm-project#137928
- llvm/llvm-project#138505 (this PR)
- llvm/llvm-project#138506
- llvm/llvm-project#138512
- llvm/llvm-project#138534
- llvm/llvm-project#138816

    explicit IncrementLoopInfo(Fortran::semantics::Symbol &sym, const T &lower,
    const T &upper, const std::optional<T> &step,
    bool isUnordered = false)
    bool isConcurrent = false)
`unordered` is also used for array operations. How is this handled now?
This is handled elsewhere, not here: `genImplicitLoops` in `ConvertExpr.cpp` still creates `fir.do_loop ... unordered` loops. See: https://github.com/llvm/llvm-project/blob/main/flang/lib/Lower/ConvertExpr.cpp#L4393.
flang/lib/Lower/Bridge.cpp

    const Fortran::lower::SomeExpr *upperExpr;
    const Fortran::lower::SomeExpr *stepExpr;
    const Fortran::lower::SomeExpr *maskExpr = nullptr;
    bool isUnordered; // do concurrent, forall
Is `forall` treated as `do concurrent`?
Nope. This also takes a different codegen path: `forall` concurrent headers are lowered through `void genFIR(const Fortran::parser::ConcurrentHeader &header)`, which still generates `fir.do_loop ... unordered` nests. See: https://github.com/llvm/llvm-project/blob/main/flang/lib/Lower/Bridge.cpp#L2771.
clementval left a comment:
Some post-commit questions.
Thanks for taking a look. Replied to your questions.
Thanks for the replies. All good from my side.
…r.do_loop ... unordered` (#138512)

Extends lowering `fir.do_concurrent` to `fir.do_loop ... unordered` by adding support for locality specifiers. In particular, for `local` specifiers, a `fir.alloca` op is created using the localizer type. For `local_init` specifiers, the `copy` region is additionally inlined in the `do concurrent` loop's body.

PR stack:
- #137928
- #138505
- #138506
- #138512 (this PR)
- #138534
- #138816
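The Fortran semantics behind this lowering can be sketched without MLIR. Roughly, a `local` variable is a fresh entity whose value is undefined at the start of every iteration, while a `local_init` variable starts every iteration with the value of the outside variable; in both cases, changes to the local copy do not affect the outside variable. The C++ sketch below models only that behavior: `Locality` and `runWithLocality` are hypothetical names, an empty `std::optional` stands in for an undefined value, the fresh `priv` storage plays the role of the `fir.alloca`, and the assignment from `original` plays the role of the inlined `copy` region. It makes no claim about where the actual lowering places the `fir.alloca`.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model of `local` / `local_init` semantics (not flang code).
enum class Locality { Local, LocalInit };

// Runs `body(iv, priv)` once per iteration with a fresh private copy `priv`.
// An empty std::optional stands in for an undefined value.
// Returns the value each private copy held at the end of its iteration.
template <typename T, typename Body>
std::vector<std::optional<T>> runWithLocality(Locality kind, const T &original,
                                              int iterations, Body body) {
  assert(iterations >= 0 && "trip count must be non-negative");
  std::vector<std::optional<T>> finals;
  for (int iv = 0; iv < iterations; ++iv) {
    std::optional<T> priv; // fresh per-iteration storage (cf. fir.alloca)
    if (kind == Locality::LocalInit)
      priv = original;     // cf. the inlined `copy` region: load + store
    body(iv, priv);        // `original` is never written
    finals.push_back(priv);
  }
  return finals;
}
```

With `local_init`, the body observes the outside value on every iteration even if it modifies its own copy, and the outside variable itself is left unchanged; with `local`, every iteration starts from an undefined value.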
…pecs to `fir.do_loop ... unordered` (#138512)

Extends lowering `fir.do_concurrent` to `fir.do_loop ... unordered` by adding support for locality specifiers. In particular, for `local` specifiers, a `fir.alloca` op is created using the localizer type. For `local_init` specifiers, the `copy` region is additionally inlined in the `do concurrent` loop's body.

PR stack:
- llvm/llvm-project#137928
- llvm/llvm-project#138505
- llvm/llvm-project#138506
- llvm/llvm-project#138512 (this PR)
- llvm/llvm-project#138534
- llvm/llvm-project#138816
…ecifiers (#138534)

Extends support for `fir.do_concurrent` locality specifiers to the PFT to MLIR level. This adds code-gen for generating the newly added `fir.local` ops and referencing these ops from `fir.do_concurrent.loop` ops that have locality specifiers attached to them. This reuses the `DataSharingProcessor` component and generalizes it a bit more to allow for handling `omp.private` ops and `fir.local` ops as well.

PR stack:
- #137928
- #138505
- #138506
- #138512
- #138534 (this PR)
- #138816
…locality specifiers (#138534)

Extends support for `fir.do_concurrent` locality specifiers to the PFT to MLIR level. This adds code-gen for generating the newly added `fir.local` ops and referencing these ops from `fir.do_concurrent.loop` ops that have locality specifiers attached to them. This reuses the `DataSharingProcessor` component and generalizes it a bit more to allow for handling `omp.private` ops and `fir.local` ops as well.

PR stack:
- llvm/llvm-project#137928
- llvm/llvm-project#138505
- llvm/llvm-project#138506
- llvm/llvm-project#138512
- llvm/llvm-project#138534 (this PR)
- llvm/llvm-project#138816
… (#138816)

Remove the `openmp` prefix from delayed privatization/localization flags since they are now used for `do concurrent` as well.

PR stack:
- llvm/llvm-project#137928
- llvm/llvm-project#138505
- llvm/llvm-project#138506
- llvm/llvm-project#138512
- llvm/llvm-project#138534
- llvm/llvm-project#138816 (this PR)
Adds a new `fir.local` op to model `local` and `local_init` locality specifiers. This op is a clone of `omp.private`. In particular, this new op also models the privatization/localization logic of an SSA value in the `fir` dialect just like `omp.private` does for OpenMP.

PR stack:
- `do concurrent` loop nests to `fir.do_concurrent` #137928
- `fir.local` op for locality specifiers #138505 (this PR)
- `fir.do_concurrent.loop` #138506
- `fir.do_concurrent` locality specs to `fir.do_loop ... unordered` #138512