Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[flang][MLIR][OpenMP] Extend delayed privatization for allocatables #84033

Closed
wants to merge 1 commit into from

Conversation

ergawy
Copy link
Member

@ergawy ergawy commented Mar 5, 2024

Adds delayed privatization support for allocatables. In order to explain the problem this diff is trying to solve, it would be useful to see an example of firstprivate for an alloctable and how it is currently emitted by flang without delayed privatization.

Consider the following Fortran code:

subroutine delayed_privatization_allocatable
  implicit none
  integer, allocatable :: var1

!$omp parallel firstprivate(var1)
  var1 = 10
!$omp end parallel
end subroutine

You would get something like this (again no delayed privatization yet):

    ...
    %3:2 = hlfir.declare %0 {fortran_attrs = #fir.var_attrs<allocatable>, ...} : ...
    omp.parallel {
      // Allocation logic
      ...
      %9 = fir.load %3#1 : !fir.ref<!fir.box<!fir.heap<i32>>>
      ...
      fir.if %12 {
        ...
      } else {
        ...
      }
      %13:2 = hlfir.declare %8 {fortran_attrs = #fir.var_attrs<allocatable>, ...} : ...

      // Copy logic
      %14 = fir.load %13#0 : !fir.ref<!fir.box<!fir.heap<i32>>>
      ...
      fir.if %17 {
        %24 = fir.load %3#0 : !fir.ref<!fir.box<!fir.heap<i32>>>
        ...
      }
      ...
      omp.terminator
    }

Note that for both the original and the private declaration, the allocation and copy logic use both elements (e.g. %3#0 and %3#1). This poses the following problem for delayed privatization: how can capture both values in order to pass them to the outlined privatizer? The main issue is that hlfir.declare returns 2 SSA values that not encapsulated together under a single composite value.

Some possible solutions are:

  1. Change the return type of hlfir.declare. However, this would be very invasive and therefore does not sound like a good idea.
  2. Use the built-in tuple type to "pack" both values together. This is reasonable but not very flexible since we won't be able to express other information about the encapsulated values such as whether it is alloctable.
  3. Introduce a new concept/type that allows us to do the encapsulation in a more flexible way. This is the "variable shadow" concept introduced by this PR.

A "variable shadow" is a composite value of a special type: !fir.shadow that has some special properties:

  • It can be constructed only from BlockArguments.
  • It always has the type !fir.shadow.
  • It packs the required information needed by the delayed privatizer that correspond to the result of hlfir.declare operation.

We don't introduce any special opertions to deal with shadow values. Rather we treat them similar to other composites. If you want to get a certain component, use fir.extract_value. If you want to populate a certain component, use fir.insert_value.

@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:openmp labels Mar 5, 2024
@llvmbot
Copy link
Member

llvmbot commented Mar 5, 2024

@llvm/pr-subscribers-flang-openmp

@llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)

Changes

Adds delayed privatization support for allocatables. In order to explain the problem this diff is trying to solve, it would be useful to see an example of firstprivate for an alloctable and how it is currently emitted by flang without delayed privatization.

Consider the following Fortran code:

subroutine delayed_privatization_allocatable
  implicit none
  integer, allocatable :: var1

!$omp parallel firstprivate(var1)
  var1 = 10
!$omp end parallel
end subroutine

You would get something like this (again no delayed privatization yet):

    ...
    %3:2 = hlfir.declare %0 {fortran_attrs = #fir.var_attrs&lt;allocatable&gt;, ...} : ...
    omp.parallel {
      // Allocation logic
      ...
      %9 = fir.load %3#<!-- -->1 : !fir.ref&lt;!fir.box&lt;!fir.heap&lt;i32&gt;&gt;&gt;
      ...
      fir.if %12 {
        ...
      } else {
        ...
      }
      %13:2 = hlfir.declare %8 {fortran_attrs = #fir.var_attrs&lt;allocatable&gt;, ...} : ...

      // Copy logic
      %14 = fir.load %13#<!-- -->0 : !fir.ref&lt;!fir.box&lt;!fir.heap&lt;i32&gt;&gt;&gt;
      ...
      fir.if %17 {
        %24 = fir.load %3#<!-- -->0 : !fir.ref&lt;!fir.box&lt;!fir.heap&lt;i32&gt;&gt;&gt;
        ...
      }
      ...
      omp.terminator
    }

Note that for both the original and the private declaration, the allocation and copy logic use both elements (e.g. %3#<!-- -->0 and %3#<!-- -->1). This poses the following problem for delayed privatization: how can capture both values in order to pass them to the outlined privatizer? The main issue is that hlfir.declare returns 2 SSA values that not encapsulated together under a single composite value.

Some possible solutions are:

  1. Change the return type of hlfir.declare. However, this would be very invasive and therefore does not sound like a good idea.
  2. Use the built-in tuple type to "pack" both values together. This is reasonable but not very flexible since we won't be able to express other information about the encapsulated values such as whether it is alloctable.
  3. Introduce a new concept/type that allows us to do the encapsulation in a more flexible way. This is the "variable shadow" concept introduced by this PR.

A "variable shadow" is a composite value of a special type: !fir.shadow that has some special properties:

  • It can be constructed only from BlockArguments.
  • It always has the type !fir.shadow.
  • It packs the required information needed by the delayed privatizer that correspond to the result of hlfir.declare operation.

We don't introduce any special opertions to deal with shadow values. Rather we treat them similar to other composites. If you want to get a certain component, use fir.extract_value. If you want to populate a certain component, use fir.insert_value.


Patch is 44.24 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/84033.diff

19 Files Affected:

  • (modified) flang/include/flang/Lower/SymbolMap.h (+4-1)
  • (modified) flang/include/flang/Optimizer/Builder/BoxValue.h (+9-1)
  • (modified) flang/include/flang/Optimizer/Builder/HLFIRTools.h (+28-3)
  • (added) flang/include/flang/Optimizer/Builder/VariableShadow.h (+47)
  • (modified) flang/include/flang/Optimizer/Dialect/FIRTypes.td (+64-1)
  • (modified) flang/lib/Lower/Bridge.cpp (+15)
  • (modified) flang/lib/Lower/ConvertExpr.cpp (+3)
  • (modified) flang/lib/Lower/OpenMP/DataSharingProcessor.cpp (+97-14)
  • (modified) flang/lib/Lower/OpenMP/OpenMP.cpp (+20-3)
  • (modified) flang/lib/Lower/SymbolMap.cpp (+3)
  • (modified) flang/lib/Optimizer/Builder/BoxValue.cpp (+9-2)
  • (modified) flang/lib/Optimizer/Builder/HLFIRTools.cpp (+67-1)
  • (modified) flang/lib/Optimizer/Builder/MutableBox.cpp (+6)
  • (modified) flang/lib/Optimizer/Dialect/FIRType.cpp (+1-1)
  • (added) flang/test/Lower/OpenMP/delayed-privatization-allocatable.f90 (+69)
  • (modified) flang/test/Lower/OpenMP/delayed-privatization-firstprivate.f90 (+25-9)
  • (modified) flang/test/Lower/OpenMP/delayed-privatization-private-firstprivate.f90 (+12-4)
  • (modified) flang/test/Lower/OpenMP/delayed-privatization-private.f90 (+20-5)
  • (modified) flang/test/Lower/OpenMP/delayed-privatization-reduction.f90 (+3-3)
diff --git a/flang/include/flang/Lower/SymbolMap.h b/flang/include/flang/Lower/SymbolMap.h
index a55e4b133fe0a8..395c22c4b94f5a 100644
--- a/flang/include/flang/Lower/SymbolMap.h
+++ b/flang/include/flang/Lower/SymbolMap.h
@@ -15,6 +15,7 @@
 
 #include "flang/Common/reference.h"
 #include "flang/Optimizer/Builder/BoxValue.h"
+#include "flang/Optimizer/Builder/VariableShadow.h"
 #include "flang/Optimizer/Dialect/FIRType.h"
 #include "flang/Optimizer/Dialect/FortranVariableInterface.h"
 #include "flang/Optimizer/Support/Matcher.h"
@@ -77,7 +78,8 @@ struct SymbolBox : public fir::details::matcher<SymbolBox> {
 
   using VT =
       std::variant<Intrinsic, FullDim, Char, CharFullDim, PointerOrAllocatable,
-                   Box, fir::FortranVariableOpInterface, None>;
+                   Box, fir::FortranVariableOpInterface,
+                   hlfir::FortranVariableShadow, None>;
 
   //===--------------------------------------------------------------------===//
   // Constructors
@@ -101,6 +103,7 @@ struct SymbolBox : public fir::details::matcher<SymbolBox> {
                  [](const fir::FortranVariableOpInterface &x) {
                    return fir::FortranVariableOpInterface(x).getBase();
                  },
+                 [](const hlfir::FortranVariableShadow &x) { return x.getBase(); },
                  [](const auto &x) { return x.getAddr(); });
   }
 
diff --git a/flang/include/flang/Optimizer/Builder/BoxValue.h b/flang/include/flang/Optimizer/Builder/BoxValue.h
index 2fed2d48a7a080..151a197cc29b7f 100644
--- a/flang/include/flang/Optimizer/Builder/BoxValue.h
+++ b/flang/include/flang/Optimizer/Builder/BoxValue.h
@@ -13,6 +13,8 @@
 #ifndef FORTRAN_OPTIMIZER_BUILDER_BOXVALUE_H
 #define FORTRAN_OPTIMIZER_BUILDER_BOXVALUE_H
 
+#include "flang/Optimizer/Builder/Todo.h"
+#include "flang/Optimizer/Builder/VariableShadow.h"
 #include "flang/Optimizer/Dialect/FIRType.h"
 #include "flang/Optimizer/Support/FatalError.h"
 #include "flang/Optimizer/Support/Matcher.h"
@@ -479,7 +481,8 @@ class ExtendedValue : public details::matcher<ExtendedValue> {
 public:
   using VT =
       std::variant<UnboxedValue, CharBoxValue, ArrayBoxValue, CharArrayBoxValue,
-                   ProcBoxValue, BoxValue, MutableBoxValue, PolymorphicValue>;
+                   ProcBoxValue, BoxValue, MutableBoxValue, PolymorphicValue,
+                   hlfir::FortranVariableShadow>;
 
   ExtendedValue() : box{UnboxedValue{}} {}
   template <typename A, typename = std::enable_if_t<
@@ -516,6 +519,11 @@ class ExtendedValue : public details::matcher<ExtendedValue> {
                  [](const fir::CharBoxValue &box) -> unsigned { return 0; },
                  [](const fir::ProcBoxValue &box) -> unsigned { return 0; },
                  [](const fir::PolymorphicValue &box) -> unsigned { return 0; },
+                 [](const hlfir::FortranVariableShadow &box) -> unsigned {
+                   TODO(box.getValue().getLoc(),
+                        "rank(): FortranVariableShadow");
+                   return 0;
+                 },
                  [](const auto &box) -> unsigned { return box.rank(); });
   }
 
diff --git a/flang/include/flang/Optimizer/Builder/HLFIRTools.h b/flang/include/flang/Optimizer/Builder/HLFIRTools.h
index 170e134baef614..1de6e6b2d340da 100644
--- a/flang/include/flang/Optimizer/Builder/HLFIRTools.h
+++ b/flang/include/flang/Optimizer/Builder/HLFIRTools.h
@@ -14,6 +14,7 @@
 #define FORTRAN_OPTIMIZER_BUILDER_HLFIRTOOLS_H
 
 #include "flang/Optimizer/Builder/BoxValue.h"
+#include "flang/Optimizer/Builder/VariableShadow.h"
 #include "flang/Optimizer/Dialect/FIROps.h"
 #include "flang/Optimizer/Dialect/FortranVariableInterface.h"
 #include "flang/Optimizer/HLFIR/HLFIRDialect.h"
@@ -55,8 +56,12 @@ class Entity : public mlir::Value {
   }
   Entity(fir::FortranVariableOpInterface variable)
       : mlir::Value(variable.getBase()) {}
+  Entity(FortranVariableShadow variableShadow)
+      : mlir::Value(variableShadow.getValue()) {}
+
   bool isValue() const { return isFortranValue(*this); }
   bool isVariable() const { return !isValue(); }
+  bool isMutableShadow() const;
   bool isMutableBox() const { return hlfir::isBoxAddressType(getType()); }
   bool isProcedurePointer() const {
     return fir::isBoxProcAddressType(getType());
@@ -74,6 +79,10 @@ class Entity : public mlir::Value {
   /// Is this an assumed ranked entity?
   bool isAssumedRank() const { return getRank() == -1; }
 
+  bool isFortranVariableShadow() const {
+    return this->getType().isa<fir::ShadowType>();
+  }
+
   /// Return the rank of this entity or -1 if it is an assumed rank.
   int getRank() const {
     mlir::Type type = fir::unwrapPassByRefType(fir::unwrapRefType(getType()));
@@ -152,6 +161,14 @@ class Entity : public mlir::Value {
     return this->getDefiningOp<fir::FortranVariableOpInterface>();
   }
 
+  fir::ShadowType getIfVariableShadow() const {
+    fir::ExtractValueOp op = this->getDefiningOp<fir::ExtractValueOp>();
+    if (!op)
+      return fir::ShadowType();
+
+    return op.getAdt().getType().dyn_cast<fir::ShadowType>();
+  }
+
   // Return a "declaration" operation for this variable if visible,
   // or the "declaration" operation of the allocatable/pointer this
   // variable was dereferenced from (if it is visible).
@@ -174,8 +191,13 @@ class Entity : public mlir::Value {
   }
 
   bool isAllocatable() const {
-    auto varIface = getIfVariableInterface();
-    return varIface ? varIface.isAllocatable() : false;
+    if (auto varIface = getIfVariableInterface())
+      return varIface.isAllocatable();
+
+    if (auto shadowType = getIfVariableShadow())
+      return shadowType.getAllocatable();
+
+    return false;
   }
 
   bool isPointer() const {
@@ -185,7 +207,7 @@ class Entity : public mlir::Value {
 
   // Get the entity as an mlir SSA value containing all the shape, type
   // parameters and dynamic shape information.
-  mlir::Value getBase() const { return *this; }
+  mlir::Value getBase() const;
 
   // Get the entity as a FIR base. This may not carry the shape and type
   // parameters information, and even when it is a box with shape information.
@@ -231,6 +253,9 @@ translateToExtendedValue(mlir::Location loc, fir::FirOpBuilder &builder,
 fir::ExtendedValue
 translateToExtendedValue(mlir::Location loc, fir::FirOpBuilder &builder,
                          fir::FortranVariableOpInterface fortranVariable);
+fir::ExtendedValue
+translateToExtendedValue(mlir::Location loc, fir::FirOpBuilder &builder,
+                         hlfir::FortranVariableShadow fortranVariableShadow);
 
 /// Generate declaration for a fir::ExtendedValue in memory.
 fir::FortranVariableOpInterface
diff --git a/flang/include/flang/Optimizer/Builder/VariableShadow.h b/flang/include/flang/Optimizer/Builder/VariableShadow.h
new file mode 100644
index 00000000000000..af83c4e34abb60
--- /dev/null
+++ b/flang/include/flang/Optimizer/Builder/VariableShadow.h
@@ -0,0 +1,47 @@
+//===-- VariableShadow.h ----------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Coding style: https://mlir.llvm.org/getting_started/DeveloperGuide/
+//
+//===----------------------------------------------------------------------===//
+
+#include "flang/Optimizer/Builder/FIRBuilder.h"
+#include "flang/Optimizer/Dialect/FortranVariableInterface.h"
+#include "mlir/IR/Value.h"
+
+#ifndef FORTRAN_OPTIMIZER_BUILDER_VARIABLESHADOW_H
+#define FORTRAN_OPTIMIZER_BUILDER_VARIABLESHADOW_H
+
+namespace hlfir {
+
+class FortranVariableShadow {
+public:
+  FortranVariableShadow(fir::FirOpBuilder &builder,
+                        mlir::BlockArgument shadowingVal,
+                        fir::FortranVariableOpInterface shadowedVariable);
+
+  mlir::BlockArgument getValue() const { return shadowingVal; }
+
+  mlir::Value getBase() const { return base; }
+
+  mlir::Value getFirBase() const { return firBase; }
+
+  fir::FortranVariableOpInterface getShadowedVariable() const {
+    return shadowedVariable;
+  }
+
+private:
+  mlir::BlockArgument shadowingVal;
+  mlir::Value base;
+  mlir::Value firBase;
+  fir::FortranVariableOpInterface shadowedVariable;
+};
+
+} // namespace hlfir
+
+#endif // FORTRAN_OPTIMIZER_BUILDER_VARIABLESHADOW_H
diff --git a/flang/include/flang/Optimizer/Dialect/FIRTypes.td b/flang/include/flang/Optimizer/Dialect/FIRTypes.td
index 2a2f50720859e7..d83b216bce7b30 100644
--- a/flang/include/flang/Optimizer/Dialect/FIRTypes.td
+++ b/flang/include/flang/Optimizer/Dialect/FIRTypes.td
@@ -576,6 +576,65 @@ def fir_VoidType : FIR_Type<"Void", "void"> {
   let genStorageClass = 0;
 }
 
+def fir_VariableShadowType : FIR_Type<"Shadow", "shadow"> {
+  let summary = "Type for Fortran-variables shadow values";
+
+  let description = [{
+    A type to encapulate required data for a Fortran-variable shadow value. A
+    shadow type encapsulates the base type as well as the original base type. It
+    also encapsulates information other information about the shadowed value,
+    such as whether it is allocatable or not.
+
+    For example, if we want to shadow a value produced a `hlfir.declare`
+    instruction similar to the following:
+    ```
+    %1:2 = hlfir.declare %0
+      {fortran_attrs = #fir.var_attrs<allocatable>}
+      : (!fir.ref<!fir.box<!fir.heap<i32>>>)
+        -> (!fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<!fir.box<!fir.heap<i32>>>)
+    ```
+
+    We would need to somehow create a value of the type:
+    ```
+    !fir.shadow<!fir.ref<!fir.box<!fir.heap<i32>>>,
+                !fir.ref<!fir.box<!fir.heap<i32>>>,
+                allocatable : true>
+    ```
+
+    The mechanism to create this value can be, for example, through a
+    combincation `fir.undefined`, `fir.insert_value` instructions similar to what
+    would be done for other composite types like `tuple`s:
+    ```
+    %2 = fir.undefined !fir.shadow<!fir.ref<!fir.box<!fir.heap<i32>>>,
+                                   !fir.ref<!fir.box<!fir.heap<i32>>>,
+                                   allocatable : true>
+    %3 = fir.insert_value %2, %1#0, [0 : index] : <... type info omitted ...>
+    %4 = fir.insert_value %3, %3#1, [1 : index] : <... type info omitted ...>
+    ```
+
+    From that point on, you can use `%4` as the shadow of the originally declared
+    variable being shadowed `%1`.
+
+    For more info about the concept of variable shadows, check `VariableShadow.h`.
+  }];
+
+  let parameters = (ins "mlir::Type":$baseTy,
+                        "mlir::Type":$firBaseTy,
+                        "bool":$allocatable);
+
+  let assemblyFormat = [{
+    `<` $baseTy `,` $firBaseTy `,` `allocatable` `:` $allocatable `>`
+  }];
+
+  let extraClassDeclaration = [{
+    mlir::Type getType(unsigned idx) const {
+      if (idx == 0) return getBaseTy();
+      if (idx == 1) return getFirBaseTy();
+      return mlir::Type();
+    }
+  }];
+}
+
 // Whether a type is a BaseBoxType
 def IsBaseBoxTypePred
         : CPred<"$_self.isa<::fir::BaseBoxType>()">;
@@ -590,13 +649,17 @@ def AnyRealLike : TypeConstraint<Or<[FloatLike.predicate,
     fir_RealType.predicate]>, "any real">;
 def AnyIntegerType : Type<AnyIntegerLike.predicate, "any integer">;
 
+def IsShadowTypePred
+        : CPred<"$_self.isa<::fir::ShadowType>()">;
+
 // Composable types
 // Note that we include both fir_ComplexType and AnyComplex, so we can use both
 // the FIR ComplexType and the MLIR ComplexType (the former is used to represent
 // Fortran complex and the latter for C++ std::complex).
 def AnyCompositeLike : TypeConstraint<Or<[fir_RecordType.predicate,
     fir_SequenceType.predicate, fir_ComplexType.predicate, AnyComplex.predicate,
-    fir_VectorType.predicate, IsTupleTypePred, fir_CharacterType.predicate]>,
+    fir_VectorType.predicate, IsTupleTypePred, fir_CharacterType.predicate,
+    IsShadowTypePred]>,
     "any composite">;
 
 // Reference types
diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp
index 153ce0623ab3ae..3d2ff056aed650 100644
--- a/flang/lib/Lower/Bridge.cpp
+++ b/flang/lib/Lower/Bridge.cpp
@@ -468,6 +468,10 @@ class FirConverter : public Fortran::lower::AbstractConverter {
           return hlfir::translateToExtendedValue(getCurrentLocation(),
                                                  getFirOpBuilder(), x);
         },
+        [&](const hlfir::FortranVariableShadow &x) -> fir::ExtendedValue {
+          return hlfir::translateToExtendedValue(getCurrentLocation(),
+                                                 getFirOpBuilder(), x);
+        },
         [](const auto &box) -> fir::ExtendedValue { return box; });
   }
 
@@ -1009,6 +1013,16 @@ class FirConverter : public Fortran::lower::AbstractConverter {
         return v.match(
             [&](const Fortran::lower::SymbolBox::Intrinsic &)
                 -> Fortran::lower::SymbolBox { return v; },
+            [&](const hlfir::FortranVariableShadow &box)
+                -> Fortran::lower::SymbolBox {
+              auto exv =
+                  hlfir::translateToExtendedValue(toLocation(), *builder, box);
+              return exv.match(
+                  [](mlir::Value x) -> Fortran::lower::SymbolBox {
+                    return Fortran::lower::SymbolBox::Intrinsic{x};
+                  },
+                  [](auto x) -> Fortran::lower::SymbolBox { return x; });
+            },
             [](const auto &) -> Fortran::lower::SymbolBox { return {}; });
 
       return {};
@@ -1066,6 +1080,7 @@ class FirConverter : public Fortran::lower::AbstractConverter {
     assert(lowerToHighLevelFIR());
     hlfir::Entity lhs{dst};
     hlfir::Entity rhs{src};
+
     // Temporary_lhs is set to true in hlfir.assign below to avoid user
     // assignment to be used and finalization to be called on the LHS.
     // This may or may not be correct but mimics the current behaviour
diff --git a/flang/lib/Lower/ConvertExpr.cpp b/flang/lib/Lower/ConvertExpr.cpp
index d157db2cde4961..9b69bb10967349 100644
--- a/flang/lib/Lower/ConvertExpr.cpp
+++ b/flang/lib/Lower/ConvertExpr.cpp
@@ -981,6 +981,9 @@ class ScalarExprLowering {
           },
           [&](const fir::ProcBoxValue &toBox) {
             TODO(loc, "procedure pointer component in derived type assignment");
+          },
+          [&](const hlfir::FortranVariableShadow &toBox) {
+            TODO(loc, "FortranVariableShadow in derived type assignment");
           });
     }
     return res;
diff --git a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
index 717b8cc0276a30..be88d30e12d306 100644
--- a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
@@ -15,7 +15,9 @@
 #include "Utils.h"
 #include "flang/Lower/PFTBuilder.h"
 #include "flang/Lower/SymbolMap.h"
+#include "flang/Optimizer/Builder/HLFIRTools.h"
 #include "flang/Optimizer/Builder/Todo.h"
+#include "flang/Optimizer/HLFIR/HLFIROps.h"
 #include "flang/Semantics/tools.h"
 #include "mlir/Dialect/OpenMP/OpenMPDialect.h"
 
@@ -357,6 +359,28 @@ void DataSharingProcessor::doPrivatize(const Fortran::semantics::Symbol *sym) {
   bool isFirstPrivate =
       sym->test(Fortran::semantics::Symbol::Flag::OmpFirstPrivate);
 
+  mlir::Type privatizerArgType = hsb.match(
+      [&](const fir::FortranVariableOpInterface &box) -> mlir::Type {
+        hlfir::Entity variable(box);
+        return fir::ShadowType::get(
+            &converter.getMLIRContext(), variable.getBase().getType(),
+            variable.getFirBase().getType(), variable.isAllocatable());
+      },
+      [&](const auto &box) -> mlir::Type { return symType; });
+
+  auto addSymbol = [&](mlir::Region &region, unsigned argIdx, bool force) {
+    hsb.match(
+        [&](const fir::FortranVariableOpInterface &box) {
+          hlfir::Entity variable(box);
+          hlfir::FortranVariableShadow variableShadow(
+              firOpBuilder, region.getArgument(argIdx), box);
+          symTable->addSymbol(*sym, variableShadow, force);
+        },
+        [&](const auto &box) {
+          symTable->addSymbol(*sym, region.getArgument(argIdx), force);
+        });
+  };
+
   mlir::omp::PrivateClauseOp privatizerOp = [&]() {
     auto moduleOp = firOpBuilder.getModule();
     auto uniquePrivatizerName = fir::getTypeAsString(
@@ -373,7 +397,7 @@ void DataSharingProcessor::doPrivatize(const Fortran::semantics::Symbol *sym) {
     firOpBuilder.setInsertionPoint(&moduleOp.getBodyRegion().front(),
                                    moduleOp.getBodyRegion().front().begin());
     auto result = firOpBuilder.create<mlir::omp::PrivateClauseOp>(
-        symLoc, uniquePrivatizerName, symType,
+        symLoc, uniquePrivatizerName, privatizerArgType,
         isFirstPrivate ? mlir::omp::DataSharingClauseType::FirstPrivate
                        : mlir::omp::DataSharingClauseType::Private);
 
@@ -383,15 +407,44 @@ void DataSharingProcessor::doPrivatize(const Fortran::semantics::Symbol *sym) {
     {
       mlir::Region &allocRegion = result.getAllocRegion();
       mlir::Block *allocEntryBlock = firOpBuilder.createBlock(
-          &allocRegion, /*insertPt=*/{}, symType, symLoc);
+          &allocRegion, /*insertPt=*/{}, privatizerArgType, symLoc);
 
       firOpBuilder.setInsertionPointToEnd(allocEntryBlock);
-      symTable->addSymbol(*sym, allocRegion.getArgument(0));
+      addSymbol(allocRegion, 0, false);
       symTable->pushScope();
       cloneSymbol(sym);
-      firOpBuilder.create<mlir::omp::YieldOp>(
-          hsb.getAddr().getLoc(),
-          symTable->shallowLookupSymbol(*sym).getAddr());
+
+      auto genYieldedVal = [&]() -> mlir::Value {
+        auto symBox = symTable->shallowLookupSymbol(*sym);
+        mlir::Value addr = symBox.getAddr();
+        mlir::Operation *definingOp = addr.getDefiningOp();
+        hlfir::DeclareOp declareOp =
+            mlir::dyn_cast<hlfir::DeclareOp>(definingOp);
+
+        if (declareOp == nullptr) {
+          return addr;
+        }
+        declareOp.getBase();
+        declareOp.getOriginalBase();
+        mlir::Location loc = declareOp.getLoc();
+
+        mlir::Value yieldTuple =
+            firOpBuilder.create<fir::UndefOp>(loc, privatizerArgType);
+
+        mlir::Type idxTy = firOpBuilder.getIndexType();
+
+        yieldTuple = firOpBuilder.create<fir::InsertValueOp>(
+            loc, privatizerArgType, yieldTuple, declareOp.getBase(),
+            firOpBuilder.getArrayAttr({firOpBuilder.getIntegerAttr(idxTy, 0)}));
+
+        yieldTuple = firOpBuilder.create<fir::InsertValueOp>(
+            loc, privatizerArgType, yieldTuple, declareOp.getOriginalBase(),
+            firOpBuilder.getArrayAttr({firOpBuilder.getIntegerAttr(idxTy, 1)}));
+        return yieldTuple;
+      };
+
+      firOpBuilder.create<mlir::omp::YieldOp>(hsb.getAddr().getLoc(),
+                                              genYieldedVal());
       symTable->popScope();
     }
 
@@ -401,18 +454,19 @@ void DataSharingProcessor::doPrivatize(const Fortran::semantics::Symbol *sym) {
       // First block argument corresponding to the original/host value while
       // second block argument corresponding to the privatized value.
       mlir::Block *copyEntryBlock = firOpBuilder.createBlock(
-          &copyRegion, /*insertPt=*/{}, {symType, symType}, {symLoc, symLoc});
+          &copyRegion, /*insertPt=*/{}, {privatizerArgType, privatizerArgType},
+          {symLoc, symLoc});
       firOpBuilder.setInsertionPointToEnd(copyEntryBlock);
-      symTable->addSymbol(*sym, copyRegion.getArgument(0),
-                          /*force=*/true);
+      addSymbol(copyRegion, 0, true);
+
       symTable->pushScope();
-      symTable->addSymbol(*sym, copyRegion.getArgument(1));
+      addSymbol(copyRegion, 1, false);
+
       auto ip = firOpBuilder.saveInsertionPoint();
       copyFirstPrivateSymbol(sym, &ip);
 
-      firOpBuilder.create<mlir::omp::YieldOp>(
-          hsb.getAddr().getLoc(),
-          symTable->shallowLookupSymbol(*sym).getAddr());
+      firOpBuilder.create<mlir::omp::YieldOp>(hsb.getAddr().getLoc(),
+                                              copyRegion.getArgument(1));
       symTable->popScope();
     }
 
@@ -423,7 +477...
[truncated]

Copy link

github-actions bot commented Mar 5, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@ergawy ergawy requested review from TIFitis and anchuraj March 5, 2024 15:43
Adds delayed privatization support for allocatables. In order to explain
the problem this diff is trying to solve, it would be useful to see an
example of `firstprivate` for an alloctable and how it is currently
emitted by flang **without** delayed privatization.

Consider the following Fortran code:
```fortran
subroutine delayed_privatization_allocatable
  implicit none
  integer, allocatable :: var1

!$omp parallel firstprivate(var1)
  var1 = 10
!$omp end parallel
end subroutine
```

You would get something like this (again no delayed privatization yet):
```mlir
    ...
    %3:2 = hlfir.declare %0 {fortran_attrs = #fir.var_attrs<allocatable>, ...} : ...
    omp.parallel {
      // Allocation logic
      ...
      %9 = fir.load %3#1 : !fir.ref<!fir.box<!fir.heap<i32>>>
      ...
      fir.if %12 {
        ...
      } else {
        ...
      }
      %13:2 = hlfir.declare %8 {fortran_attrs = #fir.var_attrs<allocatable>, ...} : ...

      // Copy logic
      %14 = fir.load %13#0 : !fir.ref<!fir.box<!fir.heap<i32>>>
      ...
      fir.if %17 {
        %24 = fir.load %3#0 : !fir.ref<!fir.box<!fir.heap<i32>>>
        ...
      }
      ...
      omp.terminator
    }
```

Note that for both the original and the private declaration, the
allocation and copy logic use both elements (e.g. `%3#0` and `%3#1`).
This poses the following problem for delayed privatization: how can
capture both values in order to pass them to the outlined privatizer?
The main issue is that `hlfir.declare` returns 2 SSA values that not
encapsulated together under a single composite value.

Some possible solutions are:
1. Change the return type of `hlfir.declare`. However, this would be
   very invasive and therefore does not sound like a good idea.
2. Use the built-in `tuple` type to "pack" both values together. This is
   reasonable but not very flexible since we won't be able to express
   other information about the encapsulated values such as whether it is
   alloctable.
3. Introduce a new concept/type that allows us to do the encapsulation
   in a more flexible way. This is the "variable shadow" concept
   introduced by this PR.

A "variable shadow" is a composite value of a special type:
`!fir.shadow` that has some special properties:
* It can be constructed only from `BlockArgument`s.
* It always has the type `!fir.shadow`.
* It packs the required information needed by the delayed privatizer
  that correspond to the result of `hlfir.declare` operation.

We don't introduce any special opertions to deal with shadow values.
Rather we treat them similar to other composites. If you want to get a
certain component, use `fir.extract_value`. If you want to populate a
certain component, use `fir.insert_value`.
@ergawy ergawy force-pushed the delayed_privatization_11 branch from 4e0b26e to 7d98b66 Compare March 5, 2024 15:51
@ergawy ergawy requested a review from abidh March 5, 2024 15:51
@kiranchandramohan
Copy link
Contributor

Note that for both the original and the private declaration, the allocation and copy logic use both elements (e.g. %3#0 and %3#1). This poses the following problem for delayed privatization: how can capture both values in order to pass them to the outlined privatizer? The main issue is that hlfir.declare returns 2 SSA values that not encapsulated together under a single composite value.

We have not made specific changes for HLFIR lowering in the privatisation code and could have inadvertently used both #0 and #1. Can we use #0 for both of the usages? It could be just about modifying getSymbolAddress or a similar function in the converter.

@ergawy
Copy link
Member Author

ergawy commented Mar 5, 2024

We have not made specific changes for HLFIR lowering in the privatisation code and could have inadvertently used both #0 and #1. Can we use #0 for both of the usages? It could be just about modifying getSymbolAddress or a similar function in the converter.

I am not sure to be honest. Since the 2 values can be of different types in some cases, I assumed both need to be maintained.

I also guess it goes deeper than getSymbolAddress. If you take a look at https://github.com/llvm/llvm-project/blob/main/flang/lib/Optimizer/Builder/HLFIRTools.cpp#L186, the code explicitly calls declareOp.getOriginalBase() which returns the second result of hlfir.declare. At the start, I tried changing this to use declareOp.getBase() instead but a huge number of tests failed. However, admittedly, I didn't closely check the failures.

This is just from what I learned in the past 2 weeks staring at this part of the codebase, so I might be over complicating things for sure.

@kiranchandramohan
Copy link
Contributor

kiranchandramohan commented Mar 5, 2024

I think using the following pattern will get you the first value and it should mostly be sufficient.

              mlir::Value hostVal = converter.getSymbolAddress(*symbol);
              if (auto declOp = hostVal.getDefiningOp<hlfir::DeclareOp>())
                hostVal = declOp.getBase();

@kiranchandramohan
Copy link
Contributor

The point I was trying to make was that the appearance of the #1 is mostly due to the use of converter.getSymbolAddress which as you point out calls declareOp.getOriginalBase() internally. I was hoping to clean this up at some point so that all usage takes #0 or declareOp.getBase().

Were the failures all in OpenMP tests?

It will be good to get an opinion from @jeanPerier on whether creating clones #0 would be sufficient.

ergawy added a commit to ergawy/llvm-project that referenced this pull request Mar 6, 2024
An experimental diff to discuss using only the first component of
`hlfir.declare`'s result for allocatables. This is just to have a more
informed discussion on llvm#84033.

This is not meant to be merged, at least for now.
@ergawy
Copy link
Member Author

ergawy commented Mar 6, 2024

Thanks a lot for the quick response and guidance @kiranchandramohan. Also, typical newbie misdirection on my side 😄.

Just for the sake of discussion, I implemented a (very) dirty hack to use only the first component for privatization logic (allocation, copying, and deallocation) without delayed privatization in the picture to simplify the discussion. This is implemented in #84123.

If using the first component is always safe and does not break any semantics I am not aware of, then I can productize: #84123 (i.e. find a less hacky solution) and then move to supporting delayed privatization for allocatables in a later PR.

@jeanPerier
Copy link
Contributor

I am not very familiar with the OpenMP outlining. Here are my quick thoughts:

  • I do not get why it is a semantic issue to capture two values instead of one (I understand this may be an perf issue).
  • It seems the new hlfir::FortranVariableShadow could simply be a fir::FortranVariableOpInterface (you would get everything you need by gathering the operation instead of the result: the attributes and all the results). The OpenMP privatization pass could attempt to always look at the producer of an mlir::Value that is captured. If it is fir::FortranVariableOpInterface, it would put the op in a some capture list, otherwise, it would capture the mlir::Value "as usual". Adding a new type in FIR is adding a lot of abstraction complexity.
  • I agree with Kiran that if it is just a perf issue, the direct solution would be to update the cloning to use #0. Especially for allocatable and pointer were #0 and #1 are exactly the same anyway.

The story of #0 vs #1 is that #0 could just be used anywhere from a correctness point of view. #1 is only here to help skipping the creation of temporary descriptors when propagating assumed-shape through calls and doing "FIR level" descriptor manipulation (the issue of assumed-shape is that there lower bounds are local, so a new descriptor #0 is created to hold the accurate lower bounds in the descriptors. But we do not care about propagating these lower bounds through calls since the entity will get new lower bounds inside the call anyway. FIR addressing operation was designed to not use the lower bound inside the descriptor (local lower bounds are provided with fir.shift if any)).

In the context of OpenMP outlining, if you need to capture a descriptor, it should be #0 I think since it has all the info, because capturing #1 would require to capture the lower bounds on top of that for any addressing operation inside the outlined code.

I think I may be able to modify converter.getSymbolAddress to use #0 for Pointers/Allocatables since it is anyway the same as #1. But for assumed-shape, it will stay #1 since fir::ExtendedValue are "FIR level" and should use #1 to avoid descriptor temporaries when possible. The cloning code should be update to use HLFIR level cloning mechanism in that case (and maybe in call cases to make things simpler).

@ergawy
Copy link
Member Author

ergawy commented Mar 7, 2024

Thanks @jeanPerier for your thoughts.

From what you are describing:

  1. For the purpose of OpenMP private cloning and copying, it always safe to use #0 even for assume-shape. Is that correct?

  2. Not exactly related to OpenMP privatization, but in order to understand the following part of your comment:

#1 is only here to help skipping the creation of temporary descriptors when propagating assumed-shape through calls and doing "FIR level" descriptor manipulation (the issue of assumed-shape is that there lower bounds are local, so a new descriptor #0 is created to hold the accurate lower bounds in the descriptors. But we do not care about propagating these lower bounds through calls since the entity will get new lower bounds inside the call anyway.

I tested a small sample:

subroutine foo()
  interface
    subroutine bar(x)
      character(*) :: x(3:)
    end subroutine
  end interface
  character(100) :: x(2:42)

  call bar(x)
end subroutine

And I thought that the call operation will use #1 from the hlfir.declare operation that declares x. I though so since this is the original SSA value without the shift (i.e. without the lower bound). However, this is the generated IR:

  func.func @_QPfoo() {
    %c100 = arith.constant 100 : index
    %c2 = arith.constant 2 : index
    %c41 = arith.constant 41 : index
    %0 = fir.alloca !fir.array<41x!fir.char<1,100>> {bindc_name = "x", uniq_name = "_QFfooEx"}
    %1 = fir.shape_shift %c2, %c41 : (index, index) -> !fir.shapeshift<1>
    %2:2 = hlfir.declare %0(%1) typeparams %c100 {uniq_name = "_QFfooEx"} : (!fir.ref<!fir.array<41x!fir.char<1,100>>>, !fir.shapeshift<1>, index) -> (!fir.box<!fir.array<41x!fir.char<1,100>>>, !fir.ref<!fir.array<41x!fir.char<1,100>>>)
    %3 = fir.convert %2#0 : (!fir.box<!fir.array<41x!fir.char<1,100>>>) -> !fir.box<!fir.array<?x!fir.char<1,?>>>
    fir.call @_QPbar(%3) fastmath<contract> : (!fir.box<!fir.array<?x!fir.char<1,?>>>) -> ()
    return
  }

So, #0 is used for the call and that makes me think I might have misunderstood what you meant.

@jeanPerier
Copy link
Contributor

For the purpose of OpenMP private cloning and copying, it always safe to use #0 even for assume-shape. Is that correct?

Yes, that is correct. Any usage of "#1" should be seen as optimizations to improve codegeneration at the FIR level. It is always correct to start with "#0".

And I thought that the call operation will use #1 from the hlfir.declare operation that declares x. I though so since this is the original SSA value without the shift (i.e. without the lower bound).

The original value is not a fir.box in that case, so what I wrote cannot apply: one cannot save the creation of a descriptor and use #1: it is not a descriptor. What I was referring to only apply when the hlfir.declare input is a descriptor.

Anyway, what I wrote above turns out to be incorrect since de2be3e that I missed and renders the usages of #1 in lowering "useless" when it is a box. I will need to analyze this commit. The drawback of this commit is that without inlining, LLVM will not be able to remove the temporary descriptors created to hold lower bounds in pure forwarding cases like below, while the temporary descriptor created to hold local bounds is useless from a Fortran point of view and the input could have been forwarded:

subroutine foo(x)
  interface
    subroutine bar(x)
      character(*) :: x(:)
    end subroutine
  end interface
  character(*) :: x(:)
  call bar(x)
end subroutine

@ergawy
Copy link
Member Author

ergawy commented Mar 7, 2024

Awesome. The picture is a lot clearer now!

Then, I will abandon this diff and try to find a way to cleanly use #0 for OpenMP all the time (regardless of delayed privatization).

@ergawy
Copy link
Member Author

ergawy commented Mar 7, 2024

Anyway, what I wrote above turns out to be incorrect since de2be3e that I missed and renders the usages of #1 in lowering "useless" when it is a box. I will need to analyze this commit.

A follow-up question, based on you analysis of de2be3e, if it turns out the commit is actually not problematic, does it mean we can change hlfir.declare to always return a single SSA value (#0 currently)? Or are there any other use cases where the second SSA value is needed?

ergawy added a commit to ergawy/llvm-project that referenced this pull request Mar 7, 2024
Modifies the privatization logic so that the emitted code only used the
HLFIR base (i.e. SSA value `#0` returned from `hlfir.declare`). Before
that, that emitted privatization logic was a mix of using `#0` and `llvm#1`
which leads to some difficulties trying to move to delayed privatization
(see the discussion on llvm#84033).
ergawy added a commit that referenced this pull request Mar 11, 2024
Modifies the privatization logic so that the emitted code only used the
HLFIR base (i.e. SSA value `#0` returned from `hlfir.declare`). Before
that, that emitted privatization logic was a mix of using `#0` and `#1`
which leads to some difficulties trying to move to delayed privatization
(see the discussion on #84033).
@ergawy
Copy link
Member Author

ergawy commented Mar 11, 2024

Abandoning in favor of #84740.

@ergawy ergawy closed this Mar 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang:openmp flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants