[MLIR][OpenMP] Correctly handle branching within target captures #217

skatrak · 2024-12-02T15:13:24Z

This patch improves the detection of captured OpenMP constructs inside an `omp.target` operation by also taking potential branches into account. If a nested OpenMP construct can be executed in a loop, or skipped entirely, by means of explicit MLIR control flow, it must not be treated as captured.

The following Fortran example results in such a case:

```f90
!$omp target teams
do i = 1, n
  !$omp distribute parallel do
  do j = 1, n
    ...
  end do
  !$omp end distribute parallel do
end do
!$omp end target teams
```

Lowering that code to MLIR creates multiple blocks and branches inside the `omp.teams` operation's region. Without this change, the kernel is identified as SPMD during translation to LLVM IR because of how the operations are nested, even though it is actually a generic kernel. The translation then tries to read host-evaluated loop bounds that do not exist in order to calculate the inner loop's trip count, which crashes the compiler.

```mlir
omp.target map_entries(...) {
  ...
  omp.teams {
    %233 = llvm.trunc %227 : i64 to i32
    llvm.br ^bb1(%233, %225 : i32, i64)
  ^bb1(%234: i32, %235: i64):  // 2 preds: ^bb0, ^bb2
    %236 = llvm.icmp "sgt" %235, %226 : i64
    llvm.cond_br %236, ^bb2, ^bb3
  ^bb2:  // pred: ^bb1
    llvm.store %234, %arg5 : i32, !llvm.ptr
    omp.parallel ... {
      ...
    } {omp.composite}
    ...
    llvm.br ^bb1(%239, %240 : i32, i64)
  ^bb3:  // pred: ^bb1
    llvm.store %234, %arg5 : i32, !llvm.ptr
    omp.terminator
  }
  omp.terminator
}
```
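
The condition described above (a nested construct is not captured if it can run in a loop or be skipped) can be modelled outside of MLIR. The following is a minimal Python sketch, not the code in this patch. It assumes a region is represented as a dictionary from block names to successor block names, and `reachable` and `executes_exactly_once` are hypothetical helpers introduced only for illustration.

```python
from typing import Dict, List, Set

# Hypothetical CFG model (an assumption): block name -> successor block names.
Cfg = Dict[str, List[str]]


def reachable(cfg: Cfg, start: str) -> Set[str]:
    """Return the blocks reachable from `start` by following at least one edge."""
    seen: Set[str] = set()
    stack = list(cfg[start])
    while stack:
        block = stack.pop()
        if block not in seen:
            seen.add(block)
            stack.extend(cfg[block])
    return seen


def executes_exactly_once(cfg: Cfg, entry: str, block: str) -> bool:
    """Check that `block` is neither inside a loop nor skippable."""
    # A block that can reach itself is part of a loop.
    if block in reachable(cfg, block):
        return False
    # The entry block always runs.
    if block == entry:
        return True
    # Remove `block` from the graph. If an exit block (one without successors)
    # is still reachable from the entry, some path skips `block`.
    pruned: Cfg = {
        b: [s for s in succs if s != block]
        for b, succs in cfg.items()
        if b != block
    }
    still_reachable = reachable(pruned, entry) | {entry}
    return not any(len(cfg[b]) == 0 for b in still_reachable)


# Block structure of the omp.teams region shown above.
teams_region: Cfg = {
    "bb0": ["bb1"],
    "bb1": ["bb2", "bb3"],
    "bb2": ["bb1"],
    "bb3": [],
}
```

In this model, `executes_exactly_once(teams_region, "bb0", "bb2")` is false because `^bb2` branches back to `^bb1`, so the nested `omp.parallel` would not count as captured and the kernel would be treated as generic.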

mjklemm

LGTM

DominikAdamski

LGTM

skatrak requested review from ergawy, jsjodin, mjklemm, agozillon, DominikAdamski, TIFitis, kparzysz and bhandarkar-pranav December 2, 2024 15:13

mjklemm approved these changes Dec 2, 2024

skatrak mentioned this pull request Dec 3, 2024

[Flang][OpenMP] Enable support for standalone omp distribute construct #216

Merged

DominikAdamski approved these changes Dec 3, 2024

skatrak merged commit df7e436 into ROCm:amd-trunk-dev Dec 3, 2024
3 of 5 checks passed

skatrak deleted the target-captures-branching branch December 3, 2024 13:24
