Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MLIR][OpenMP] Correctly handle branching within target captures #217

Merged
merged 1 commit into from
Dec 3, 2024

Conversation

skatrak
Copy link

@skatrak skatrak commented Dec 2, 2024

This patch improves the detection of captured OpenMP constructs inside of an omp.target operation by also considering potential branches. If a nested OpenMP construct can be executed in a loop or optionally omitted by means of explicit MLIR control flow, then it's not supposed to be captured.

The following Fortran example results in such a case:

!$omp target teams
do i = 1, n
  !$omp distribute parallel do
  do j = 1, n
    ...
  end do
  !$omp end distribute parallel do
end do
!$omp end target teams

The result of lowering that code to MLIR is the creation of multiple blocks and branches inside of the omp.teams operation's region. Without this change, it is identified as an SPMD kernel during translation to LLVM IR due to the nesting of operations, but it is a generic kernel, so it causes a compiler crash. This is because it tries to get host-evaluated loop bounds that do not exist to calculate the inner loop's trip count.

omp.target map_entries(...) {
  ...
  omp.teams {
    %233 = llvm.trunc %227 : i64 to i32
    llvm.br ^bb1(%233, %225 : i32, i64)
  ^bb1(%234: i32, %235: i64):  // 2 preds: ^bb0, ^bb2
    %236 = llvm.icmp "sgt" %235, %226 : i64
    llvm.cond_br %236, ^bb2, ^bb3
  ^bb2:  // pred: ^bb1
    llvm.store %234, %arg5 : i32, !llvm.ptr
    omp.parallel ... {
      ...
    } {omp.composite}
    ...
    llvm.br ^bb1(%239, %240 : i32, i64)
  ^bb3:  // pred: ^bb1
    llvm.store %234, %arg5 : i32, !llvm.ptr
    omp.terminator
  }
  omp.terminator
}

This patch improves the detection of captured OpenMP constructs inside of an
`omp.target` operation by also considering potential branches. If a nested
OpenMP construct can be executed in a loop or optionally omitted by means of
explicit MLIR control flow, then it's not supposed to be captured.

The following Fortran example results in such a case:
```f90
!$omp target teams
do i = 1, n
  !$omp distribute parallel do
  do j = 1, n
    ...
  end do
  !$omp end distribute parallel do
end do
!$omp end target teams
```

The result of lowering that code to MLIR is the creation of multiple blocks and
branches inside of the `omp.teams` operation's region. Without this change, it
is identified as an SPMD kernel during translation to LLVM IR due to the
nesting of operations, but it is a generic kernel, so it causes a compiler
crash. This is because it tries to get host-evaluated loop bounds that do not
exist to calculate the inner loop's trip count.
Copy link

@mjklemm mjklemm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

@DominikAdamski DominikAdamski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@skatrak skatrak merged commit df7e436 into ROCm:amd-trunk-dev Dec 3, 2024
3 of 5 checks passed
@skatrak skatrak deleted the target-captures-branching branch December 3, 2024 13:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants