[NFC][flang][do concurrent] Add saxpy offload tests for OpenMP mapping #155993
@llvm/pr-subscribers-flang-fir-hlfir
@llvm/pr-subscribers-offload

Author: Kareem Ergawy (ergawy)

Changes

Adds end-to-end tests for `do concurrent` offloading to the device.

Full diff: https://github.com/llvm/llvm-project/pull/155993.diff

2 Files Affected:
diff --git a/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy-2d.f90 b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy-2d.f90
new file mode 100644
index 0000000000000..c6f576acb90b6
--- /dev/null
+++ b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy-2d.f90
@@ -0,0 +1,53 @@
+! REQUIRES: flang, amdgpu
+
+! RUN: %libomptarget-compile-fortran-generic -fdo-concurrent-to-openmp=device
+! RUN: env LIBOMPTARGET_INFO=16 %libomptarget-run-generic 2>&1 | %fcheck-generic
+module saxpymod
+ use iso_fortran_env
+ public :: saxpy
+contains
+
+subroutine saxpy(a, x, y, n, m)
+ use iso_fortran_env
+ implicit none
+ integer,intent(in) :: n, m
+ real(kind=real32),intent(in) :: a
+ real(kind=real32), dimension(:,:),intent(in) :: x
+ real(kind=real32), dimension(:,:),intent(inout) :: y
+ integer :: i, j
+
+ do concurrent(i=1:n, j=1:m)
+ y(i,j) = a * x(i,j) + y(i,j)
+ end do
+
+ write(*,*) "plausibility check:"
+ write(*,'("y(1,1) ",f8.6)') y(1,1)
+ write(*,'("y(n,m) ",f8.6)') y(n,m)
+end subroutine saxpy
+
+end module saxpymod
+
+program main
+ use iso_fortran_env
+ use saxpymod, ONLY:saxpy
+ implicit none
+
+ integer,parameter :: n = 1000, m=10000
+ real(kind=real32), allocatable, dimension(:,:) :: x, y
+ real(kind=real32) :: a
+ integer :: i
+
+ allocate(x(1:n,1:m), y(1:n,1:m))
+ a = 2.0_real32
+ x(:,:) = 1.0_real32
+ y(:,:) = 2.0_real32
+
+ call saxpy(a, x, y, n, m)
+
+ deallocate(x,y)
+end program main
+
+! CHECK: "PluginInterface" device {{[0-9]+}} info: Launching kernel {{.*}}
+! CHECK: plausibility check:
+! CHECK: y(1,1) 4.0
+! CHECK: y(n,m) 4.0
diff --git a/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy.f90 b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy.f90
new file mode 100644
index 0000000000000..e094a1d7459ef
--- /dev/null
+++ b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy.f90
@@ -0,0 +1,53 @@
+! REQUIRES: flang, amdgpu
+
+! RUN: %libomptarget-compile-fortran-generic -fdo-concurrent-to-openmp=device
+! RUN: env LIBOMPTARGET_INFO=16 %libomptarget-run-generic 2>&1 | %fcheck-generic
+module saxpymod
+ use iso_fortran_env
+ public :: saxpy
+contains
+
+subroutine saxpy(a, x, y, n)
+ use iso_fortran_env
+ implicit none
+ integer,intent(in) :: n
+ real(kind=real32),intent(in) :: a
+ real(kind=real32), dimension(:),intent(in) :: x
+ real(kind=real32), dimension(:),intent(inout) :: y
+ integer :: i
+
+ do concurrent(i=1:n)
+ y(i) = a * x(i) + y(i)
+ end do
+
+ write(*,*) "plausibility check:"
+ write(*,'("y(1) ",f8.6)') y(1)
+ write(*,'("y(n) ",f8.6)') y(n)
+end subroutine saxpy
+
+end module saxpymod
+
+program main
+ use iso_fortran_env
+ use saxpymod, ONLY:saxpy
+ implicit none
+
+ integer,parameter :: n = 10000000
+ real(kind=real32), allocatable, dimension(:) :: x, y
+ real(kind=real32) :: a
+ integer :: i
+
+ allocate(x(1:n), y(1:n))
+ a = 2.0_real32
+ x(:) = 1.0_real32
+ y(:) = 2.0_real32
+
+ call saxpy(a, x, y, n)
+
+ deallocate(x,y)
+end program main
+
+! CHECK: "PluginInterface" device {{[0-9]+}} info: Launching kernel {{.*}}
+! CHECK: plausibility check:
+! CHECK: y(1) 4.0
+! CHECK: y(n) 4.0
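
For readers comparing against explicit OpenMP, the 1-D loop exercised by the test above is roughly what one might write by hand as follows. This is only a sketch, not the output of the conversion pass: the names `saxpy_omp_sketch` and `saxpy_omp` are hypothetical, and the combined `target teams distribute parallel do` construct with these `map` clauses is an assumption about the closest hand-written form. Built without OpenMP support, the directives are plain comments and the loop runs sequentially on the host.

```fortran
module saxpy_omp_sketch
  use iso_fortran_env, only: real32
  implicit none
contains

  ! Hypothetical hand-written counterpart of the test's saxpy loop.
  ! The directive and map clauses are illustrative assumptions, not
  ! what the compiler is guaranteed to generate.
  subroutine saxpy_omp(a, x, y, n)
    integer, intent(in) :: n
    real(kind=real32), intent(in) :: a
    real(kind=real32), dimension(:), intent(in) :: x
    real(kind=real32), dimension(:), intent(inout) :: y
    integer :: i

    ! x is only read on the device; y is read and written back.
    !$omp target teams distribute parallel do map(to: x) map(tofrom: y)
    do i = 1, n
      y(i) = a * x(i) + y(i)
    end do
    !$omp end target teams distribute parallel do
  end subroutine saxpy_omp

end module saxpy_omp_sketch
```

With the test's inputs (`a = 2`, every `x` element `1`, every `y` element `2`), each element of `y` becomes `2 * 1 + 2 = 4`, which is the value the `CHECK` lines look for.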
Ping! Please have a look when you have time.
…ide values (#155754)

Following up on #154483, this PR introduces further refactoring to extract some shared utils between OpenMP lowering and the `do concurrent` conversion pass. In particular, this PR extracts 2 utils that handle mapping or cloning values used inside target regions but defined outside. Later `do concurrent` PR(s) will also use these utils.

PR stack:
- #155754 ◀️
- #155987
- #155992
- #155993
- #156589
- #156610
- #156837
… tests (#155992)

Adds more lit tests for `do concurrent` device mapping.

PR stack:
- llvm/llvm-project#155754
- llvm/llvm-project#155987
- llvm/llvm-project#155992 ◀️
- llvm/llvm-project#155993
- llvm/llvm-project#157638
- llvm/llvm-project#156610
- llvm/llvm-project#156837
Adds end-to-end tests for `do concurrent` offloading to the device.
bhandarkar-pranav left a comment
Thanks for the PR, LGTM.
…nMP mapping (#155993)

Adds end-to-end tests for `do concurrent` offloading to the device.

PR stack:
- llvm/llvm-project#155754
- llvm/llvm-project#155987
- llvm/llvm-project#155992
- llvm/llvm-project#155993 ◀️
- llvm/llvm-project#157638
- llvm/llvm-project#156610
- llvm/llvm-project#156837
Extends support for mapping `do concurrent` on the device by adding support for `local` specifiers. The changes in this PR map the local variable to the `omp.target` op and use the mapped value as the `private` clause operand in the nested `omp.parallel` op.

- #155754
- #155987
- #155992
- #155993
- #157638 ◀️
- #156610
- #156837
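
As a source-level illustration of the `local` specifier mentioned above, a loop of the following shape is the kind of input such a mapping would have to handle. This is a minimal sketch: the names `local_sketch`, `scale_and_shift`, and `tmp` are hypothetical, it shows only the Fortran side (not the generated `omp.target` or `omp.parallel` ops), and it needs a compiler that implements Fortran 2018 locality specifiers.

```fortran
module local_sketch
  use iso_fortran_env, only: real32
  implicit none
contains

  ! Hypothetical example: tmp is declared LOCAL, so every iteration
  ! works on its own copy instead of sharing one scratch variable.
  subroutine scale_and_shift(a, x, y, n)
    integer, intent(in) :: n
    real(kind=real32), intent(in) :: a
    real(kind=real32), dimension(:), intent(in) :: x
    real(kind=real32), dimension(:), intent(inout) :: y
    real(kind=real32) :: tmp
    integer :: i

    do concurrent (i = 1:n) local(tmp)
      tmp = a * x(i)       ! per-iteration scratch value
      y(i) = tmp + y(i)
    end do
  end subroutine scale_and_shift

end module local_sketch
```

Without `local(tmp)`, all iterations would write to the same `tmp`, which is why a per-iteration (private) copy is needed once the iterations run in parallel.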
Extends `do concurrent` to OpenMP device mapping by adding support for mapping `reduce` specifiers to omp `reduction` clauses. The changes attach 2 `reduction` clauses to the mapped OpenMP construct: one on the `teams` part of the construct and one on the `wloop` part.

- #155754
- #155987
- #155992
- #155993
- #157638
- #156610 ◀️
- #156837
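
For context, a `reduce` specifier on a `do concurrent` loop looks like the sketch below. The names `reduce_sketch`, `dot`, and `acc` are hypothetical, the example shows only the Fortran source form rather than the resulting `reduction` clauses, and it assumes a compiler that supports the Fortran 2023 `reduce` locality specifier.

```fortran
module reduce_sketch
  use iso_fortran_env, only: real32
  implicit none
contains

  ! Hypothetical example: a dot product whose accumulator is named in
  ! a REDUCE specifier, so partial sums are combined with "+".
  function dot(x, y, n) result(total)
    integer, intent(in) :: n
    real(kind=real32), dimension(:), intent(in) :: x, y
    real(kind=real32) :: total
    real(kind=real32) :: acc
    integer :: i

    acc = 0.0_real32
    do concurrent (i = 1:n) reduce(+:acc)
      acc = acc + x(i) * y(i)
    end do
    total = acc
  end function dot

end module reduce_sketch
```

The accumulator is kept in a local variable and copied to the function result afterwards, which keeps the variable named in `reduce` a plain local of the enclosing scope.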
…ions on the GPU (#156837)

Fixes a bug related to insertion points when inlining multi-block combiner reduction regions. The IP at the end of the inlined region was not used, which resulted in emitting BBs with multiple terminators.

PR stack:
- llvm/llvm-project#155754
- llvm/llvm-project#155987
- llvm/llvm-project#155992
- llvm/llvm-project#155993
- llvm/llvm-project#157638
- llvm/llvm-project#156610
- llvm/llvm-project#156837 ◀️
Adds end-to-end tests for `do concurrent` offloading to the device.

PR stack:
- `do concurrent` mapping to device #155987
- `do concurrent` to device mapping lit tests #155992
- `do concurrent`: support `local` on device #157638
- `do concurrent`: support `reduce` on device #156610