Commit e2fc1b8
Bjoern Knafla
[SYCL][CUDA] Remove unnecessary memfence (#1935)
Remove unnecessary memory fence after a CUDA memory barrier
(__syncthreads).
The emitted `bar.sync 0` PTX instruction ensures that all memory
accesses of threads involved in the barrier `0` have been performed and
that no new memory accesses happen before the barrier completes.
The removed memory fence reduced performance without adding any
functionality to the barrier memory behavior.
Signed-off-by: Bjoern Knafla <[email protected]>
Co-authored-by: Victor Lomuller <[email protected]>
1 parent b7a34be commit e2fc1b8
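
The reasoning in the commit message can be illustrated with a small standalone CUDA program. This is a minimal sketch, not the code touched by this commit: the kernel name `reverse_block`, the helper `count_reverse_mismatches`, and the block size are hypothetical and chosen only for illustration. It relies on the documented behavior of `__syncthreads()` (which typically lowers to `bar.sync 0`): writes made by the block's threads before the barrier are visible to all threads in the block after it, so no additional `__threadfence_block()` follows the barrier.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kBlockSize = 256;  // hypothetical block size for this sketch

// Reverses one block-sized array by staging it in shared memory.
__global__ void reverse_block(const int *in, int *out) {
    __shared__ int tile[kBlockSize];
    const int tid = threadIdx.x;

    tile[tid] = in[tid];

    // Block-wide barrier. Shared-memory writes issued before this point are
    // visible to every thread in the block afterwards, so this sketch does
    // not add a separate memory fence after it.
    __syncthreads();

    out[tid] = tile[kBlockSize - 1 - tid];
}

// Host helper (hypothetical): runs the kernel once and counts wrong elements.
int count_reverse_mismatches() {
    int host_in[kBlockSize];
    int host_out[kBlockSize];
    for (int i = 0; i < kBlockSize; ++i) {
        host_in[i] = i;
        host_out[i] = -1;
    }

    int *dev_in = nullptr;
    int *dev_out = nullptr;
    cudaMalloc(&dev_in, sizeof(host_in));
    cudaMalloc(&dev_out, sizeof(host_out));
    cudaMemcpy(dev_in, host_in, sizeof(host_in), cudaMemcpyHostToDevice);

    reverse_block<<<1, kBlockSize>>>(dev_in, dev_out);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(host_out, dev_out, sizeof(host_out), cudaMemcpyDeviceToHost);
    cudaFree(dev_in);
    cudaFree(dev_out);

    int mismatches = 0;
    for (int i = 0; i < kBlockSize; ++i) {
        if (host_out[i] != kBlockSize - 1 - i) {
            ++mismatches;
        }
    }
    return mismatches;
}

int main() {
    const int mismatches = count_reverse_mismatches();
    std::printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Built with `nvcc` and run on a CUDA-capable device, the program exits with status 0 when every element was read back correctly. Inspecting the generated PTX (for example with `nvcc -ptx`) is one way to check which barrier instruction the compiler emitted for `__syncthreads()` on a given toolchain.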
1 file changed: +0 -1 lines changed

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 17 | 17 | |
| 18 | 18 | |
| 19 | 19 | |
| 20 | | - |
| 21 | 20 | |