Add PTX wrapping functions for TMA features #379
I have:
For documentation, we can either rely on the CUDA programming guide or add an "Experimental API" section alongside the existing "Standard API" and "Extended API" sections. This PR is now completely ready for (final?) review.
Merge after #358
Co-authored-by: Michael Schellenberger Costa <[email protected]>
Thanks for the review @miscco. I have implemented your feedback.
Description
closes #359
Add PTX wrapping functions for TMA features.
Checklist (still TODO)