-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gpu: nvidia: amd: Get native context through device #1765
gpu: nvidia: amd: Get native context through device #1765
Conversation
@hdelan, could you please address questions and feedback on this PR? |
@vpirogov thanks for ping. Changes made |
58a625d
to
0c643e9
Compare
a0fa401
to
aaced73
Compare
Thanks, @hdelan! @densamoilov is currently out, we'll have this promoted after the long weekend in US. |
A lot of code removed that was necessary for primary contexts All native contexts in DPC++ are primary contexts for CUDA and HIP, so we don't need a lot of the checking in oneDNN any more.
c0969cd
to
460d15f
Compare
460d15f
to
824ea18
Compare
@hdelan, please let me know once the PR is ready for review. |
@densamoilov I am running tests but am seeing a lot of failures on AMD one the As a result it is hard to test the correctness of these changes. Can you recommend a particular setup where tests pass on the master branch of oneDNN, ie a certain DPC++ or release version? So that I can test just the contents of this PR |
@densamoilov, using release version 2024.0.2 I have the same failures with this patch as the failures on the master branch. AMD MI210 fail list (same for
NVIDIA A100 fail list (same for
So I suppose this PR is ready for review |
Description
Since UR now supports multi device context in HIP adapter oneapi-src/unified-runtime#999 (and the same work is soon to follow in the CUDA adapter) it is no longer sensible to use the get_native method for contexts.
The mapping of native contexts to a sycl::context is now many to one, so when get_native_context is called the plugin no longer knows which native context to return. The multi device context UR PR has made the get native context return the native context of the first device in the context. See https://github.com/oneapi-src/unified-runtime/pull/999/files#diff-259fb15eb14976a3bc1939b9bb8197f51d129a111309343bb84677a655758b54R125 . But this will break for multi GPU systems.
This change instead uses the sycl::device to get the native context, since there is a one to one mapping of sycl::device to native contexts.
This change means that old versions of oneDNN will be compatible with newer plugins only if a one GPU context is being used. In order for a multi GPU system to work with the new plugin, a oneDNN version with this patch included will be necessary.
Test results to follow
Checklist
General
make test
andmake test_benchdnn_*
) pass locally for each commit?No. The following tests fail for the
master
branch as well as for my PR branch:On Nvidia A100:
HIP tests all passing for
ninja test
on gfx1031.