Skip to content

Conversation

@s-kanaev
Copy link
Contributor

@s-kanaev s-kanaev commented May 25, 2020

This patch is number two in series of patches for interop part of host task.
This patch introduces an API to enqueue host-task with interop_handle argument
See the proposal [1].

Depends on #1937

[1] https://github.com/codeplaysoftware/standards-proposals/blob/master/host_task/host_task.md

Sergey Kanaev added 2 commits May 25, 2020 15:01
@s-kanaev s-kanaev mentioned this pull request May 25, 2020
3 tasks
Signed-off-by: Sergey Kanaev <[email protected]>
@s-kanaev s-kanaev force-pushed the private/s-kanaev/ht-interop-task-iface branch from a73440f to e04ea75 Compare May 25, 2020 13:13
Sergey Kanaev added 3 commits May 25, 2020 16:28
@s-kanaev s-kanaev marked this pull request as ready for review June 10, 2020 19:36
@s-kanaev s-kanaev requested a review from a team as a code owner June 10, 2020 19:36
@s-kanaev s-kanaev requested review from Ruyk and v-klochkov June 10, 2020 19:36
@s-kanaev
Copy link
Contributor Author

s-kanaev commented Jun 10, 2020

@Ruyk, @StuartDAdams I bet you know how to improve testing here.

@Ruyk
Copy link

Ruyk commented Jun 11, 2020

Some ideas:

  • A test that uses OpenCL interop to copy data from buffer A to buffer B , by getting cl_mem objects and calling the clEnqueueBufferCopy. Then run a SYCL kernel that modifies the data in place for B, e.g. increment one, then copy back to buffer A. Run it on a loop, to ensure the dependencies and the reference counting of the objects is not leaked. We could easily do the CUDA variant even with a macro later on.
  • Same as above, but performing each command group on a separate SYCL queue (on the same or different devices). This ensures the dependency tracking works well but also there is no accidental side effects on other queues.
  • A test that does a clEnqueueWait inside the interop scope, for an event captured outside the command group. The OpenCl event can be set after the command group finishes. Must not deadlock according to implementation and proposal, sketch below:
cl_event userEvent = clCreateUserEvent(...)
q.submit([&](handler& ) {
   h.codeplay_host_task([=](interop_handler& ih) {
     clWaitForEvents(1, &userEvent); 
   }
});
clSetUserEventStatus(userEvent, CL_COMPLETE);
q.wait();

Copy link
Contributor

@nyalloc nyalloc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff. Got a few comments on code style and a few things I'd like to clarify, but happy to see this ball rolling.

@s-kanaev s-kanaev marked this pull request as draft June 17, 2020 16:17
@s-kanaev
Copy link
Contributor Author

@Ruyk, @StuartDAdams ping

Sergey Kanaev added 2 commits June 25, 2020 23:43
Signed-off-by: Sergey Kanaev <[email protected]>
Signed-off-by: Sergey Kanaev <[email protected]>
Ruyk
Ruyk previously approved these changes Jun 26, 2020
Copy link

@Ruyk Ruyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two really minor changes required for the CUDA backend to work, but they should not block this PR. It seems to work fine otherwise.

@Ruyk
Copy link

Ruyk commented Jun 26, 2020

Are all the lit testing passing for you? host-task lit test sometimes deadlocks on my system,

[New LWP 30329]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f8f775c5f30 in cl::sycl::detail::Command::enqueue(cl::sycl::detail::EnqueueResultT&, cl::sycl::detail::BlockingT) ()
   from /home/ruyman/open-source/build/lib/libsycl.so.2
(gdb) up
#1  0x00007f8f775db425 in cl::sycl::detail::Scheduler::GraphProcessor::waitForEvent(std::shared_ptr<cl::sycl::detail::event_impl>) ()
   from /home/ruyman/open-source/build/lib/libsycl.so.2
(gdb)
#2  0x00007f8f775d735d in cl::sycl::detail::Scheduler::waitForRecordToFinish(cl::sycl::detail::MemObjRecord*) ()
   from /home/ruyman/open-source/build/lib/libsycl.so.2
(gdb)
#3  0x00007f8f775d9158 in cl::sycl::detail::Scheduler::removeMemoryObject(cl::sycl::detail::SYCLMemObjI*) ()
   from /home/ruyman/open-source/build/lib/libsycl.so.2
(gdb)
#4  0x00007f8f775e8ba8 in cl::sycl::detail::SYCLMemObjT::updateHostMemory() () from /home/ruyman/open-source/build/lib/libsycl.so.2
(gdb)
#5  0x000000000040a343 in cl::sycl::detail::buffer_impl::~buffer_impl() ()
(gdb)
#6  0x000000000040a319 in void __gnu_cxx::new_allocator<cl::sycl::detail::buffer_impl>::destroy<cl::sycl::detail::buffer_impl>(cl::sycl::detail::buffer_impl*) ()
(gdb) quit

@s-kanaev
Copy link
Contributor Author

Are all the lit testing passing for you? host-task lit test sometimes deadlocks on my system

@Ruyk, please, try again with #1937 fix.

@Ruyk
Copy link

Ruyk commented Jun 26, 2020

Yes, that seems to fix the problem.

v-klochkov
v-klochkov previously approved these changes Jun 26, 2020
Signed-off-by: Sergey Kanaev <[email protected]>
@s-kanaev s-kanaev dismissed stale reviews from v-klochkov and Ruyk via e5c6cf5 June 29, 2020 12:00
@s-kanaev s-kanaev requested review from Ruyk and v-klochkov June 29, 2020 12:04
@s-kanaev
Copy link
Contributor Author

@StuartDAdams , @Ruyk the only latest changes are some stylistic ones. Please review and approve if you have no objections.

Copy link

@Ruyk Ruyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CUDA variant I have seems to work locally (bearing two minor changes), so looks good!

@bader bader merged commit f088e38 into intel:sycl Jun 30, 2020
@s-kanaev s-kanaev deleted the private/s-kanaev/ht-interop-task-iface branch September 2, 2020 09:42
Fznamznon pushed a commit to Fznamznon/llvm that referenced this pull request Dec 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants