Introduce complex's group algorithms #41
The What used to be |
Is there a standard API that could be used here, or is this only for ONEAPI? If so, can it be cmake feature gated so this can at least build with other compilers, and can it be oneapi version ifdef'd to work on older oneapi versions, in particular the latest public 2023.01 oneapi release? I do not think it is worth the effort to support old oneapi releases, but I think at least the latest public one is important to support. Did clang-format-17 produce different formatting? Sometimes adding things to .clang-format can make them behave the same, but can be tricky to figure out. |
The standard way could be to use a As you mention we could ifdef ONEAPI for |
Yes, clang-format-17 produced different formatting than the 14 version. |
|
As we want this code to be portable to other implementations (hip-sycl) I indeed vote for the Where we replaced |
|
LGTM! I trust the tests which pass :) |
|
Thanks again :) |
#else
  throw sycl::exception(sycl::make_error_code(sycl::errc::runtime),
                        "Group algorithms are not supported on host.");
#endif
This is horrendously unportable.
- The spec defines __SYCL_DEVICE_ONLY__ only for SMCP compilers. For single-pass compilers, like the hipSYCL generic SSCP compiler, this is not defined, in accordance with the SYCL spec.
- Just throwing whenever __SYCL_DEVICE_ONLY__ is not defined will completely break on any library-only host backend, where the kernel is compiled as part of the host pass for CPU.
You should seriously start looking into adding CI and validation for other compilers. That would make such issues blatantly obvious.
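To make the library-only host backend case concrete, here is a minimal host-only sketch with no SYCL headers involved. The names guarded_sum and guarded_sum_throws are hypothetical, and std::runtime_error is only a stand-in for sycl::exception. An ordinary host compiler does not define __SYCL_DEVICE_ONLY__, which is also the situation when kernel code is compiled as part of the host pass, so the #else branch is the one that ends up in the kernel:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a group algorithm guarded the way the diff does it.
template <class T>
T guarded_sum(T a, T b) {
#ifdef __SYCL_DEVICE_ONLY__
  return a + b;  // only compiled in the device pass of an SMCP compiler
#else
  (void)a;  // parameters are unused in this branch
  (void)b;
  // std::runtime_error stands in for sycl::exception in this sketch.
  throw std::runtime_error("Group algorithms are not supported on host.");
#endif
}

// Reports whether calling guarded_sum throws in the current compilation pass.
inline bool guarded_sum_throws() {
  try {
    guarded_sum(1, 2);
  } catch (const std::runtime_error&) {
    return true;
  }
  return false;
}
```

In any pass where the macro is undefined, guarded_sum_throws() returns true, so a kernel that reaches this function on such a backend would always hit the throw.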
I'd like to add openSYCL CI - do you have a public docker image that we could use with latest version pre-installed?
Re avoiding the __SYCL_DEVICE_ONLY__ else raise pattern, do you have any suggestions on an alternative? This has come up in gtensor as well, where we have a function that could be called from host or device code, and want to have some error reporting in both cases. Is there a standard SYCL safe way to always report an error for both device and host calls?
We don't have a docker image at the moment (it's on the lower priority to do list to speed up our own CI), but what you need is usually fairly simple:
- Install some recent LLVM release from apt.llvm.org packages, boost packages, cmake, git
- Clone and install.
See our own CI (which is a bit more complicated because it tests all backends in different versions etc): https://github.com/OpenSYCL/OpenSYCL/blob/develop/.github/workflows/linux.yml
> Re avoiding the __SYCL_DEVICE_ONLY__ else raise pattern, do you have any suggestions on an alternative? This has come up in gtensor as well, where we have a function that could be called from host or device code, and want to have some error reporting in both cases. Is there a standard SYCL safe way to always report an error for both device and host calls?
I might be able to answer more specifically with more information about your use case, but in general this seems like a difficult/impossible thing to do:
- There is currently no portable mechanism in SYCL to specialize host/device code.
- As soon as a function is in the kernel call graph, it counts as device code and device code restrictions in principle apply (no exceptions are allowed or else UB). This also affects the host side, as SYCL implementations might use that to target CPU, and still rely on the absence of exceptions for optimizations.
- Error handling from within kernels is always very difficult, and there is no mechanism for this in the standard, unless you are willing to pass sycl::stream everywhere throughout your code (you might not want that, as its bare existence might add overhead depending on the implementation). Some implementations may support printf/assert but it's not guaranteed.
The issue for pre-standardized APIs like the SYCL group algorithms is more difficult because we cannot add to the API. Outside of this, one suggestion could be to modify the API and solve this on the C++ level and not on the SYCL level using different template instantiations or overloads to distinguish host and device code paths.
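As a rough illustration of the overload-based suggestion, the sketch below picks the code path with a tag type instead of a macro. Everything here is hypothetical (host_path, device_path, checked_divide) and none of it is SYCL API; it only assumes that the caller knows which context it is in. The host overload is free to throw, while the device overload stays exception-free and reports failure through a flag the caller would read back after the kernel:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical tag types naming the two code paths.
struct host_path {};
struct device_path {};

// Host overload: never part of a kernel call graph, so exceptions are fine.
inline int checked_divide(host_path, int num, int den) {
  if (den == 0) throw std::invalid_argument("division by zero");
  return num / den;
}

// Device overload: no exceptions; failure is recorded in a caller-owned flag.
inline int checked_divide(device_path, int num, int den, int* error_flag) {
  if (den == 0) {
    *error_flag = 1;
    return 0;
  }
  return num / den;
}
```

Because the choice is made by ordinary overload resolution, both functions exist with a single definition in every compilation pass, which sidesteps the ODR concern for single-pass compilers.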
In a single-pass compilation scenario, host and device are by nature extremely closely related and potentially generated from the exact same IR (even though we have mechanisms to distinguish them, but they are very different from the macro), thus they may not even be a separate thing a priori.
EDIT: Philosophically, you can consider single-pass compilers to just be better at enforcing C++ ODR rules. Having different implementations for host/device in principle is an ODR violation that only happens to work because device code is not put into the same binary object.
Can you describe your use case a bit more - how your error handling is intended to work and why you feel the need for raising errors inside the kernel call graph?
EDIT2: Something that would work at least across DPC++ and all of the compilation models in hipSYCL for code specialization would be the following:
#ifdef __HIPSYCL__
  #define if_target_host(...) __hipsycl_if_target_host(__VA_ARGS__)
  #define if_target_device(...) __hipsycl_if_target_device(__VA_ARGS__)
#else
  #ifdef __SYCL_DEVICE_ONLY__
    #define if_target_device(...) __VA_ARGS__
    #define if_target_host(...)
  #else
    #define if_target_device(...)
    #define if_target_host(...) __VA_ARGS__
  #endif
#endif

// Use like so, assuming kernel_func is a kernel:
void kernel_func(auto id) {
  if_target_device(
    // Put device code here
  );
  if_target_host(
    // Put host code here
  );
}

EDIT3: In the code here, this pattern and the error handling is not even necessary, because the group algorithms already cannot be called outside of kernel code since they take a group argument - and groups are not user constructible. So the code achieves nothing but restrict portability.
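The argument that a group parameter already prevents calls from outside a kernel can be sketched in plain C++. The types below are mocks invented for illustration (mock_group, mock_runtime, mock_group_algorithm) and are not the SYCL classes; the only assumption carried over is that a group has no constructor the user can reach, so the only way to obtain one is to be handed it by the runtime inside a kernel:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

class mock_runtime;

// Mock of a group: the only constructor is private, so user code cannot create one.
class mock_group {
 public:
  std::size_t size() const { return size_; }

 private:
  friend class mock_runtime;
  explicit mock_group(std::size_t size) : size_(size) {}
  std::size_t size_;
};

// Mock runtime: the only place a group is created, and only to call a kernel.
class mock_runtime {
 public:
  template <class Kernel>
  static auto launch(std::size_t group_size, Kernel kernel) {
    return kernel(mock_group{group_size});
  }
};

// Mock group algorithm: needs a group, so it is only callable from a kernel.
inline std::size_t mock_group_algorithm(const mock_group& g, std::size_t per_item) {
  return g.size() * per_item;
}

static_assert(!std::is_default_constructible<mock_group>::value,
              "user code cannot default-construct a group");
static_assert(!std::is_constructible<mock_group, std::size_t>::value,
              "user code cannot construct a group from a size");
```

With this shape, calling mock_group_algorithm outside of mock_runtime::launch fails to compile because there is no group to pass, so a runtime "not supported on host" check adds nothing.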
This PR adds the following list of complex and marray's group algorithms and their tests:
The test helpers have been modified to support the new tests and reduce the number of overloads.
New traits have been added, and their tests are grouped in a single file containing all of the traits' tests.
A new conversion test has also been added to support the modification made to the test helpers.
This PR has been tested on different backends and devices successfully.