-
Notifications
You must be signed in to change notification settings - Fork 753
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable level zero v2 in DPC++ #16656
base: sycl
Are you sure you want to change the base?
Conversation
32fded7
to
9bc55cd
Compare
My opinion is that we should build and ship both adapters by default, and make it togglable which one is used at runtime through an env variable (like I suggested here). |
9bc55cd
to
dd18cba
Compare
dd18cba
to
cbbdcec
Compare
Good point! changed that. |
By default, I think we should only build 1 adapter. We don't want to confuse users by enumerating the same device even more times than we already do (OpenCL vs Level Zero). I think it would be nice to support building both and toggling for the reasons stated above, but I'd also like the option to ONLY build v1 or v2. |
The current proposal, and, from what I can tell, the implementation, is that both adapters will be physically present in the libs directory, however, only one of them will be active at a time. There will be a new environment variable ( Having to build SYCL from scratch with a special option might be challenging for developers of higher-level frameworks or applications that usually use a prebuilt compiler. |
buildbot/configure.py
Outdated
@@ -68,6 +68,7 @@ def do_configure(args): | |||
|
|||
if sys.platform != "darwin": | |||
sycl_enabled_backends.append("level_zero") | |||
sycl_enabled_backends.append("level_zero_v2") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of comments
-
Can you explain what L0 V2 is and how it releases to the current L0/L0 adapter
-
I agree with James, we shouldn't build two plugins by default. If we have some configure switch to build two, that's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- README. In short, it's a new L0 adapter with re-implemented performance-critical APIs. See here for performance validation results.
- Why though? As I noted, with the current implementation, users won't be affected by the inclusion of the two adapters. And not including it by default would mean that it's going to be much more difficult for application and framework developers to experiment with the new adapter, which is the whole point of including it in SYCL right now in the first place. In practice, it will likely mean we might need to provide the users that want to use it an alternative prebuilt compiler package.
@igchor ping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, why "v2" instead of incremental refactoring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Got it, thanks
-
At this point it seems like the v2 adapter is only intended to be used by DPCPP/UR developers, and IMO features like that should not be built by default. If the feature is ready enough for downstream clients, should we switch over by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, why "v2" instead of incremental refactoring?
The existing L0 adapter code base has accumulated a lot of complexity over the years as L0 and the driver evolved. It has become very difficult to change. For example, there are 4 major queue modes that the adapter supports - "batched in order", "batched out of order", "immediate in order", "immediate out of order". In the existing adapter, they are all implemented together, with various conditions sprinkled all throughout to make it work. In practice, batched mode is all but legacy for 99% of scenarios, and "in order" and "out of order" L0 paths are so different (different types of L0 events, different way of synchronization), that there's not much to gain from sharing all the code between them.
Refactoring this incrementally, without breaking existing code, would be challenging. So the team has decided that a partial re-implementation (we still reuse all that's practical) of the performance critical paths is the best path forward. This also means that we have the opportunity to easily leverage new driver features and do a clean break away from the accumulated legacy tech debt.
At this point it seems like the v2 adapter is only intended to be used by DPCPP/UR developers, and IMO features like that should not be built by default. If the feature is ready enough for downstream clients, should we switch over by default?
In terms of quality, the v2 adapter has higher UR CTS passrate than the current one, and passes the vast majority of SYCL e2e tests (big exception being tests that hard-code exact sequence of operations expected from the adapter). However, this is still work-in-progress.
@@ -36,6 +36,10 @@ set(UR_ENABLE_TRACING ON) | |||
if("level_zero" IN_LIST SYCL_ENABLE_BACKENDS) | |||
set(UR_BUILD_ADAPTER_L0 ON) | |||
endif() | |||
if("level_zero_v2" IN_LIST SYCL_ENABLE_BACKENDS) | |||
set(UR_BUILD_ADAPTER_L0_V2 ON) | |||
set(ENV{UR_ADAPTER_LEVEL_ZERO_V2} ON) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will only set the env variable for the build process. We either need to change the ur loader again is that the v2 is loaded unconditionally (but i'm not sure about that, the L0 adapter might be in other searchable OS paths, and we'd end up with two adapters anyway) or set this env variable somewhere in SYCL rt init if this compile option is set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to think in an easy way to do the second option and make the compile option consistent in defining the env variable, but I think we will always need the user to define the UR_ADAPTER_LEVEL_ZERO_V2
, I guess we could just depend on the user to define it, or we could define another dpc++ env variable that the user could define if that would make it more clear.
The first option seems the simpler one, but with the probability of reporting still 2 l0 adapters.
I am not sure which one is better, but I think we could just for simplicity go with the first suggestion of just registering the two l0/l0_v2
adapters, and let dpc++ control that with the compile option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I agree. I already approved the UR change. Thanks.
@omarahmed1111 will this approach enable us to mark specific e2e tests with XFAIL for v2 only, or would that require some extra changes? |
I guess that will need extra changes to lit tests config to be able to detect |
Let's leave that for a separate PR. This one isn't adding any new CI job for v2 anyway. |
This makes a simple way to use the experimental level_zero_v2 in dpc++. Fixes #16613 .