Provide a way for applications to use a single adapter #355

Open
pbalcer opened this issue Mar 14, 2023 · 6 comments
Labels
loader Loader related feature/bug

Comments

@pbalcer
Contributor

pbalcer commented Mar 14, 2023

To support multiple adapters, the current UR loader has an indirection layer that creates and maintains wrappers around UR entities (or function class types, i.e., platform, device and so on). Each wrapper stores a pointer to the adapter functions. If there's only one adapter, this layer is unused, and the loader calls the adapter functions directly.

This indirection adds unnecessary overhead for applications that use only one adapter but have more available in the system. This issue is to devise a way to allow applications to load and use only the desired adapter implementation, thus avoiding the overhead.
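
To make the two code paths concrete, here is a minimal sketch of the difference. All type and function names below (`AdapterDeviceTable`, `LoaderDevice`, `loaderDeviceGetId`, `directDeviceGetId`) are hypothetical stand-ins, not the loader's actual data structures.

```cpp
#include <cassert>

// Hypothetical per-adapter function table with a single entry point.
struct AdapterDeviceTable {
    int (*deviceGetId)(void *nativeDevice);
};

// Hypothetical wrapper used when several adapters are loaded: it pairs the
// adapter's native handle with that adapter's function table.
struct LoaderDevice {
    void *native;
    const AdapterDeviceTable *table;
};

// Multi-adapter path: unwrap the handle, then call through the stored table.
int loaderDeviceGetId(LoaderDevice *dev) {
    assert(dev != nullptr && dev->table != nullptr);
    return dev->table->deviceGetId(dev->native);
}

// Single-adapter path: no wrapper object, the native handle goes straight
// to the only adapter's table.
int directDeviceGetId(const AdapterDeviceTable &only, void *nativeDevice) {
    return only.deviceGetId(nativeDevice);
}

// Toy adapter used to exercise both paths.
struct FakeDevice { int id; };
int fakeDeviceGetId(void *d) { return static_cast<FakeDevice *>(d)->id; }
```

In this sketch the overhead of the multi-adapter path is the extra wrapper object per handle plus one more pointer dereference per call.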

Possible solutions:

  • The UR_ADAPTERS_FORCE_LOAD environment variable can be set to the desired adapter, forcing the loader to use it. This is already possible.
  • Reintroduce platform_flags to urInit, with a way of selecting a single adapter.
    • Something like UR_PLATFORM_USE_FIRST. But this might be hard to use, since the order of adapters is unspecified.
    • Alternatively, we could create a flag per known platform, such as UR_PLATFORM_L0, UR_PLATFORM_CUDA and UR_PLATFORM_HIP, which could be used like this: urInit(0, UR_PLATFORM_L0 | UR_PLATFORM_CUDA). This would only work with predefined platforms.
  • Add a way to programmatically filter platforms at urInit, for example: urInit(0, [](struct platform_descriptor *d) -> bool { return strcmp(d->name, "ur_adapter_level_zero") == 0; }). This might be clunky to use from C, but I think it is the most universal option.
  • Add a way of unloading platforms post-initialization. Software would iterate over available platforms and would either call urPlatformUnload on the ones it doesn't intend to use or urPlatformUseOnlyThis (can't think of a name right now :-)) on the one it does. This fits into the existing API, but might be error-prone and tricky to implement safely.
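
As an illustration of the filter-callback option, here is a rough sketch of the selection step such a urInit variant might perform. `platform_descriptor` with a `name` field follows the example above. `platform_filter` and `select_adapters` are made-up names, and nothing here exists in the UR API.

```cpp
#include <cassert>
#include <cstring>
#include <functional>
#include <string>
#include <vector>

// Hypothetical descriptor; only the `name` field appears in the proposal.
struct platform_descriptor {
    const char *name;
};

// Hypothetical callback type: return true to keep the adapter.
using platform_filter = std::function<bool(const platform_descriptor *)>;

// Stand-in for the filtering a urInit variant could run over the adapters
// it discovered: keep only those accepted by the predicate.
std::vector<std::string> select_adapters(
    const std::vector<platform_descriptor> &found,
    const platform_filter &keep) {
    assert(keep && "a filter callback is required");
    std::vector<std::string> selected;
    for (const auto &d : found) {
        if (keep(&d)) {
            selected.emplace_back(d.name);
        }
    }
    return selected;
}
```

Passing a lambda that compares `d->name` against "ur_adapter_level_zero" would leave exactly one entry, at which point the loader could take the direct path. A C caller would need a plain function pointer plus a user-data argument instead of `std::function`, which is the clunkiness mentioned above.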
pbalcer added the needs-discussion and loader labels Mar 14, 2023
pbalcer added this to the 0.9 milestone Mar 14, 2023
@pbalcer
Contributor Author

pbalcer commented Mar 14, 2023

@jandres742

Thanks @pbalcer.

Maybe we don't need anything other than UR_ADAPTERS_FORCE_LOAD? The good thing about UR_ADAPTERS_FORCE_LOAD is that users can select L0 one time and CUDA the next, without needing a code change. A flag in urInit would probably translate into yet another env var in SYCL, just to know which flag to pass there.

If the loader sees UR_ADAPTERS_FORCE_LOAD, it would just pass through directly to the selected adapter, right?

Are there any disadvantages or limitations to using UR_ADAPTERS_FORCE_LOAD?

@pbalcer
Contributor Author

pbalcer commented Mar 14, 2023

Yes, if there's only one adapter specified (it supports a comma-separated list) in UR_ADAPTERS_FORCE_LOAD, then the direct code path is used.
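
A minimal sketch of that decision, assuming the variable holds a plain comma-separated list. The helper names (`split_adapter_list`, `uses_direct_path`, `forced_adapters_from_env`) are hypothetical, and the real loader may parse the value differently.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Split a comma-separated value into its non-empty entries.
std::vector<std::string> split_adapter_list(const std::string &value) {
    std::vector<std::string> paths;
    std::stringstream ss(value);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (!item.empty()) {
            paths.push_back(item);
        }
    }
    return paths;
}

// The direct code path applies only when exactly one adapter is listed.
bool uses_direct_path(const std::vector<std::string> &paths) {
    return paths.size() == 1;
}

// Read the list from the environment; an unset variable yields an empty list.
std::vector<std::string> forced_adapters_from_env() {
    const char *raw = std::getenv("UR_ADAPTERS_FORCE_LOAD");
    return raw ? split_adapter_list(raw) : std::vector<std::string>{};
}
```

With this sketch, a value naming one library gives `uses_direct_path(...) == true`, while a value such as `a.so,b.so` (placeholder names) falls back to the indirection layer.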

The only problem I can think of with this approach is that it requires the user to know the exact full path to the adapter (or just the exact library name if the adapter resides in a path that dlopen can find automatically). That might not be known if the adapter is installed automatically by some package in a custom location.

Maybe we should have a conf file/dir in /etc/ur.d/ (and something equivalent on Windows) that the adapters register themselves in, so the user can just edit the config to pick an adapter from the ones listed? It would be much more work, but we probably need something like this for Windows anyway to address #128.
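
To sketch what such a registry could look like: the `name=path` line format, the function names and the example paths below are all assumptions made for illustration, since no file format has been proposed in this thread.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Parse hypothetical "name=path" lines. Blank lines, lines starting with '#'
// and lines without a name or '=' are skipped.
std::map<std::string, std::string> parse_adapter_registry(std::istream &in) {
    std::map<std::string, std::string> registry;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') {
            continue;
        }
        const auto eq = line.find('=');
        if (eq == std::string::npos || eq == 0) {
            continue;  // malformed entry
        }
        registry[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return registry;
}

// Resolve a short adapter name to its library path; empty string if unknown.
std::string resolve_adapter(const std::map<std::string, std::string> &registry,
                            const std::string &name) {
    const auto it = registry.find(name);
    return it == registry.end() ? std::string{} : it->second;
}
```

The user would then only need the short name (for example `level_zero`), and the loader would look up the full path that the package wrote at install time.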

@jandres742

Thanks @pbalcer. Ah, so UR_ADAPTERS_FORCE_LOAD takes a full path? I thought it only needed the name of the adapter. Then maybe we need another env var? Something like UR_ADAPTER_LIST=<level_zero>,,, which takes a comma-separated list of adapters to use. If only one is passed, the passthrough in the loader is used.

We could either use the same format as ONEAPI_DEVICE_SELECTOR (https://intel.github.io/llvm-docs/EnvironmentVariables.html) or, even better, just read ONEAPI_DEVICE_SELECTOR in the UR loader and pass through if only one backend is selected.
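
A simplified sketch of the "only one backend selected" check. It assumes selector terms of the form `backend:devices` separated by `;`, with discard terms prefixed by `!`. It ignores everything after the colon and does not implement the full selector grammar. The function names are hypothetical.

```cpp
#include <set>
#include <sstream>
#include <string>

// Collect the backend names that appear in the accept terms of a selector.
std::set<std::string> selected_backends(const std::string &selector) {
    std::set<std::string> backends;
    std::stringstream ss(selector);
    std::string term;
    while (std::getline(ss, term, ';')) {
        if (term.empty() || term[0] == '!') {
            continue;  // skip empty and discard terms
        }
        // Everything before the first ':' is taken as the backend name.
        backends.insert(term.substr(0, term.find(':')));
    }
    return backends;
}

// Passthrough is a candidate only if exactly one concrete backend is named;
// a wildcard backend may match several adapters.
bool selects_single_backend(const std::string &selector) {
    const auto backends = selected_backends(selector);
    return backends.size() == 1 && *backends.begin() != "*";
}
```

Under these assumptions `level_zero:gpu` would qualify for passthrough, while `level_zero:gpu;opencl:cpu` or `*:gpu` would not.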

@pbalcer
Contributor Author

pbalcer commented Mar 15, 2023

Yes, UR_ADAPTERS_FORCE_LOAD takes paths to the dynamic libraries to load. As for ONEAPI_DEVICE_SELECTOR, the plan (#220) right now is to implement it once UR becomes the default path in SYCL, and then seamlessly switch over to filtering only in UR.

@Wee-Free-Scot
Contributor

QMCPack uses MPI + SYCL + OpenMP. All three SW components can and do offload tasks to the available devices, possibly via different backends. All three could/will become clients of UR in the near future.

The UR_ADAPTERS_FORCE_LOAD option works, even in this situation, because it restricts all clients of UR equitably (all get passthrough or none do).
The ONEAPI_DEVICE_SELECTOR option works, even in this situation, because it restricts all clients of UR equitably (all get passthrough or none do).
The per-call urInit option runs into issues around multiple disjoint instances versus a single shared instance.

OpenMP requires only one adapter (because of a requirement for homogeneity of devices, allegedly) -- it will always want to call urInit with exactly one platform, even if other clients of UR concurrently call urInit with multiple platforms or without restricting platforms (i.e. de facto multiple). Does OpenMP get its own instance of UR that uses the passthrough fast-path, or does it get a shared instance of UR that uses indirection because some other client asked for that?
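
The tension can be shown with a toy model of the shared-instance case. `SharedLoader` and its merge policy are purely hypothetical, one possible behaviour rather than anything decided in this thread.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical shared loader instance: every client's init request is merged
// into one set of adapters.
class SharedLoader {
public:
    // An empty request means "no restriction", i.e. all available adapters.
    void init(const std::vector<std::string> &requested) {
        if (requested.empty()) {
            unrestricted_ = true;
        }
        adapters_.insert(requested.begin(), requested.end());
    }

    // Passthrough is possible only while every client asked for the same
    // single adapter.
    bool passthrough() const {
        return !unrestricted_ && adapters_.size() == 1;
    }

private:
    std::set<std::string> adapters_;
    bool unrestricted_ = false;
};
```

In this model, a client that asks for one adapter loses the fast path as soon as another client asks for a second adapter or sets no restriction. Separate `SharedLoader` objects per client would each keep their own answer. The sketch also ignores the harder problem of handles that were already handed out while passthrough was active.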

@kbenzie kbenzie removed this from the 0.7 milestone Aug 3, 2023

5 participants