Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

[RFE] Support linking multiple Thrust versions: Add hooks that wrap the thrust:: namespace in a custom namespace #1401

Closed
nv-dlasalle opened this issue Mar 19, 2021 · 10 comments · Fixed by #1464
Assignees
Labels
P0: must have Absolutely necessary. Critical issue, major blocker, etc. type: enhancement New feature or request.
Milestone

Comments

@nv-dlasalle
Copy link

Problem

Cub allows itself to place into a namespace via CUB_NS_PREFIX and CUB_NS_POSTFIX, such that multiple shared libraries can each utilize their own copy of it (and thus different versions can safely coexist). Static variables used for caching could otherwise cause problems (e.g., https://github.com/NVIDIA/cub/blob/main/cub/util_device.cuh#L212).

Thrust however depends on cub and requires it to not be in another namespace, so users cannot have CUB_NS_PREFIX defined. This means if two libraries use two different versions of thrust (or cub), issues with the caching variables inside of cub can occur.

Possible solutions

A solution would be to add THRUST_NS_PREFIX and THRUST_NS_POSTFIX to allow each library to place the version of thrust it's compiling against within in it's namespace, and either utilize the version of cub in the global namespace, or utilize the version of cub within the same namespace by defining CUB_NS_PREFIX as well.

Another solution, would be to allow users to define something like THRUST_CUB_NS, to tell thrust which namespace to look for cub in:

#define CUB_NS_PREFIX=namespace foobar {
#define CUB_NS_POSTFIX=}
...
#define THRUST_CUB_NS=foobar
#include "thrust/sort.h"
@alliepiper alliepiper changed the title Having multiple shared libraries that are compiled with different version of thrust (and cub) causes initialization issues [RFE] Support linking multiple Thrust versions: Add hooks that wrap the thrust:: namespace in a custom namespace Mar 19, 2021
@alliepiper
Copy link
Collaborator

Updated the title mention the work that needs to be done.

I think we can nicely solve this with these macro sets:

  • THRUST_CUB_WRAPPED_NAMESPACE
    • Set this to a namespace name that will wrap thrust:: AND cub::.
    • Preferred method for most usecases.
  • THRUST_WRAPPED_NAMESPACE / CUB_WRAPPED_NAMESPACE
    • Set these to a namespace name that will wrap thrust:: OR cub::.
    • Available for odd usecases.
    • Overrides THRUST_CUB_WRAPPED_NAMESPACE.
  • [THRUST|CUB]_NS_[PREFIX|POSTFIX]
    • Implementation details, but may be overridden for backwards compat.
    • Overrides any overlapping *_WRAPPED_NAMESPACE definitions.
    • Macros containing the actual code that implements the namespace wrappings.

@nv-dlasalle Does this sound reasonable for your needs?

@nv-dlasalle
Copy link
Author

@allisonvacanti This sounds like it would work perfectly for us. Thanks!

@VoVAllen
Copy link

PyTorch is experiencing the same issues pytorch/pytorch#54245. Probably CUB should avoid using static variable for cacheing in the template function

@alliepiper
Copy link
Collaborator

Some context:

Those caches were added a while back to avoid overhead from some expensive CUDA API calls. Users were seeing a significant impact from these calls under certain workloads, and the caches were necessary for good performance in some critical applications.

I agree that using statics in a header is a fragile solution and, well, generally not a good idea. But we don't really have a lot of other options -- Thrust/CUB are header-only, so we can't place the cache in a library component. C++17 inline variables may provide a nicer workaround eventually, but we can't rely on them yet.

For now, the namespace workaround will be the preferred solution, but I just wanted to share that we're aware of the issue and want to move to a more robust solution when one becomes available.

@VoVAllen
Copy link

Thanks! I found it should be fine using static variable inside a non-template non-inline function. In those cases, gcc won't compile these symbols as UNIQUE. And if every library uses RTLD_LOCAL without UNIQUE symbols, they can only see their own static variables, which avoids the conflict problems.

However, the UNIQUE symbol breaks the RTLD_LOCAL setting, that later library loaded won't instantiate its own static variable. This causes the conflict

@ngimel
Copy link

ngimel commented Mar 24, 2021

If each library indeed saw the same static variable, that would be fine - the values that are cached are supposed to be the same for all libraries. But in case of pytorch/pytorch#52663 and pytorch/pytorch#54245 a new static is allocated, but its constructor is not called, so there are 0 devices instead of correctly cached number of devices (same would happen for other cached attributes).
That said, separate namespaces for multiple thrust/cub versions sound good.

@VoVAllen
Copy link

Separate namespace could solve this issue, which is also the solution DGL team've adapted dmlc/dgl#2758. However, I still prefer to share my investigation here for people to understand the root cause and avoid similar issue in the future. It took us about one week to figure out the root cause.

I believe the cause is the UNIQUE symbol, however I found there's limited resources explaining this. I'm not an expertise in C++ so my statement here could be wrong.

Some solutions I found

  • passing -Xcompiler=-fno-gnu-unique to the nvcc compiler, which force compiler not to create any unique symbol.
  • Second possible solution I found is to make the symbol and the function as hidden when exporting the symbol, this will also change the symbol from UNIQUE to LOCL (but I haven't tried this yet)
  • Third solution is to set the function as static, which has similar effect as the second solution

Using template function with static variable will result in some unusual behavior. I hope my investigation here can help people find a better solution.

Reference:

A simple gist for the UNIQUE symbol: dmlc/dgl#2758

@alliepiper
Copy link
Collaborator

#1464 and NVIDIA/cub#326 provide the new namespace customization hooks. With those applied, defining THRUST_CUB_WRAPPED_NAMESPACE="foo" before including any Thrust/CUB headers will move the thrust:: and cub:: namespaces to foo::thrust:: and foo::cub::. By specifying different namespaces for different dynamic libraries, collisions and ambiguity can be avoided.

Alternatively, if THRUST_CUB_USE_ANON_NAMESPACE defined, all of thrust:: and cub:: will be placed in anonymous namespaces. Similar to the suggestion of using static functions, this should also address the issue, though it will likely bloat binary size as symbols can no longer be reused by different TUs in each library.

I'm wrapping up testing and reviews, but these are passing initial tests. If anyone gets a chance to try these out and see if they fix their dynamic linking problems, please let me know.

@alliepiper
Copy link
Collaborator

The fix for this has landed. Define THRUST_CUB_WRAPPED_NAMESPACE to a unique name for each library linked together, and all of thrust:: and cub:: will be placed in the requested outer namespace. This will avoid symbol collisions between libraries.

For more info:

@alliepiper
Copy link
Collaborator

(Note that the anonymous namespace macros have been removed. These were interacting badly with nvcc, and will not be implemented in the forseeable future.)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
P0: must have Absolutely necessary. Critical issue, major blocker, etc. type: enhancement New feature or request.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants