nccl/rccl integration #469
Additional symbols that need to be loaded from libnccl.so:
nccl_ops_t->ncclGetUniqueId(
Here, before returning, you have to call nccl_ops_t->ncclCommInitRank to create a real NCCL communicator. Inside MSCCL++'s ncclComm_t, you can store it as a void * or as an ncclComm_t nccl_comm:
nccl_ops_t->ncclCommInitRank(&commPtr->nccl_comm, ... )
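The suggestion above could look roughly like the following sketch. It is illustrative only: `NcclOps`, `WrapperComm`, `UniqueId`, `RealNcclComm`, and `initRealComm` are hypothetical stand-ins (not names from the PR), `int` stands in for `ncclResult_t`, and the stand-in types exist only so the sketch compiles without `nccl.h`.

```cpp
#include <cassert>

// Stand-ins so the sketch compiles without nccl.h (assumptions, not PR code).
using RealNcclComm = void*;               // stands in for NCCL's own ncclComm_t
struct UniqueId { char internal[128]; };  // same size as NCCL's ncclUniqueId

// Function pointers resolved from libnccl.so; int stands in for ncclResult_t.
struct NcclOps {
  int (*ncclGetUniqueId)(UniqueId* id);
  int (*ncclCommInitRank)(RealNcclComm* comm, int nranks, UniqueId id, int rank);
};

// Wrapper communicator that keeps the real NCCL communicator as an opaque member.
struct WrapperComm {
  void* nccl_comm = nullptr;
};

// Creates the real NCCL communicator inside the wrapper before returning.
// Returns 0 on success and non-zero if the symbol is missing or the call fails.
int initRealComm(const NcclOps& ops, WrapperComm* commPtr, int nranks,
                 const UniqueId& id, int rank) {
  assert(commPtr != nullptr);
  if (ops.ncclCommInitRank == nullptr) return 1;  // symbol was not loaded
  return ops.ncclCommInitRank(&commPtr->nccl_comm, nranks, id, rank);
}
```

Storing the member as `void *` keeps the wrapper header independent of `nccl.h`, at the cost of a cast wherever the real communicator is passed back to NCCL.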
Add two related environment variables:
Support dlopen for the following NCCL APIs:
Pass the following tests (rccl-test):
Use dlopen to load the NCCL/RCCL APIs from a shared library, replacing the fallback code for AllGather, AllReduce, Broadcast, and ReduceScatter.
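As a rough illustration of the dlopen approach, the sketch below resolves the four collective entry points by name. It is not the PR's code: `SymbolTable` and `loadNcclSymbols` are hypothetical names, and the sketch assumes the library exports the standard NCCL symbols `ncclAllGather`, `ncclAllReduce`, `ncclBroadcast`, and `ncclReduceScatter`.

```cpp
#include <dlfcn.h>

#include <cstdio>
#include <map>
#include <string>
#include <utility>

// Maps a symbol name to the address resolved by dlsym (hypothetical helper type).
using SymbolTable = std::map<std::string, void*>;

// Opens the library at libPath and resolves the four collectives.
// Returns false and leaves `table` untouched if the library or any symbol is missing.
bool loadNcclSymbols(const std::string& libPath, SymbolTable& table) {
  static const char* kNames[] = {"ncclAllGather", "ncclAllReduce",
                                 "ncclBroadcast", "ncclReduceScatter"};
  void* handle = dlopen(libPath.c_str(), RTLD_NOW | RTLD_LOCAL);
  if (handle == nullptr) {
    const char* err = dlerror();
    std::fprintf(stderr, "dlopen failed: %s\n", err ? err : "unknown error");
    return false;
  }
  SymbolTable resolved;
  for (const char* name : kNames) {
    void* sym = dlsym(handle, name);
    if (sym == nullptr) {
      const char* err = dlerror();
      std::fprintf(stderr, "dlsym(%s) failed: %s\n", name, err ? err : "unknown error");
      dlclose(handle);
      return false;
    }
    resolved[name] = sym;
  }
  // The handle is intentionally left open so the resolved addresses stay valid.
  table = std::move(resolved);
  return true;
}
```

Each resolved address would still need a cast to the matching function-pointer type before it can be called.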
Add two related environment variables:
-x MSCCLPP_ENABLE_SHARED_LIB=TRUE -x MSCCLPP_NCCL_LIB_PATH=path_to_libnccl.so/librccl.so
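A minimal sketch of how the two variables might be read at startup follows. `SharedLibConfig` and `readSharedLibConfig` are hypothetical names, and the case-insensitive comparison against `TRUE` is an assumption; the PR does not show how the value is parsed.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical holder for the two settings.
struct SharedLibConfig {
  bool enabled = false;  // from MSCCLPP_ENABLE_SHARED_LIB
  std::string libPath;   // from MSCCLPP_NCCL_LIB_PATH
};

// Reads both variables from the environment; unset variables keep the defaults.
SharedLibConfig readSharedLibConfig() {
  SharedLibConfig cfg;
  if (const char* enable = std::getenv("MSCCLPP_ENABLE_SHARED_LIB")) {
    std::string value(enable);
    // Assumption: treat "TRUE" in any letter case as enabled, anything else as disabled.
    std::transform(value.begin(), value.end(), value.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    cfg.enabled = (value == "TRUE");
  }
  if (const char* path = std::getenv("MSCCLPP_NCCL_LIB_PATH")) {
    cfg.libPath = path;
  }
  return cfg;
}
```

With the flags shown above, `readSharedLibConfig()` would report `enabled == true` and `libPath` equal to whatever path was passed in `MSCCLPP_NCCL_LIB_PATH`.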