The current nccl{Send,Recv} API requires specifying an `int peer` for P2P ops. This causes unavoidable and undesirable device-host syncs in cases where the P2P op routing is dynamically determined.
Example: mixture-of-experts models (like DeepSeek v3) have dynamically determined computation. That computation may require communication, depending on the parallelization scheme. The communication pattern can be encoded in, say, `int64` CUDA tensors that contain the ranks of the peers that different tensors should be sent to. Using this CUDA-tensor peer info to launch the corresponding P2P ops currently requires moving the tensors back to the host so that the appropriate `int peer` values can be read off. This is an undesirable device-host sync point.
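For concreteness, here is a minimal sketch of the pattern as it stands today, assuming a device buffer of `int64` peer ranks produced by the routing step (the buffer names and the helper function are illustrative, not taken from any existing code):

```c
#include <nccl.h>
#include <cuda_runtime.h>
#include <stdint.h>
#include <stdlib.h>

// Sends num_sends payloads whose destination ranks live in device memory.
// Because ncclSend takes a host-side int peer, the ranks must first be
// copied to the host, forcing the device-host sync described above.
void send_with_dynamic_routing(void **send_bufs, size_t *counts, int num_sends,
                               const int64_t *d_peer_ranks,  // device memory
                               ncclComm_t comm, cudaStream_t stream) {
  int64_t *h_peer_ranks = (int64_t *)malloc(num_sends * sizeof(int64_t));
  cudaMemcpyAsync(h_peer_ranks, d_peer_ranks, num_sends * sizeof(int64_t),
                  cudaMemcpyDeviceToHost, stream);
  cudaStreamSynchronize(stream);  // the undesirable device-host sync point

  ncclGroupStart();
  for (int i = 0; i < num_sends; ++i) {
    ncclSend(send_bufs[i], counts[i], ncclFloat, (int)h_peer_ranks[i],
             comm, stream);
  }
  ncclGroupEnd();

  free(h_peer_ranks);
}
```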
Request: allow specifying the peer via a pointer to a buffer held on the device.
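One possible shape for such an extension, purely as a hypothetical sketch (this function does not exist in NCCL today; the name and signature are invented for illustration), would let the peer rank stay on the device, so the loop above no longer needs a host copy or a stream synchronization:

```c
#include <nccl.h>
#include <cuda_runtime.h>

// Hypothetical API: the peer rank is read on the device from d_peer,
// so the host never needs to know its value.
ncclResult_t ncclSendWithDevicePeer(const void *sendbuff, size_t count,
                                    ncclDataType_t datatype,
                                    const int *d_peer,  // device pointer to peer rank
                                    ncclComm_t comm, cudaStream_t stream);

// The earlier example would then reduce to issuing the sends directly
// against the device-resident routing information, with no sync.
void send_with_dynamic_routing_async(void **send_bufs, size_t *counts,
                                     int num_sends, const int *d_peer_ranks,
                                     ncclComm_t comm, cudaStream_t stream) {
  ncclGroupStart();
  for (int i = 0; i < num_sends; ++i) {
    ncclSendWithDevicePeer(send_bufs[i], counts[i], ncclFloat,
                           d_peer_ranks + i, comm, stream);
  }
  ncclGroupEnd();
}
```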