NCCL inplace reduce-scatterv trashes rank 0 buffer #110

Closed
ndryden opened this issue Feb 16, 2021 · 0 comments · Fixed by #121
Labels
bug Something isn't working

Comments

ndryden (Collaborator) commented Feb 16, 2021

We currently implement the reduce-scatter as a reduce to rank 0 followed by a scatterv. For an in-place op, the reduce is done in place on the input sendbuf. It therefore writes to portions of sendbuf on rank 0 that lie outside the region where rank 0's final scattered value is placed.
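To make the effect concrete, here is a minimal pure-Python model of the behavior described above. It is only a sketch: the function name `inplace_reduce_scatterv_via_root` is hypothetical, ranks are modeled as plain lists, the reduction is a sum, and it assumes each rank's result lands at the front of its own buffer. It does not call NCCL or the library's actual code.

```python
def inplace_reduce_scatterv_via_root(bufs, counts):
    """Model an in-place reduce to rank 0 followed by a scatterv.

    bufs[r] is rank r's full-length buffer; counts[r] is how many
    reduced elements rank r should end up with (assumed to be placed
    at the front of its buffer).
    """
    root = bufs[0]
    # Step 1: in-place sum-reduce of every rank's full buffer into rank 0.
    # This overwrites ALL of rank 0's buffer, not just its own slice.
    for other in bufs[1:]:
        for i, value in enumerate(other):
            root[i] += value
    # Step 2: scatterv from rank 0. Rank 0's own slice starts at offset 0,
    # so it is already in place; other ranks receive their slice.
    offset = counts[0]
    for rank in range(1, len(bufs)):
        count = counts[rank]
        bufs[rank][:count] = root[offset:offset + count]
        offset += count


# Three ranks, four elements in total, split as 1 + 2 + 1.
demo_bufs = [
    [1, 2, 3, 4],
    [10, 20, 30, 40],
    [100, 200, 300, 400],
]
demo_counts = [1, 2, 1]
inplace_reduce_scatterv_via_root(demo_bufs, demo_counts)

print(demo_bufs[0])  # [111, 222, 333, 444]: the tail [2, 3, 4] was overwritten
print(demo_bufs[1])  # [222, 333, 30, 40]: the tail is untouched
print(demo_bufs[2])  # [444, 200, 300, 400]: the tail is untouched
```

In this model only rank 0 loses the contents of its buffer beyond its own `counts[0]` elements; the other ranks keep theirs.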

I can't find anything in the MPI standard that explicitly prohibits this, but:

  • It's a bit aesthetically displeasing.
  • In other cases, like an MPI_Recv with a buffer/count larger than the actual message length, MPI does guarantee that no more memory will be touched than is actually needed by the message.
  • Avoiding it shouldn't add much overhead if we use a memory pool.
  • A better, direct implementation can probably avoid it.
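One way to read the memory-pool suggestion is to reduce into a scratch buffer instead of into rank 0's sendbuf. The pure-Python sketch below models that idea under stated assumptions: the function name `reduce_scatterv_with_scratch` is hypothetical, a freshly copied list stands in for a buffer drawn from a pool, the reduction is a sum, and each rank's result is assumed to land at the front of its own buffer. It is not the fix that was actually merged.

```python
def reduce_scatterv_with_scratch(bufs, counts):
    """Model a reduce-scatterv that reduces into a scratch buffer.

    bufs[r] is rank r's full-length buffer; counts[r] is how many
    reduced elements rank r receives at the front of its buffer.
    """
    # Stand-in for a temporary buffer taken from a memory pool on rank 0.
    scratch = list(bufs[0])
    # Reduce every other rank's contribution into the scratch buffer,
    # leaving rank 0's own buffer untouched for now.
    for other in bufs[1:]:
        for i, value in enumerate(other):
            scratch[i] += value
    # Scatter: every rank, including rank 0, receives only its own slice.
    offset = 0
    for rank, count in enumerate(counts):
        bufs[rank][:count] = scratch[offset:offset + count]
        offset += count


# Three ranks, four elements in total, split as 1 + 2 + 1.
pool_bufs = [
    [1, 2, 3, 4],
    [10, 20, 30, 40],
    [100, 200, 300, 400],
]
pool_counts = [1, 2, 1]
reduce_scatterv_with_scratch(pool_bufs, pool_counts)

print(pool_bufs[0])  # [111, 2, 3, 4]: only the first counts[0] elements changed
print(pool_bufs[1])  # [222, 333, 30, 40]
print(pool_bufs[2])  # [444, 200, 300, 400]
```

The cost in this model is one extra full-length buffer on rank 0, which is the overhead a pool would be expected to amortize.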

ndryden added the bug label Feb 16, 2021
ndryden added a commit that referenced this issue Feb 26, 2021
ndryden added a commit that referenced this issue Mar 4, 2021
ndryden added a commit that referenced this issue Mar 5, 2021