[bug] Potential bug: a local variable is used as the source of an asynchronous copy #1621

Open
shizhixing opened this issue Feb 28, 2025 · 1 comment

Comments

@shizhixing

In file src/channels.cc, line 50:

for (int r = 0; r < nRanks; r++) {
  uintptr_t addr = (uintptr_t)(comm->sharedRes->devPeers[channelId] + comm->topParentRanks[r]);
  // &addr could be overwritten by the next loop iteration before the memcpy actually runs
  NCCLCHECK(ncclCudaMemcpyAsync((uintptr_t*)(channel->devPeers + r), (uintptr_t*)&addr, 1, sharedRes->deviceStream.cudaStream));
  channel->devPeersHostPtr[r] = (struct ncclDevChannelPeer*)addr;
}
}

The same pattern also appears at lines 87 and 97.
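
To make the concern concrete, here is a minimal host-only sketch in C++. It does not use CUDA or NCCL: `FakeStream`, `copyAsync`, `synchronize`, and `copyFromReusedSlot` are hypothetical names invented for illustration, and the model assumes the copy reads its source only when the stream is drained. The variable `addr` is deliberately declared outside the loop so that the model stays well-defined; in the reported code it is a loop-local, which is worse because its lifetime ends at each iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a stream: work is queued now and executed later.
struct FakeStream {
  std::vector<std::function<void()>> pending;

  // Queues a one-word copy. Only the pointers are recorded; the source
  // value is not read until synchronize() runs.
  void copyAsync(std::uintptr_t* dst, const std::uintptr_t* src) {
    pending.push_back([dst, src] { *dst = *src; });
  }

  // Executes every queued copy, then empties the queue.
  void synchronize() {
    for (auto& op : pending) op();
    pending.clear();
  }
};

// Mirrors the reported pattern: a single reused slot is the source of
// every queued copy.
std::vector<std::uintptr_t> copyFromReusedSlot(int nRanks) {
  assert(nRanks >= 0);
  std::vector<std::uintptr_t> dst(static_cast<std::size_t>(nRanks), std::uintptr_t{0});
  FakeStream stream;
  std::uintptr_t addr = 0;  // models the stack slot reused by each iteration
  for (int r = 0; r < nRanks; r++) {
    addr = 0x1000 + static_cast<std::uintptr_t>(r);  // intended value for rank r
    stream.copyAsync(&dst[static_cast<std::size_t>(r)], &addr);
  }
  stream.synchronize();  // every copy now reads the final value of addr
  return dst;
}
```

Under this model, calling `copyFromReusedSlot(4)` yields four entries that all hold the value written in the last iteration rather than one distinct value per rank. Whether the real copy behaves this way depends on when the runtime actually reads the host source.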

@sjeaugey
Member

Indeed, that looks like an issue. If it has worked so far, it may be because CUDA copies the data right away (perhaps for small sizes), so the source is read before it can be overwritten.

Fixing it may be complicated too, as we'd need to free the buffer only once the operation is complete, which may not be easy.

Thanks for the report, we'll see how to fix that.
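
The lifetime constraint mentioned above (the source buffer must outlive the copy and be freed only after it completes) can be sketched with a host-only C++ model. This is an illustration under stated assumptions, not the fix adopted in NCCL: `DeferredQueue`, `copyAsync`, `synchronize`, and `copyFromStagedBuffer` are hypothetical names, and no CUDA call is involved. Each queued copy holds a shared reference to a staging buffer with one slot per rank, so the buffer is released only when the last queued copy has run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <vector>

using Staging = std::vector<std::uintptr_t>;

// Hypothetical stand-in for a stream: work is queued now and executed later.
struct DeferredQueue {
  std::vector<std::function<void()>> pending;

  // Queues a one-word copy from staging[index]. The queued operation owns a
  // shared reference, which keeps the staging buffer alive until it has run.
  void copyAsync(std::uintptr_t* dst, std::shared_ptr<Staging> staging, std::size_t index) {
    pending.push_back([dst, staging, index] { *dst = (*staging)[index]; });
  }

  // Executes every queued copy; clearing the queue drops the references.
  void synchronize() {
    for (auto& op : pending) op();
    pending.clear();
  }
};

// Gives each rank its own staging slot instead of reusing one local variable.
std::vector<std::uintptr_t> copyFromStagedBuffer(int nRanks) {
  assert(nRanks >= 0);
  const std::size_t n = static_cast<std::size_t>(nRanks);
  std::vector<std::uintptr_t> dst(n, std::uintptr_t{0});
  DeferredQueue queue;
  auto staging = std::make_shared<Staging>(n);
  std::weak_ptr<Staging> watch = staging;  // observes when the buffer is freed
  for (std::size_t r = 0; r < n; r++) {
    (*staging)[r] = 0x1000 + static_cast<std::uintptr_t>(r);
    queue.copyAsync(&dst[r], staging, r);
  }
  staging.reset();          // caller's reference is gone; queued copies keep the buffer
  queue.synchronize();      // copies run, then the last reference is dropped
  assert(watch.expired());  // the staging buffer was freed only after completion
  return dst;
}
```

In this model `copyFromStagedBuffer(4)` returns one distinct value per rank. In real CUDA code the release point would have to be tied to stream completion by some other mechanism, which is the part described above as not easy and is not shown here.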
