Does GDRcopy support the HPE/Cray "SlingShot" backbone? #232

Open
cponder opened this issue Aug 9, 2022 · 3 comments

@cponder

cponder commented Aug 9, 2022

I'm looking into a performance issue with an app.
If you could tell me up-front whether you support this kind of cluster, it would save some troubleshooting time.

@AddyLaddy

Carl, the libfabric plugin and the NCCL plugin have both been able to use GDRCopy on a SlingShot-based machine.

See Jim's patch: aws/aws-ofi-nccl#146

@cponder
Author

cponder commented Aug 9, 2022

Do you know if UCX can use it? I'll check with the UCX people...

@pakmarkthub
Collaborator

Do you mean "Can UCX use GDRCopy?" I believe that UCX will use GDRCopy if its compile-time options and runtime environment requirements are satisfied. The code is here: https://github.com/openucx/ucx/tree/master/src/uct/cuda/gdr_copy
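
One way to check the runtime side on a given node, as a minimal sketch rather than an official procedure: UCX ships a `ucx_info` utility, and `ucx_info -d` lists the transports the installed build can see. The helper names below (`mentions_gdr_copy`, `ucx_reports_gdr_copy`) are hypothetical, and the sketch assumes only that the transport shows up in that listing under the name `gdr_copy`, matching the source directory linked above. A positive result means the transport is present in the build; it does not by itself show that the app's transfers are using it.

```python
import re
import shutil
import subprocess


def mentions_gdr_copy(ucx_info_output: str) -> bool:
    """Return True if the text names the gdr_copy transport anywhere.

    Assumption: the transport is listed under the token "gdr_copy".
    """
    return re.search(r"\bgdr_copy\b", ucx_info_output) is not None


def ucx_reports_gdr_copy():
    """Run `ucx_info -d` if it is on PATH and scan its output.

    Returns None when ucx_info is not installed, otherwise True or False.
    """
    exe = shutil.which("ucx_info")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "-d"], capture_output=True, text=True, check=False
    )
    return mentions_gdr_copy(result.stdout)


if __name__ == "__main__":
    print(ucx_reports_gdr_copy())
```

If this prints `False` on a GPU node, the usual suspects would be a UCX build configured without GDRCopy support or the gdrdrv kernel module not being loaded, but that is a guess to verify against the UCX and GDRCopy documentation.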

cponder changed the title from: Does GDRcopy support the HPE/Cray "SlignShot" backbone? to: Does GDRcopy support the HPE/Cray "SlingShot" backbone? (Aug 10, 2022)