Added high-speed interconnect support by enabling UCX #87
leofang merged 23 commits into conda-forge:master
Conversation
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR.
Edited OP to note issues fixed. Hope that is ok :)
Also I think we need the `--with-ucx` configure flag.
Thanks Ravishankar! 😀
Great! Hope it's useful. OpenMPI's configure seems to enable UCX automatically if it finds the headers, so I didn't need to add --with-ucx since its headers and libs get pulled into the standard path in the build environment.
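For completeness, here is a rough sketch of how the UCX path could be passed to configure explicitly in the build script if autodetection ever proved unreliable. This is illustrative only and not part of this PR; the use of `$PREFIX` and `$CPU_COUNT` follows common conda-build conventions rather than quoting this recipe:

```bash
# Hypothetical build.sh fragment (not what this PR does; autodetection suffices):
./configure --prefix="$PREFIX" \
            --with-ucx="$PREFIX"    # point Open MPI at the conda-provided UCX
make -j"${CPU_COUNT:-2}"
make install
```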
CI failures are due to ucx only being available on Linux, so I'm suggesting we add selectors.
Currently we don't have pinning worked out in the ucx package, so I've included a pin to 1.9.0 and raised an issue (conda-forge/ucx-split-feedstock#88) about fixing this in the ucx package.
jakirkham left a comment:
As we are building this on other architectures with Linux, this constrains it to x86_64.
Seems fine to me. I'm new to conda packaging though, so I'm not yet familiar with how to test these properly.
No worries and thanks again for working on this 🙂
@shankar1729 @jakirkham Thanks. I didn't know UCX was available on conda-forge. A couple of questions:
I am testing this locally and can only get CUDA awareness to work by setting … (seems to be documented here). I am in favor of getting the UCX support in, but not enabling it by default. @shankar1729 Do you mind if I push to your branch? I'll make the necessary changes (setting default MCA parameters, adding user instructions, etc.) following the same treatment as for plain (non-UCX) CUDA awareness.
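For context, the opt-in runtime instructions being proposed could look roughly like the following. The parameter names are standard Open MPI MCA variables; that the packaged defaults leave them off is an assumption here, not a quote from this PR:

```bash
# Sketch: opt in to UCX (and CUDA awareness) at run time.
export OMPI_MCA_pml=ucx                  # prefer the UCX point-to-point layer
export OMPI_MCA_osc=ucx                  # prefer the UCX one-sided component
export OMPI_MCA_opal_cuda_support=true   # CUDA awareness, if the build supports it
mpirun -n 4 ./my_app

# Equivalent one-off form:
mpirun -n 4 --mca pml ucx --mca osc ucx ./my_app
```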
Mark "changes requested" to block this while we sort things out.
On CUDA awareness: I see that ucx is compiled using CUDA. Does it work for all CUDA versions?
I just realized this was already discussed elsewhere (conda-forge/ucx-split-feedstock#66, and I was involved): UCX is sensitive to CUDA versions. Does that mean Open MPI will now be sensitive to the CUDA version as well (i.e., do we have to build it with several CUDA versions as done in UCX, from 10.2 to 11.2)?
Responded in issue (conda-forge/ucx-split-feedstock#66), though it will require some more investigation. Should add I am in the middle of other work, so it might take a bit before I have the bandwidth to look into this.
@leofang Yes, please do use my branch to stage these changes.

Also, with respect to the CUDA builds of UCX: it seems the build recipe has been modified to support them, but the versions released on conda-forge are built without CUDA. Consequently, in my local tests I need to bypass the ucx pml when using GPU memory. I have always run my GPU codes with --mca pml ob1 when using openmpi (otherwise I get a segfault in libucp.so), so ignore my response to point 3 above. With the currently released ucx package linked to openmpi, openmpi bypasses ucx because I specified this switch in my tests.

Best,
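For concreteness, the bypass described above amounts to a run along these lines (the application name is a placeholder):

```bash
# Bypass the ucx pml for GPU buffers to avoid the libucp.so segfault noted above:
mpirun -n 2 --mca pml ob1 ./gpu_app
```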
MNT: Re-rendered with conda-build 3.20.3, conda-smithy 3.10.0, and conda-forge-pinning 2021.04.09.17.43.25
This reverts commit 2847d88.
MNT: Re-rendered with conda-build 3.20.3, conda-smithy 3.10.0, and conda-forge-pinning 2021.04.21.17.39.23
@conda-forge-admin, please restart ci
Why does this need multiple builds? Can't this be one single build as it used to be?
So for CUDA it needs two builds: 9.2 and 10.0+. This is because UCX changes its internal code for CUDA 10.0+; see the discussion around conda-forge/ucx-split-feedstock#66 (comment). As for the additional pure CPU build, my reasoning is we use …
That is internal code and the public interface is the same, so I don't see any reason to have multiple builds per CUDA version.
That just shows that the public interface is the same, so there's no need for openmpi to have a dependence on GPU-enabled or GPU-disabled UCX.
I don't think that is true. The CUDA runtime must support the new cudaLaunchHostFunc API.
That's an internal detail of ucx, and openmpi doesn't care.
@jakirkham @pentschev I think @isuruf made a convincing argument here. I think we are guarded by … Copying @jsquyres in case we missed something 😅
MNT: Re-rendered with conda-build 3.20.3, conda-smithy 3.10.0, and conda-forge-pinning 2021.04.21.17.39.23
I confirmed that for all …
Yeah, I agree this is a UCX implementation detail, therefore the OpenMPI package doesn't need to worry about that.
@conda-forge/openmpi this is ready!
FYI -- the long-awaited DDT fix came in for v4.1.x last night (open-mpi/ompi#8837) -- we're making a new 4.1.1RC right now, and may well release 4.1.1 as early as tomorrow. Just in case this influences your thinking...
Thanks for sharing the good news, Jeff! Yeah, we will get this rebuild (of v4.1.0) in, and then build v4.1.1 afterwards so it continues to have UCX support.
Let's get it in and see how it goes! Thanks @shankar1729 @jakirkham @pentschev @isuruf @jsquyres!
I have verified locally with a very simple MPI code that the package is working. The following combinations were tested:
For 4, unlike a local build that I tested, for some reason it just works out of the box without setting …

tl;dr: we are cool
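For anyone wanting to reproduce a similar check, a minimal smoke test could look like the following. The environment name and test program are placeholders, not the exact setup used above:

```bash
# Minimal smoke test in a fresh environment (names are placeholders):
conda create -n ompi-ucx -c conda-forge openmpi ucx
conda activate ompi-ucx
mpicc ring.c -o ring                  # any small MPI send/recv program
mpirun -n 4 ./ring                    # default transport selection
mpirun -n 4 --mca pml ucx ./ring      # explicitly force the UCX pml
```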
Checklist
Fixes #42
Fixes #38 (missing openib support). Instead of enabling ibverbs, which is marked as deprecated within openmpi anyway, enabling ucx support is much easier and cleaner as the ucx package is already available on conda-forge.
Adding ucx as a dependency (during build and run) automatically gets openmpi to include support for a number of high-speed interconnects (InfiniBand, Omni-Path, etc.). No change to the build scripts beyond the dependency in meta.yaml!
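One way to confirm the UCX components actually made it into a build is to inspect ompi_info; the output sketched in the comments is approximate and will vary by version:

```bash
# List the UCX-backed MCA components compiled into Open MPI:
ompi_info | grep -i ucx
#   e.g. "MCA pml: ucx (...)" and "MCA osc: ucx (...)" should appear

# And check which transports/devices UCX itself detects on this host:
ucx_info -d | grep -i transport
```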