mca/base: Change verbosity of failure to load dso component #10729

Conversation
By default, a DSO component loading failure is printed out. This is not an unexpected case (for example, a component may not have access to its dependent .so library at run time). Thus we bump the verbosity level so that this warning is silenced by default. Signed-off-by: William Zhang <[email protected]>
I can also change the other verbosities in this file (they're all at 0) if someone wants it.
hppritcha left a comment
I don't think this is correct. The second arg to opal_output_verbose is the output stream ID; the first arg is the verbosity level.
Also, it looks like someone arranged it so these dlopen failures are chatty by default. I notice there's an opal_mca_base_component_show_load_errors parameter that can be set to turn off this chattiness.
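For readers less familiar with the OPAL output API, here is a minimal sketch of the argument order being described; the level value, stream variable, and message text are illustrative assumptions, not the actual values from this PR's diff:

```c
/* Sketch only: illustrates opal_output_verbose()'s argument order
 * (verbosity level first, output stream ID second).  The level 20,
 * the stream variable, and the message text are assumed for the example. */
#include "opal/util/output.h"

static void warn_dso_load_failure(int output_id, const char *name, const char *err)
{
    /* Shown only when the stream's verbosity has been raised to >= 20. */
    opal_output_verbose(20, output_id,
                        "component repository: unable to open %s: %s (ignored)",
                        name, err);

    /* Swapping the two integers (as this PR effectively does) changes which
     * stream the message goes to, not how verbose the stream must be for
     * the message to appear. */
}
```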
How does this PR interact with the existing mca_base_component_show_load_errors parameter?
I do not agree with this PR: the verbosity is intentionally set to show by default when components that exist on the system cannot be loaded due to an error at run time (e.g., a dependent library that cannot be found prevents a DSO component from being loaded). This was done intentionally because:
- We had users who didn't have dependent libraries setup properly (e.g., missing some network stack libraries on compute nodes) unexpectedly get lower performance because Open MPI fell back to the TCP BTL. When no warning was emitted, the user had no idea that their high-performance / HPC-class network wasn't being used because of a library error. Emitting the warning tells users when this happens.
- In general, if a DSO component exists, it means that it was successfully built and installed (e.g., support headers and libraries were found during configure and make). Meaning: someone probably intended that this DSO component should be used. Hence, it's reasonable to believe that it's unexpected if this DSO component can't be used at run time. Therefore, we should warn about it.
- The warning message says that the DSO is ignored, and therefore execution continues. Clearly: this is a warning, not an error. That being said, if you'd like to improve the message, such as making it more explicit that this is a non-fatal warning, no problem -- please alter this PR to do that (but don't change the verbosity level).
- As @hppritcha mentioned:
- This PR doesn't change the verbosity; it changes the stream ID. That seems like the wrong thing.
- There are multiple ways to disable this message at run time if, indeed, the user doesn't care that the component will fail to load. For example, you can set mca_base_component_show_load_errors to false, or you can exclude the component in question. Either of these actions is something that a human can do to explicitly tell Open MPI: "I don't care if DSOs fail to load; don't warn me."
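As a concrete illustration of that run-time knob, here is a hedged sketch. It relies on the usual OMPI_MCA_<param> environment-variable convention for setting MCA parameters (the same value can be passed to mpirun with --mca); the exact parameter name should be verified with ompi_info on the installed version.

```c
/* Sketch, assuming the standard OMPI_MCA_<param> environment-variable
 * convention for MCA parameters.  Setting the variable before MPI_Init()
 * (or passing "--mca mca_base_component_show_load_errors 0" to mpirun)
 * silences the "component failed to load" warnings for this run. */
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    /* "0" / "false" disables the DSO load-failure warnings. */
    setenv("OMPI_MCA_mca_base_component_show_load_errors", "0", 1);

    MPI_Init(&argc, &argv);
    /* ... application code ... */
    MPI_Finalize();
    return 0;
}
```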
ACK on the stream ID vs. verbosity level. @jsquyres: for #2, I thought one use case was for package managers to ship only one build that has all of their components built (i.e., cuda, rocm, libfabric, ucx, etc.), but the systems they run on would not necessarily have any of those libraries available. Are we going with the philosophy that if a DSO is built, all of its dependencies are intended to exist?
Yes. If a packager builds an Open MPI DSO, they will list all the dependencies of that DSO as requirements for that package. If packagers don't list dependencies like this, I would consider that a bug in the package.
I am pretty sure that this will not work in that case with the accelerator framework. If a packager compiles, e.g., Open MPI with cuda+rocm support, there will be no platform out there that fulfills both requirements; they will have one or the other. In that case this goes against the main idea and benefit of the accelerator framework.
@jsquyres the main motivation of this change is, as Edgar mentions, the accelerator framework. We currently search for and dlopen the dependent libraries at run time, which avoids this issue, but if we want to get rid of the dlopen code and have the components built as DSOs, this behavior would trigger these warnings.
@bwbarrett and I chatted about this yesterday. Is there a reason that the accelerator framework components are doing their own dlopen'ing of their dependent libraries, rather than linking them in at build time and letting the run-time linker handle the finding/opening of the dependent libraries (like all other components in Open MPI)?
I do not think that this is the case. Do Linux distros have packages for CUDA and/or ROCM? I don't know how CUDA/ROCM are licensed, but I suspect that Linux distros may not have packages for them because of licensing issues. In that case, the Linux distro Open MPI package won't be built for CUDA or ROCM either, so it won't be an issue in this case.
But let's assume that Packager ABC builds an Open MPI package with both CUDA and ROCM support. That means that when the ABC Open MPI package was built, both CUDA and ROCM were installed and available (so that Open MPI could find CUDA/ROCM header files and libraries). The ABC Open MPI package will therefore list both CUDA and ROCM as dependent packages. Hence, when a user installs the ABC Open MPI package, it will also install the CUDA and ROCM packages, and therefore everything will work fine. Meaning:
- The ROCM component should successfully load. If the user has no ROCM-capable hardware, the ROCM component should disqualify itself at run time (with no error).
- The CUDA component should successfully load. If the user has no CUDA-capable hardware, the CUDA component should disqualify itself at run time (with no error).
This is how all the other frameworks operate in Open MPI. Is there a reason to make the accelerator framework and/or components operate differently?
I neglected to mention the self-built/installed case. When a user builds their own Open MPI (vs. using a pre-built package), they typically build the components that are needed for their system. E.g., if they have CUDA-capable hardware, they should have the CUDA headers + libraries installed, which will allow Open MPI to build the CUDA accelerator component. If they have no ROCM-capable hardware, they won't have ROCM headers + libraries, and therefore the ROCM accelerator component won't be built. Hence, there's an element of component self-selection when building Open MPI for a specific system/cluster/environment. All this being said, perhaps I'm totally missing your point. If you want to get on the phone and discuss, we can certainly do that.
Had some offline discussion with Jeff; we might pursue some other ways to get the behavior I'm looking for. Closing this PR.
We can have a discussion sometime next week, but my understanding is that this approach was chosen for accelerator support because it is/was very common to have clusters with both nodes that have GPUs (and hence the GPU software stack) and nodes without GPUs and the related software. If a cluster administrator wanted to deploy a single MPI library installation that can serve both types of nodes and partitions, this manual dynamic loading of the dependent library offered a path to achieve this goal without forcing the non-GPU nodes to install the GPU software stack.
I could be wrong, but at least when using DSOs the same mechanism was used quite heavily elsewhere as well. We had nodes on our cluster without support for a particular file system (e.g., PVFS2), but the same MPI library could be used on nodes with and without PVFS2, without forcing the installation of the PVFS2 packages on all nodes.
I don't really want to get into the discussion of packaging rules etc. (I am not an expert on that), but in my opinion the optimal solution for a packager who wants to compile Open MPI with support for accelerators A/B/C simultaneously is not to force installing the software for all three accelerators, but only for the accelerator that is present on the machine or requested by the user/admin.
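For concreteness, here is a rough sketch of the manual-loading pattern described above, where a component probes for its dependent library at run time and quietly disqualifies itself if the library is absent. The library name and function names are illustrative assumptions, not Open MPI's actual accelerator component code.

```c
/* Illustrative sketch of a component that dlopen()s its dependent library
 * itself instead of being linked against it at build time.  If the library
 * is missing on this node, the component simply opts out -- no warning needed. */
#include <dlfcn.h>
#include <stddef.h>

static void *accel_lib_handle = NULL;

/* Returns 0 if the component is usable on this node, -1 to disqualify it. */
static int example_accel_component_open(void)
{
    accel_lib_handle = dlopen("libexample-accel.so", RTLD_NOW | RTLD_GLOBAL);
    if (NULL == accel_lib_handle) {
        return -1;   /* GPU stack not installed here: expected, not an error */
    }
    return 0;
}

static void example_accel_component_close(void)
{
    if (NULL != accel_lib_handle) {
        dlclose(accel_lib_handle);
        accel_lib_handle = NULL;
    }
}
```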
I guess my point is that if Open MPI is compiled with support for accelerators A/B/C, the software packages for A/B/C should not be hard dependencies, since Open MPI would work even without them (that's why there is the null component); they are optional. The only issue is that, in that scenario, not being able to load a component is the 'norm', not an error or an exception.