@bosilca
Member

@bosilca bosilca commented Jan 21, 2024

If nodes have the same IP addresses (for containers or other purposes) and these addresses get published as part of the modex, a remote peer might try to use one of the addresses to connect. As both nodes have the same IP, there are several cases:

  • The "remote" port is not used by an OMPI process locally: the connection is refused or times out. This is the "nicest" outcome, as another IP will then be tried, resulting in a successful connection and the application continuing.
  • The "remote" port is used by another OMPI process on the local node. A connection is established, but the wrong guid is exchanged, leading to complaints, dropped connections and/or deadlocks.
  • The "remote" port is used by this very process, resulting in a connection-to-self. Bad things happen, as we don't support TCP connections to self. Some output messages are generated, but the most likely outcome is a deadlock.

Until now, users were expected to exclude such interfaces from the list of accepted interfaces; this patch removes that need. If we discover a local IP in the address list of a remote peer, we drop it and never try to use it. This does not apply to local processes, so these interfaces remain usable for node-level communication (which works because we connect to the port that belongs to the destination process).
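
The filtering rule described above can be sketched roughly as follows. This is an illustrative Python sketch only, not the C code in the patch; the function name `usable_remote_addresses` and its parameters are hypothetical and chosen for this example.

```python
from ipaddress import ip_address


def usable_remote_addresses(local_addrs, peer_addrs, peer_is_local):
    """Return the subset of a peer's published addresses worth connecting to.

    All names here are hypothetical; this only illustrates the rule from the
    description, not the data structures used by the actual patch.
    """
    if peer_is_local:
        # Peers on the same node keep every address: the destination port
        # identifies the right process, so shared IPs are harmless here.
        return list(peer_addrs)

    # Normalize through ip_address so equivalent spellings compare equal.
    local = {ip_address(a) for a in local_addrs}

    # For a remote peer, drop any address that is also configured locally:
    # connecting to it would reach this node instead of the peer.
    return [a for a in peer_addrs if ip_address(a) not in local]
```

For example, with local addresses `["10.0.0.1", "172.17.0.1"]` and a remote peer that published `["10.0.0.2", "172.17.0.1"]`, only `10.0.0.2` would be kept; for a peer on the same node, both addresses would remain candidates.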

@bosilca
Member Author

bosilca commented Jan 21, 2024

This patch improves #12232 but does not solve it.

@bosilca
Member Author

bosilca commented Mar 5, 2024

Can I get a review on this, please?

Contributor

@devreal devreal left a comment

oops, had a pending review that I never submitted...

@bosilca bosilca force-pushed the fix/tcp_connection_s branch from 934289b to c50329f Compare March 6, 2024 04:02
devreal previously approved these changes Mar 6, 2024
@hppritcha
Member

@jsquyres @ggouaillardet please review when you have a chance

@bosilca bosilca force-pushed the fix/tcp_connection_s branch 2 times, most recently from 289b46f to 76525e9 Compare January 9, 2026 20:45
Signed-off-by: George Bosilca <[email protected]>
@bosilca bosilca force-pushed the fix/tcp_connection_s branch from 76525e9 to f2f671c Compare January 9, 2026 20:47