-
Notifications
You must be signed in to change notification settings - Fork 439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCP/WIREUP/IB: Fix error message when FLID is not available #10515
base: master
Are you sure you want to change the base?
UCP/WIREUP/IB: Fix error message when FLID is not available #10515
Conversation
src/uct/ib/base/ib_iface.c
Outdated
return 0; | ||
if (!iface->config.flid_enabled) { | ||
flid_info_str = "disabled"; | ||
} else if ((uct_ib_iface_gid_extract_flid(&iface->gid_info.gid)) == 0 || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} else if ((uct_ib_iface_gid_extract_flid(&iface->gid_info.gid)) == 0 || | |
} else if ((uct_ib_iface_gid_extract_flid(&iface->gid_info.gid) == 0) || |
5873820
to
4b5f309
Compare
src/ucp/wireup/select.c
Outdated
@@ -592,7 +592,7 @@ static UCS_F_NOINLINE ucs_status_t ucp_wireup_select_transport( | |||
} | |||
|
|||
is_reachable = 0; | |||
|
|||
ucs_snprintf_safe(info_str, info_str_size, "unreachable"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe use snprintf here for consistency then? In this function only snprintf
is used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@yosefe do you think we still need to be able to print an info string from
|
src/ucp/wireup/select.c
Outdated
@@ -592,7 +592,7 @@ static UCS_F_NOINLINE ucs_status_t ucp_wireup_select_transport( | |||
} | |||
|
|||
is_reachable = 0; | |||
|
|||
ucs_snprintf_safe(info_str, info_str_size, "unreachable"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIR we currently enable info_str
to be NULL and its value is checked inside uct_iface_fill_info_str_buf
.
I actually don't remember why we didn't pass an info str buffer to all ucp_wireup_add_X_lanes
functions, but maybe now it can be removed (as I mentioned in the general comment) and info_str
can be defined inside ucp_wireup_select_transport
.. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed info_str from parameters and moved it inside select_transport
Also fix RC info message for a corrupted address
Removed it from select.c:2513
We still need to reset uct_info string to "not available" in UCP before going over remote addresses, since ucp_wireup_is_reachable() can return 0 also if tl name is different - without calling uct_iface_is_reachable_v2. Added a gtest that UCT is_reachable always fills the info string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test errors seem to be relevant
Before:
[1740477778.122527] [swx-ucx-ibr-03:1950035:0] select.c:641 UCX ERROR no active messages transport to <no debug data>: self/memory - Destination is unreachable, rc_verbs/mlx5_0:1 - Destination is unreachable
After:
[1740487396.304113] [swx-ucx-ibr-03:1951997:0] select.c:642 UCX ERROR no active messages transport to swx-ucx-ibr-01:3618365: self/memory - unreachable, rc_verbs/mlx5_0:1 - different subnet prefix 0xfec0000000000002/0xfec0000000000001 and FLID is not available