Topic/topo numa distance part2 #8853
pls don't review until first part is merged.
Please look for
print_row_separator(distance_width, name_width, num_devices, ' ', '|');
print_row_separator(distance_width, name_width, num_devices, '-', '+');
can you give example of the output?
can you pls fix units/dev name?
see #8853 (comment)
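To illustrate the kind of output asked about in this thread, here is a minimal sketch of a row-separator helper. It is not the PR's code: the name `format_row_separator` and the buffer-based interface are assumptions made so the result can be inspected, and only the argument order (distance width, name width, number of devices, fill character, column separator) is taken from the calls in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the PR): writes one table separator row into
 * buf. The first column is name_width characters wide and is followed by
 * num_devices columns of distance_width characters; every column is filled
 * with `fill` and terminated by `sep`.
 * Returns the number of characters written, or -1 if buf is too small. */
static int format_row_separator(char *buf, size_t size, int distance_width,
                                int name_width, unsigned num_devices,
                                char fill, char sep)
{
    size_t needed, pos = 0;
    unsigned i;

    assert(buf != NULL);
    if ((distance_width < 0) || (name_width < 0)) {
        return -1;
    }

    needed = (size_t)name_width + 1 +
             (size_t)num_devices * ((size_t)distance_width + 1);
    if (size < needed + 1) {
        return -1; /* no room for the row plus the terminating NUL */
    }

    /* Device-name column */
    memset(buf + pos, fill, (size_t)name_width);
    pos += (size_t)name_width;
    buf[pos++] = sep;

    /* One column per device */
    for (i = 0; i < num_devices; ++i) {
        memset(buf + pos, fill, (size_t)distance_width);
        pos += (size_t)distance_width;
        buf[pos++] = sep;
    }

    buf[pos] = '\0';
    return (int)pos;
}
```

With a name width of 6, a distance width of 4 and two devices, the `'-'`/`'+'` call produces `------+----+----+`, and the `' '`/`'|'` call produces the same layout with spaces and `|` separators.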
src/tools/info/sys_info.c
Outdated
unsigned num_devices = ucs_topo_num_devices();
static const int distance_width = 10;
const char *distance_unit = "MB/s";
unsigned num_devices = ucs_topo_num_devices();
const?
src/ucp/core/ucp_worker.c
Outdated
ucs_sys_device_t sys_dev = ucp_worker_get_sys_device(wiface);
ucs_status_t status;

status = ucs_topo_get_memory_distance(sys_dev, distance);
maybe ucs_topo_get_memory_distance should return void?
We can take it even further: maybe we should add a constraint to the topo providers API definition requiring fallback behavior?
This way both get_distance and get_memory_distance will return void...
But maybe in another PR?
I can do it for memory_distance for now and add it to the API description.
let's do it for memory distance for now
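The idea agreed on in this thread can be sketched as follows: the provider callback may still fail internally, but the public getter returns void and fills in a fallback value instead of propagating an error. Everything here is a stand-in for illustration; the type names, the callback signature and the choice of fallback (zero latency, `DBL_MAX` bandwidth) are assumptions, not the UCX definitions.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Stand-in types for illustration only; not the UCX definitions. */
typedef struct {
    double latency;   /* seconds */
    double bandwidth; /* bytes per second */
} distance_t;

typedef int status_t; /* 0 means success, anything else is an error */

/* Hypothetical provider callback that is allowed to fail. */
typedef status_t (*memory_distance_func_t)(unsigned sys_dev,
                                           distance_t *distance);

/* Assumed fallback: no added latency, effectively unlimited bandwidth. */
static const distance_t default_distance = {0.0, DBL_MAX};

/* Void getter: the caller never has to check a status, because any provider
 * failure (or a missing provider) is replaced by the fallback distance. */
static void get_memory_distance(memory_distance_func_t provider,
                                unsigned sys_dev, distance_t *distance)
{
    assert(distance != NULL);

    if ((provider == NULL) || (provider(sys_dev, distance) != 0)) {
        *distance = default_distance;
    }
}

/* Example providers used to exercise both paths. */
static status_t failing_provider(unsigned sys_dev, distance_t *distance)
{
    (void)sys_dev;
    distance->latency = -1.0; /* partial write that the getter must discard */
    return -1;
}

static status_t fixed_provider(unsigned sys_dev, distance_t *distance)
{
    (void)sys_dev;
    distance->latency   = 1e-7;
    distance->bandwidth = 1e9;
    return 0;
}
```

The trade-off is that callers lose the ability to distinguish a measured distance from the fallback, which is why the thread suggests stating the fallback requirement in the API description.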
src/ucp/core/ucp_worker.c
Outdated
ucp_worker_iface_add_distance(&wiface->attr, &distance);
}
ucp_worker_get_sys_device_memory_distance(wiface);
ucp_worker_iface_add_distance(&wiface->attr, &wiface->memory_distance);
why need to save memory_distance on the wiface?
seems it's used only during initialization
Will reduce the calls to ucs_topo_get_memory_distance in ucp_worker_iface_estimate_perf
but ucp_worker_iface_estimate_perf still calls UCT estimate perf
and ucs_topo_get_memory_distance should be quite fast since the NUMA distances are saved in a hash in ucs/topo
This is true. Will revert.
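The reasoning behind the revert can be sketched like this: if the topology layer already memoizes the distance per system device, a repeated lookup is cheap and a second copy on the wiface adds little. The sketch below is an assumption-laden illustration, not the ucs/topo code: it uses a fixed-size array in place of the hash mentioned above, and `compute_memory_distance`, `lookup_memory_distance` and `slow_path_calls` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SYS_DEVICES 8 /* arbitrary bound for this sketch */

/* Stand-in type for illustration only; not the UCX definition. */
typedef struct {
    double latency;   /* seconds */
    double bandwidth; /* bytes per second */
} distance_t;

static distance_t cached_distance[MAX_SYS_DEVICES];
static int        cached_valid[MAX_SYS_DEVICES];
static unsigned   slow_path_calls; /* counts expensive computations */

/* Hypothetical expensive step, e.g. reading NUMA distances from the system. */
static distance_t compute_memory_distance(unsigned sys_dev)
{
    distance_t d;

    ++slow_path_calls;
    d.latency   = 1e-8 * (double)(sys_dev + 1);
    d.bandwidth = 1e9;
    return d;
}

/* Lookup that computes each device's distance at most once; later calls only
 * copy the stored value, so callers do not need their own cached copy. */
static void lookup_memory_distance(unsigned sys_dev, distance_t *distance)
{
    assert(sys_dev < MAX_SYS_DEVICES);
    assert(distance != NULL);

    if (!cached_valid[sys_dev]) {
        cached_distance[sys_dev] = compute_memory_distance(sys_dev);
        cached_valid[sys_dev]    = 1;
    }

    *distance = cached_distance[sys_dev];
}
```

Calling `lookup_memory_distance` twice for the same device increments `slow_path_calls` only once.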
@gleon99 can you pls take another look?
@ofirfarjun7 please squash.
What
Why ?
libnuma dependency
How ?
libnuma dependent code. libnuma from UCX