-
Notifications
You must be signed in to change notification settings - Fork 936
Openib updates #332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Openib updates #332
Conversation
|
I should also mention that this also fixes a long outstanding bug that occurs when the openib btl attempts to coalesce fragments but runs out of descriptors. Looking for the bug id now. |
|
Refer to this link for build results (access rights to CI server needed): Build Log Test FAILed. |
|
Hmm, looks like I missed something pulling this out of my btl branch. Please ignore until I get this fixed. |
There was a bug in the openib btl handling this valid sequence of calls: desc = btl_alloc (); btl_free (desc); When triggered the bug would cause either fragment loss or undefined behavior (SEGV, etc). The problem occured because btl_alloc contained the logic to modify the pending fragment (length, etc) and these changes were not corrected if the fragment was freed instead of sent. To fix this issue I 1) moved some of the coalescing logic to the btl_send function, and 2) retry the coalesced fragment on btl_free if it was never sent. This appears to completely address the issue.
…n place (was duplicated in the send and sendi paths)
|
Should be good now. |
|
Refer to this link for build results (access rights to CI server needed): |
|
@miked-mellanox Please let me know what you think of the changes. |
|
@jladd-mlnx Does Mellanox have an opinion on this pull request? If not I will go ahead and merge it. |
|
@hjelmn I was only waiting to see what the issue was that @abouteiller was reporting regarding message coalescing. Seems like your patch does indeed resolve his issue. I've reviewed you code and it looks fine. With regard to re-enabling message coalescing in 1.8.x; I think that we are of the opinion that we really don't care one way or the other. Maybe @hppritcha or @miked-mellanox have a burning desire to see it re-enabled, but @bureddy and myself do not. |
|
I vote against re-enabling message coalescing in 1.8.X. |
|
Regardless of the "vote", I'm not going to re-enable it in the 1.8 series as it is of little (if any) value, and caused trouble. |
|
Nathan, remember that I set the coalescing MCA param to default to “false” yesterday. If you feel coalescing should be on by default, do not forget to revert that change. Have fun, AurelienThe University of Tennessee, Innovative Computing Laboratory
|
Cleanup the amca code paths
From open-mpi#332: Here follows a bug report by **Guido Vranken** via the _Tor bug bounty program_. Please credit Guido accordingly. ## Bug report The DNS code of Libevent contains this rather obvious OOB read: ```c static char * search_make_new(const struct search_state *const state, int n, const char *const base_name) { const size_t base_len = strlen(base_name); const char need_to_append_dot = base_name[base_len - 1] == '.' ? 0 : 1; ``` If the length of ```base_name``` is 0, then line 3125 reads 1 byte before the buffer. This will trigger a crash on ASAN-protected builds. To reproduce: Build libevent with ASAN: ``` $ CFLAGS='-fomit-frame-pointer -fsanitize=address' ./configure && make -j4 ``` Put the attached ```resolv.conf``` and ```poc.c``` in the source directory and then do: ``` $ gcc -fsanitize=address -fomit-frame-pointer poc.c .libs/libevent.a $ ./a.out ================================================================= ==22201== ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60060000efdf at pc 0x4429da bp 0x7ffe1ed47300 sp 0x7ffe1ed472f8 READ of size 1 at 0x60060000efdf thread T0 ``` P.S. we can add a check earlier, but since this is very uncommon, I didn't add it. Fixes: open-mpi#332
From open-mpi#332: Here follows a bug report by **Guido Vranken** via the _Tor bug bounty program_. Please credit Guido accordingly. ## Bug report The DNS code of Libevent contains this rather obvious OOB read: ```c static char * search_make_new(const struct search_state *const state, int n, const char *const base_name) { const size_t base_len = strlen(base_name); const char need_to_append_dot = base_name[base_len - 1] == '.' ? 0 : 1; ``` If the length of ```base_name``` is 0, then line 3125 reads 1 byte before the buffer. This will trigger a crash on ASAN-protected builds. To reproduce: Build libevent with ASAN: ``` $ CFLAGS='-fomit-frame-pointer -fsanitize=address' ./configure && make -j4 ``` Put the attached ```resolv.conf``` and ```poc.c``` in the source directory and then do: ``` $ gcc -fsanitize=address -fomit-frame-pointer poc.c .libs/libevent.a $ ./a.out ================================================================= ==22201== ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60060000efdf at pc 0x4429da bp 0x7ffe1ed47300 sp 0x7ffe1ed472f8 READ of size 1 at 0x60060000efdf thread T0 ``` P.S. we can add a check earlier, but since this is very uncommon, I didn't add it. Fixes: open-mpi#332 Signed-off-by: Brendan Cunningham <[email protected]>
Fix a couple of outstanding bugs and cleanup send/sendi code.
@miked-mellanox Please take a look.