Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core/util: add fallback logic for MR caching selection #12

Closed
wants to merge 5 commits into from

Conversation

miharulidze
Copy link

This commit does the following changes:
- Introduces new option 'disabled' for FI_MR_CACHE_MONITOR to
disable memory registration caching for testing/debug/stability
purposes.
- Adds the fallback cache monitor selection logic. The available
monitor will be chosen with the following priority:
1) userfaultfd monitor will be selected if machine supports
userfaultfd functionality.
2) memhooks will be selected if machine doesn't support userfault
functionality and has elf.h, sys/auxv.h headers
3) MR caching will be disabled --- ofi_monitor_add_cache() will
return -FI_ENOSYS code.

Signed-off-by: Mikhail Khalilov [email protected]

shefty and others added 5 commits July 26, 2019 07:26
Integrate hooking memory allocation related calls into the MR
cache.  This allows us to support older linux kernel that do not
support userfaultfd.

The core of the code is adapted from OpenMPI.

Signed-off-by: Nikita Gusev <[email protected]>
Signed-off-by: Sean Hefty <[email protected]>
This commit fixes:
   - removes unnecessary fastlock acquiring/release inside of
     ofi_intercept_symbol()/ofi_restore_intercepts() functions.
     We cannot acquire monitor lock inside of this functions
     since lock is already acquired in
     ofi_monitor_add_cache()/ofi_monitor_del_cache() routines;
   - adds missing dlist_init() for dl_intercept_list;
   - fixes dlist_insert() segfault in ofi_intercept_symbol();
   - adds missing logic for FI_MR_CACHE_MONITOR supporting "userfaultfd"
     and "memhooks monitor" with fallback to default monitor.
   - adds some missing cleanup logic

Signed-off-by: Mikhail Khalilov <[email protected]>
core/util: Additional fixes for mr cache hooking mechanism
This commit does the following changes:
    - Introduces new option 'disabled' for FI_MR_CACHE_MONITOR to
      disable memory registration caching for testing/debug/stability
      purposes.
    - Adds the fallback cache monitor selection logic. The available
      monitor will be chosen with the following priority:
      1) userfaultfd monitor will be selected if machine supports
         userfaultfd functionality.
      2) memhooks will be selected if machine doesn't support userfault
         functionality and has elf.h, sys/auxv.h headers
      3) MR caching will be disabled --- ofi_monitor_add_cache() will
         return -FI_ENOSYS code.

Signed-off-by: Mikhail Khalilov <[email protected]>
@@ -39,6 +39,7 @@ struct ofi_mem_monitor *uffd_monitor = &uffd.monitor;

struct ofi_mem_monitor *default_monitor;

struct ofi_mem_monitor *dummy_monitor = NULL; /* A stub to disable MR caching */
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's drop this and just use NULL to mean disabled.

@shefty shefty force-pushed the merging branch 4 times, most recently from c4634c0 to 83762a7 Compare September 13, 2019 23:19
@shefty shefty closed this Oct 16, 2019
shefty pushed a commit that referenced this pull request Dec 18, 2020
ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fff4c61e7e0 at pc 0x14f2cb7ae0b9 bp 0x7fff4c61e650 sp 0x7fff4c61ddd8
WRITE of size 17 at 0x7fff4c61e7e0 thread T0
    #0 0x14f2cb7ae0b8  (/lib64/libasan.so.5+0xb40b8)
    #1 0x14f2cb7aedd2 in vsscanf (/lib64/libasan.so.5+0xb4dd2)
    #2 0x14f2cb7aeede in __interceptor_sscanf (/lib64/libasan.so.5+0xb4ede)
    #3 0x14f2cb230766 in ofi_addr_format src/common.c:401
    #4 0x14f2cb233238 in ofi_str_toaddr src/common.c:780
    #5 0x14f2cb314332 in vrb_handle_ib_ud_addr prov/verbs/src/verbs_info.c:1670
    #6 0x14f2cb314332 in vrb_get_match_infos prov/verbs/src/verbs_info.c:1787
    #7 0x14f2cb314332 in vrb_getinfo prov/verbs/src/verbs_info.c:1841
    #8 0x14f2cb21fc28 in fi_getinfo_ src/fabric.c:1010
    #9 0x14f2cb25fcc0 in ofi_get_core_info prov/util/src/util_attr.c:298
    #10 0x14f2cb269b20 in ofix_getinfo prov/util/src/util_attr.c:321
    #11 0x14f2cb3e29fd in rxd_getinfo prov/rxd/src/rxd_init.c:122
    #12 0x14f2cb21fc28 in fi_getinfo_ src/fabric.c:1010
    #13 0x407150 in ft_getinfo common/shared.c:794
    #14 0x414917 in ft_init_fabric common/shared.c:1042
    #15 0x402f40 in run functional/bw.c:155
    #16 0x402f40 in main functional/bw.c:252
    #17 0x14f2ca1b28e2 in __libc_start_main (/lib64/libc.so.6+0x238e2)
    #18 0x401d1d in _start (/root/libfabric/fabtests/functional/fi_bw+0x401d1d)

Address 0x7fff4c61e7e0 is located in stack of thread T0 at offset 48 in frame
    #0 0x14f2cb2306f3 in ofi_addr_format src/common.c:397

  This frame has 1 object(s):
    [32, 48) 'fmt' <== Memory access at offset 48 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow (/lib64/libasan.so.5+0xb40b8)
Shadow bytes around the buggy address:
  0x1000698bbca0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000698bbcb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000698bbcc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000698bbcd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000698bbce0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x1000698bbcf0: 00 00 00 00 00 00 f1 f1 f1 f1 00 00[f2]f2 f3 f3
  0x1000698bbd00: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
  0x1000698bbd10: f1 f1 00 f2 f2 f2 f2 f2 f2 f2 00 f2 f2 f2 f2 f2
  0x1000698bbd20: f2 f2 00 f2 f2 f2 f2 f2 f2 f2 00 f2 f2 f2 f2 f2
  0x1000698bbd30: f2 f2 00 00 00 00 00 06 f2 f2 f2 f2 f2 f2 00 00
  0x1000698bbd40: 00 00 00 06 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb

Fixes: 5d31276 ("common: Redo address string conversions")
Signed-off-by: Honggang Li <[email protected]>
shefty pushed a commit that referenced this pull request Dec 18, 2020
ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fff4c61e7e0 at pc 0x14f2cb7ae0b9 bp 0x7fff4c61e650 sp 0x7fff4c61ddd8
WRITE of size 17 at 0x7fff4c61e7e0 thread T0
    #0 0x14f2cb7ae0b8  (/lib64/libasan.so.5+0xb40b8)
    #1 0x14f2cb7aedd2 in vsscanf (/lib64/libasan.so.5+0xb4dd2)
    #2 0x14f2cb7aeede in __interceptor_sscanf (/lib64/libasan.so.5+0xb4ede)
    #3 0x14f2cb230766 in ofi_addr_format src/common.c:401
    #4 0x14f2cb233238 in ofi_str_toaddr src/common.c:780
    #5 0x14f2cb314332 in vrb_handle_ib_ud_addr prov/verbs/src/verbs_info.c:1670
    #6 0x14f2cb314332 in vrb_get_match_infos prov/verbs/src/verbs_info.c:1787
    #7 0x14f2cb314332 in vrb_getinfo prov/verbs/src/verbs_info.c:1841
    #8 0x14f2cb21fc28 in fi_getinfo_ src/fabric.c:1010
    #9 0x14f2cb25fcc0 in ofi_get_core_info prov/util/src/util_attr.c:298
    #10 0x14f2cb269b20 in ofix_getinfo prov/util/src/util_attr.c:321
    #11 0x14f2cb3e29fd in rxd_getinfo prov/rxd/src/rxd_init.c:122
    #12 0x14f2cb21fc28 in fi_getinfo_ src/fabric.c:1010
    #13 0x407150 in ft_getinfo common/shared.c:794
    #14 0x414917 in ft_init_fabric common/shared.c:1042
    #15 0x402f40 in run functional/bw.c:155
    #16 0x402f40 in main functional/bw.c:252
    #17 0x14f2ca1b28e2 in __libc_start_main (/lib64/libc.so.6+0x238e2)
    #18 0x401d1d in _start (/root/libfabric/fabtests/functional/fi_bw+0x401d1d)

Address 0x7fff4c61e7e0 is located in stack of thread T0 at offset 48 in frame
    #0 0x14f2cb2306f3 in ofi_addr_format src/common.c:397

  This frame has 1 object(s):
    [32, 48) 'fmt' <== Memory access at offset 48 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow (/lib64/libasan.so.5+0xb40b8)
Shadow bytes around the buggy address:
  0x1000698bbca0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000698bbcb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000698bbcc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000698bbcd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000698bbce0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x1000698bbcf0: 00 00 00 00 00 00 f1 f1 f1 f1 00 00[f2]f2 f3 f3
  0x1000698bbd00: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
  0x1000698bbd10: f1 f1 00 f2 f2 f2 f2 f2 f2 f2 00 f2 f2 f2 f2 f2
  0x1000698bbd20: f2 f2 00 f2 f2 f2 f2 f2 f2 f2 00 f2 f2 f2 f2 f2
  0x1000698bbd30: f2 f2 00 00 00 00 00 06 f2 f2 f2 f2 f2 f2 00 00
  0x1000698bbd40: 00 00 00 06 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb

Fixes: 5d31276 ("common: Redo address string conversions")
Signed-off-by: Honggang Li <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants