Illegal instruction: illegal operand when compiling with -fsanitize=address (openmpi 5.0.5) #12819

Closed
muralidhar-nalabothula opened this issue Sep 23, 2024 · 11 comments


@muralidhar-nalabothula

muralidhar-nalabothula commented Sep 23, 2024

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

v5.0.5

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Operating system distribution package (pacman -S openmpi)

Please describe the system on which you are running

  • Operating system/version: EndeavourOS VERSION=2024.06.25
  • Computer hardware: x86_64 (Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz)

Details of the problem

When using AddressSanitizer, I get an illegal instruction error (the same with both gcc and clang). Note that I do not see this error with version 4.1.6. This issue was mentioned previously: #12584 (comment)

#include <mpi.h>
int main(int argc, char* argv[])
{
    MPI_Init(NULL,NULL);
    MPI_Finalize();
    return 0; 
}

mpicc -g -fsanitize=address sample_prog.c
./a.out
# output:
Caught signal 4 (Illegal instruction: illegal operand)
AddressSanitizer:DEADLYSIGNAL
=================================================================
==2532896==ERROR: AddressSanitizer: SEGV on unknown address 0x7368917f1000 (pc 0x7368917f1000 bp 0x736892049050 sp 0x736892049020 T0)
==2532896==The signal is caused by a READ memory access.
==2532896==Hint: PC is at a non-executable region. Maybe a wild jump?
    #0 0x7368917f1000  (<unknown module>)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (<unknown module>) 
==2532896==ABORTING

gdb output:

[New Thread 0x7ffff36006c0 (LWP 2533116)]

Thread 1 "a.out" received signal SIGILL, Illegal instruction.
0x00007ffff78526b7 in mprotect () from /usr/lib/libasan.so.8
@rhc54
Contributor

rhc54 commented Sep 23, 2024

I don't see anything usable in that error output - an "illegal instruction somewhere in the code" is impossible to track down. Any chance you can localize that a bit?

@bosilca
Member

bosilca commented Sep 23, 2024

The culprit seems to be mprotect () from /usr/lib/libasan.so.8, and that's not something this community can fix.

@muralidhar-nalabothula
Author

muralidhar-nalabothula commented Sep 25, 2024

@rhc54 I manually compiled Open MPI 5.0.5 with the following options:

./configure --prefix=/home/murali/softwares/core CFLAGS=-fsanitize=address LDFLAGS=-fsanitize=address --enable-debug 

Here is the backtrace:

AddressSanitizer:DEADLYSIGNAL
=================================================================
==390121==ERROR: AddressSanitizer: SEGV on unknown address 0x03e80005f3e9 (pc 0x71dfb1416606 bp 0x7ffe50181fc0 sp 0x7ffe50181fb0 T0)
==390121==The signal is caused by a READ memory access.
==== backtrace (tid: 390120) ====
 0 0x000000000004d212 ucs_event_set_fd_get()  ???:0
 1 0x000000000004d3dd ucs_event_set_fd_get()  ???:0
 2 0x000000000003d1d0 __sigaction()  ???:0
 3 0x0000000000216606 opal_net_get_hostname()  /home/murali/softwares/core/builds/openmpi-5.0.5/opal/util/net.c:409
 4 0x00000000000013af get_weights()  reachable_netlink_module.c:0
 5 0x0000000000001575 netlink_reachable()  reachable_netlink_module.c:0
 6 0x000000000029d103 mca_btl_tcp_proc_create_interface_graph()  /home/murali/softwares/core/builds/openmpi-5.0.5/opal/mca/btl/tcp/btl_tcp_proc.c:213
 7 0x000000000029df2e mca_btl_tcp_proc_handle_modex_addresses()  /home/murali/softwares/core/builds/openmpi-5.0.5/opal/mca/btl/tcp/btl_tcp_proc.c:337
 8 0x000000000029f1c7 mca_btl_tcp_proc_create()  /home/murali/softwares/core/builds/openmpi-5.0.5/opal/mca/btl/tcp/btl_tcp_proc.c:430
 9 0x000000000027d6dd mca_btl_tcp_add_procs()  /home/murali/softwares/core/builds/openmpi-5.0.5/opal/mca/btl/tcp/btl_tcp.c:100
10 0x00000000004406d0 mca_bml_r2_add_procs()  /home/murali/softwares/core/builds/openmpi-5.0.5/ompi/mca/bml/r2/bml_r2.c:526
11 0x0000000000a138a1 mca_pml_ob1_add_procs()  /home/murali/softwares/core/builds/openmpi-5.0.5/ompi/mca/pml/ob1/pml_ob1.c:403
12 0x00000000002aacfe ompi_mpi_instance_init_common()  /home/murali/softwares/core/builds/openmpi-5.0.5/ompi/instance/instance.c:701
13 0x00000000002ab9a1 ompi_mpi_instance_init()  /home/murali/softwares/core/builds/openmpi-5.0.5/ompi/instance/instance.c:824
14 0x000000000028b24e ompi_mpi_init()  /home/murali/softwares/core/builds/openmpi-5.0.5/ompi/runtime/ompi_mpi_init.c:359
15 0x000000000036a2c9 PMPI_Init()  /home/murali/softwares/core/builds/openmpi-5.0.5/ompi/mpi/c/init.c:69
16 0x0000000000001187 main()  ???:0
17 0x0000000000025e08 __libc_init_first()  ???:0
18 0x0000000000025ecc __libc_start_main()  ???:0
19 0x0000000000001095 _start()  ???:0

@rhc54
Contributor

rhc54 commented Sep 25, 2024

I take it you run fine when not including the address sanitizer? Last time I tried playing with that, it took me down a rabbit hole.

@muralidhar-nalabothula
Author

Yes, it works perfectly without the sanitizer. The irony is that 4.1.6 works perfectly fine with the sanitizer.

@ggouaillardet
Contributor

Which compiler are you using? Have you tried configuring with --disable-dlopen?

@ggouaillardet
Contributor

On my system, a workaround is to run:

env OMPI_MCA_memory=^patcher ./a.out
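
For reference, here is a minimal sketch of applying the same setting per command without exporting it into the whole shell session. `run_without_patcher` is a hypothetical helper name used only for illustration, not something Open MPI ships. The commented alternatives rely on Open MPI's general MCA parameter conventions (`OMPI_MCA_<name>` environment variables, `--mca <name> <value>` on `mpirun`, and `$HOME/.openmpi/mca-params.conf`) and have not been verified against this exact setup.

```shell
# Hypothetical helper (not part of Open MPI): run the given command with the
# memory/patcher component excluded. The variable is set only for that one
# command, so the caller's environment is left untouched.
run_without_patcher() {
    OMPI_MCA_memory='^patcher' "$@"
}

# Usage, assuming ./a.out is the ASan-instrumented binary from this issue:
#   run_without_patcher ./a.out
#
# Alternatives that should be equivalent under Open MPI's MCA conventions:
#   mpirun --mca memory '^patcher' -n 1 ./a.out
#   echo 'memory = ^patcher' >> "$HOME/.openmpi/mca-params.conf"
```

The per-command form keeps the workaround limited to sanitizer runs, so regular runs still use the default memory hooks.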

@muralidhar-nalabothula
Author

muralidhar-nalabothula commented Sep 27, 2024

@ggouaillardet

Thanks a lot. I was using both clang and gcc previously, but env OMPI_MCA_memory=^patcher ./a.out works perfectly. Not sure what it does.

@rhc54
Contributor

rhc54 commented Sep 27, 2024

I believe the issue stems from the sanitizer acting as a memory interceptor (so it can perform its address checks) while OMPI has its own memory interceptor (the "patcher") that it uses for optimizing performance - and the two conflict. So basically the MCA parameter is saying "turn off the OMPI interceptor" and ceding control of memory to the sanitizer, thus resolving the conflict.

Of course, that begs the question of whether you are checking OMPI's actual memory usage - i.e., is this test doing anything truly useful? 🤷‍♂️

@ggouaillardet
Contributor

Based on the traces, I suspect the sanitizer does not properly support mprotect(), which is used by the memory/patcher component. OMPI_MCA_memory=^patcher disables that component.
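
To check whether a given installation includes the memory/patcher component at all, the component list printed by `ompi_info` can be filtered. This is a minimal sketch, assuming `ompi_info` prints one `MCA <framework>: <component> (...)` line per component; that appears to be the usual format, but it is an assumption here. `memory_components` is a hypothetical helper name.

```shell
# Hypothetical helper: keep only the lines of ompi_info-style output that
# describe components of the "memory" framework.
memory_components() {
    grep -E '^[[:space:]]*MCA memory:'
}

# Only query the installation if ompi_info is actually on PATH.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | memory_components || echo "no memory components listed"
fi
```

If no `patcher` line shows up, the component is not built into that installation, and the workaround should not be needed there.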

@muralidhar-nalabothula
Author

> Of course, that begs the question of whether you are checking OMPI's actual memory usage - i.e., is this test doing anything truly useful?

I am not at all concerned with openmpi memory usage. I wanted to use address sanitizer to debug my own code and was not able to use it.

muralidhar-nalabothula closed this as not planned on Sep 28, 2024