OpenBLAS makes GIMP crash #132

Closed
svillemot opened this issue Aug 5, 2012 · 6 comments

@svillemot
Contributor

This is Debian bug #673061 (http://bugs.debian.org/673061)

When OpenBLAS is installed on a Debian system, GIMP (the GNU Image Manipulation
Program) crashes at launch time.

The crash occurs when GIMP tries to load a module from GEGL (Generic Graphics
Library), matting-levin.so, which itself links against BLAS.

A full backtrace is below. The crash occurs in rpcc() from common_x86_64.h.

The crash has been replicated with OpenBLAS 0.1.1 and 0.2.2.

Note that the crash disappears if OpenBLAS is compiled without multithreading
support (i.e. with NUM_THREADS=1).

Please let me know if you need more information or more actions on my part in
order to debug this.

Thanks,

Starting program: /usr/bin/gimp 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffeb763700 (LWP 3647)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffeb763700 (LWP 3647)]
0x00007fffec45da81 in rpcc () at ../../common_x86_64.h:76
76    __asm__ __volatile__ ("rdtsc" : "=a" (a), "=d" (d));
(gdb) thread apply all bt full

Thread 2 (Thread 0x7fffeb763700 (LWP 3647)):
#0  0x00007fffec45da81 in rpcc () at ../../common_x86_64.h:76
        a = 1946898218
        d = 1703985
#1  0x00007fffec45df6f in blas_thread_server (arg=0x0) at blas_server.c:296
        cpu = 0
        last_tick = 1940744908
        buffer = 0x7fffe1611000
        sa = 0x7fffe1611020
        sb = 0x7fffe170d020
        queue = 0x7fffffffd938
#2  0x00007ffff3ca4b50 in start_thread (arg=) at pthread_create.c:304
        __res = 
        pd = 0x7fffeb763700
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737143781120, -5342048371241765679, 140737283551648, 140737143781824, 140737354125376, 7, 5342093392661591249, 5342074944878687441}, mask_was_saved = 0}}, 
          priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 
        freesize = 
        __PRETTY_FUNCTION__ = "start_thread"
#3  0x00007ffff39ef70d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
No locals.
#4  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 1 (Thread 0x7ffff7fc9920 (LWP 3644)):
#0  0x00007ffff7df3447 in ?? () from /lib64/ld-linux-x86-64.so.2
No symbol table info available.
#1  0x00007ffff7df2833 in ?? () from /lib64/ld-linux-x86-64.so.2
No symbol table info available.
#2  0x00007ffff7df02a2 in ?? () from /lib64/ld-linux-x86-64.so.2
No symbol table info available.
#3  0x00007ffff7df09ce in ?? () from /lib64/ld-linux-x86-64.so.2
No symbol table info available.
#4  0x00007ffff7deabd6 in ?? () from /lib64/ld-linux-x86-64.so.2
No symbol table info available.
#5  0x00007ffff080f2ec in _dlerror_run (operate=0x7ffff080efe0 , args=0xc9d7b0) at dlerror.c:164
        result = 0xc36540
#6  0x00007ffff080f00f in __dlclose (handle=) at dlclose.c:48
No locals.
#7  0x00007ffff29a35e2 in _g_module_close (handle=, is_unref=) at /tmp/buildd/glib2.0-2.32.3/./gmodule/gmodule-dl.c:134
No locals.
#8  g_module_close (module=0xcba830) at /tmp/buildd/glib2.0-2.32.3/./gmodule/gmodule.c:770
        last = 
        node = 
        __PRETTY_FUNCTION__ = "g_module_close"
#9  0x00007ffff4e4e63d in gegl_module_close (module=0xc9bc10) at geglmodule.c:406
No locals.
#10 0x00007ffff4e4eeda in gegl_module_new (filename=filename@entry=0xc76570 "/usr/lib/x86_64-linux-gnu/gegl-0.2/matting-levin.so", load_inhibit=0, verbose=0) at geglmodule.c:227
        module = 0xc9bc10
        __PRETTY_FUNCTION__ = "gegl_module_new"
#11 0x00007ffff4e4f48f in gegl_module_db_module_initialize (file_data=, user_data=) at geglmoduledb.c:343
        db = 0xc36d50
        module = 
        load_inhibit = 2131616
#12 0x00007ffff4e4e4e9 in gegl_datafiles_read_directories (path_str=, flags=G_FILE_TEST_EXISTS, loader_func=0x7ffff4e4f3e0 , user_data=0xc36d50)
    at gegldatafiles.c:214
        dirname = 0xc73150 "/usr/lib/x86_64-linux-gnu/gegl-0.2"
        file_data = {filename = 0xc76570 "/usr/lib/x86_64-linux-gnu/gegl-0.2/matting-levin.so", dirname = 0xc73150 "/usr/lib/x86_64-linux-gnu/gegl-0.2", basename = 0xc909e3 "matting-levin.so", 
          atime = 1344158283, mtime = 1336898848, ctime = 1344158281}
        filestat = {st_dev = 65026, st_ino = 33432, st_nlink = 1, st_mode = 33188, st_uid = 0, st_gid = 0, __pad0 = 0, st_rdev = 0, st_size = 35152, st_blksize = 4096, st_blocks = 72, st_atim = {
            tv_sec = 1344158283, tv_nsec = 692586660}, st_mtim = {tv_sec = 1336898848, tv_nsec = 0}, st_ctim = {tv_sec = 1344158281, tv_nsec = 460628148}, __unused = {0, 0, 0}}
        local_path = 0xc730f0 "/usr/lib/x86_64-linux-gnu/gegl-0.2"
        path = 0xc29920
        list = 0xc29920
        filename = 0xc76570 "/usr/lib/x86_64-linux-gnu/gegl-0.2/matting-levin.so"
        err = 0
        dir = 0xc8f4f0
        dir_ent = 0xc909e3 "matting-levin.so"
        __PRETTY_FUNCTION__ = "gegl_datafiles_read_directories"
#13 0x00007ffff4e2ce7c in gegl_post_parse_hook (context=, group=, data=, error=) at gegl-init.c:551
        gegl_path = 0xc730a0 "/usr/lib/x86_64-linux-gnu/gegl-0.2"
#14 gegl_post_parse_hook (context=0x7fffebcaf000, group=0x2086a0, data=0x2, error=0xffffffffffffffff) at gegl-init.c:464
No locals.
#15 0x00007ffff3f0f458 in g_option_context_parse (context=context@entry=0xc2b610, argc=argc@entry=0x7fffffffe46c, argv=argv@entry=0x7fffffffe460, error=error@entry=0x7fffffffe478)
    at /tmp/buildd/glib2.0-2.32.3/./glib/goption.c:1995
        group = 
        i = 1
        j = 
        k = 
        list = 0xc29820
#16 0x000000000048c232 in main (argc=1, argv=0x7fffffffe588) at /build/buildd-gimp_2.8.0-2+b1-amd64-Ohu6F0/gimp-2.8.0/./app/main.c:396
        context = 0xc2b610
        error = 0x0
        abort_message = 
        basename = 
        i = 
@xianyi
Collaborator

xianyi commented Aug 7, 2012

Hi,

Which compiler version do you use? Could you test the bug with USE_OPENMP=1?

The rpcc() function is very simple, so I have no idea yet what causes this issue.

Thanks

Xianyi

@svillemot
Contributor Author

I use gcc 4.7.1.

If I compile with USE_OPENMP=1, I still get a crash, but this time it is in the OMP library (libgomp.so) when OpenBLAS is loaded. I can provide a full backtrace if needed.

Concerning the rpcc() function: it contains only an "rdtsc" x86 instruction. I noticed that this instruction can be disabled under Linux, in which case executing it generates a segfault (see http://www.kernel.org/doc/man-pages/online/pages/man2/prctl.2.html and look for PR_SET_TSC). That might be the cause of the problem. I tried to test that hypothesis but had no success (I am not very familiar with low-level stuff). Any idea how to test that?
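
One possible way to probe the PR_SET_TSC hypothesis is to ask the kernel for the current setting with PR_GET_TSC, the query counterpart documented on the same prctl man page. The sketch below is only an illustration: the helper names query_tsc_mode and describe_tsc_mode are made up here and are not part of OpenBLAS or GIMP.

```c
#include <sys/prctl.h>

/* Hypothetical helper (not from OpenBLAS): asks the kernel whether the
 * caller may execute rdtsc.
 * Returns PR_TSC_ENABLE, PR_TSC_SIGSEGV, or -1 if the query itself fails
 * (for example on an architecture without PR_GET_TSC support). */
static int query_tsc_mode(void)
{
    int mode = 0;
    if (prctl(PR_GET_TSC, &mode) != 0)
        return -1;
    return mode;
}

/* Hypothetical helper: turns the mode into a readable message. */
static const char *describe_tsc_mode(int mode)
{
    switch (mode) {
    case PR_TSC_ENABLE:
        return "rdtsc enabled";
    case PR_TSC_SIGSEGV:
        return "rdtsc disabled (executing it raises SIGSEGV)";
    default:
        return "unknown (PR_GET_TSC not supported here)";
    }
}
```

Printing describe_tsc_mode(query_tsc_mode()) from inside the affected process, just before the BLAS module is loaded, would show whether the timestamp counter has been disabled. If it reports "rdtsc enabled", the PR_SET_TSC explanation becomes unlikely.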

@xianyi
Collaborator

xianyi commented Aug 9, 2012

Hi,

According to the Wikipedia page on the Time Stamp Counter (http://en.wikipedia.org/wiki/Time_Stamp_Counter),
"Under *nix, similar functionality is provided by reading the value of CLOCK_MONOTONIC clock using POSIX clock_gettime function."

Is it possible to replace rdtsc with clock_gettime()?
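
A minimal sketch of what such a replacement could look like, assuming a counter in nanoseconds is acceptable to the callers of rpcc(). The name monotonic_ticks is hypothetical and this is not code from OpenBLAS.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for rpcc(): returns the CLOCK_MONOTONIC reading in
 * nanoseconds, or 0 if the clock cannot be read. */
static uint64_t monotonic_ticks(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0;
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
```

Note that rdtsc counts CPU timestamp ticks while this counts nanoseconds, so any threshold compared against tick differences would probably need rescaling.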

Xianyi

@svillemot
Contributor Author

I have created a minimal example which crashes on my system. The code is at: http://www.dynare.org/sebastien/issue132.tar.gz

The setup is similar to that of GIMP: the "main" program calls a dynamic module (foo.so) which itself calls OpenBLAS. The program crashes in dlclose() on my system. This happens with the standard Debian package (which is compiled with DYNAMIC_ARCH=1 TARGET=GENERIC), but the crash disappears if I add USE_OPENMP=1 (I did not test with multithreading disabled).
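
The tarball is not reproduced here, but the host side of such a setup can be sketched roughly as follows. This is an assumption about the general shape, not the actual test code: the function name run_module and the idea that the module exports a void(void) entry point are made up for illustration.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical host-side helper: loads the shared object at `path`, calls
 * the void(void) function named `entry`, then unloads the object.
 * Returns 0 on success and -1 on any failure. */
static int run_module(const char *path, const char *entry)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    dlerror(); /* clear any stale error before calling dlsym */
    void (*fn)(void) = NULL;
    *(void **)(&fn) = dlsym(handle, entry);
    const char *err = dlerror();
    if (err != NULL || fn == NULL) {
        fprintf(stderr, "dlsym: %s\n", err != NULL ? err : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    fn(); /* in the reported setup, this call would reach OpenBLAS */

    /* The reported crash happens during this dlclose() call. */
    return dlclose(handle) == 0 ? 0 : -1;
}
```

A call such as run_module("./foo.so", "foo_entry"), where foo_entry is a placeholder name, would mimic what GEGL does with matting-levin.so in the backtrace: load the module, then close it again. On older glibc versions the program has to be linked with -ldl.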

I am not sure that this crash is the same as the one with GIMP, but it looks similar.

Now I am going to test your suggestion about clock_gettime.

@xianyi
Collaborator

xianyi commented Aug 11, 2012

Hi @sebastien-villemot,

I fixed the bug exposed by your test code. It now exits successfully.

Could you check whether this also fixes the GIMP crash?

Thanks

Xianyi

@svillemot
Contributor Author

I applied a55821a to OpenBLAS 0.1.1 (the version I'm interested in fixing for Debian) and it fixes the GIMP crash!

Thanks for your prompt reaction. I'm going to upload the patch to Debian.
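
The thread does not describe what a55821a changes. For readers wondering how a crash of this kind can arise at all: the backtrace shows one thread inside blas_thread_server while the main thread is inside dlclose(), which is consistent with a worker thread still running library code while the library is being unloaded. A generic way to avoid that situation, shown here purely as an illustrative sketch with hypothetical names and not as the OpenBLAS fix, is to stop and join worker threads in a destructor.

```c
#include <pthread.h>
#include <stdbool.h>

/* All names below are hypothetical; this is not OpenBLAS code. */
static pthread_t worker_thread;
static pthread_mutex_t worker_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t worker_wake = PTHREAD_COND_INITIALIZER;
static bool worker_running = false;   /* true while the worker exists */
static bool worker_stop = false;      /* set to ask the worker to exit */

static void *worker_main(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&worker_lock);
    while (!worker_stop)
        pthread_cond_wait(&worker_wake, &worker_lock); /* idle until told to stop */
    pthread_mutex_unlock(&worker_lock);
    return NULL;
}

/* Starts the worker thread; returns 0 on success, -1 on failure. */
static int worker_start(void)
{
    if (worker_running)
        return 0;
    worker_stop = false;
    if (pthread_create(&worker_thread, NULL, worker_main, NULL) != 0)
        return -1;
    worker_running = true;
    return 0;
}

/* Runs when the shared object is unloaded by dlclose() or at process exit:
 * asks the worker to stop and waits until it has left library code. */
__attribute__((destructor))
static void worker_shutdown(void)
{
    if (!worker_running)
        return;
    pthread_mutex_lock(&worker_lock);
    worker_stop = true;
    pthread_cond_signal(&worker_wake);
    pthread_mutex_unlock(&worker_lock);
    pthread_join(worker_thread, NULL);
    worker_running = false;
}
```

The destructor attribute is a GCC and Clang extension. With this pattern, no thread is left executing code from the shared object once dlclose() unmaps it.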

kseniyazaytseva pushed a commit to kseniyazaytseva/OpenBLAS that referenced this issue Dec 18, 2023
…tests

Merge in PL/openblas from dev/k.zaytseva/LM-268 to dev-riscv