Implement a portable fix for SIGCHLD crashes #18

JohnoKing · 2020-06-16T13:27:18Z

As previously reported in rhbz#1112306, ksh may crash when receiving SIGCHLD because GCC's optimizer will fail to generate addl and sub instructions to increment and decrement job.in_critical in the job_subsave function. This bug does occur in GCC 10 with -O2, but not -O1; it doesn't appear this bug has been fixed. As a reference, here is the relevant debug assembly output of job_subsave when KSH is compiled with CCFLAGS set to -g -O1:

0000000000034c97 <job_subsave>:

void *job_subsave(void)
{
   34c97:       53                      push   %rbx
        struct back_save *bp = new_of(struct back_save,0);
   34c98:       bf 18 00 00 00          mov    $0x18,%edi
   34c9d:       e8 34 4a 0a 00          callq  d96d6 <_ast_malloc>
   34ca2:       48 89 c3                mov    %rax,%rbx
        job_lock();
   34ca5:       83 05 3c 50 13 00 01    addl   $0x1,0x13503c(%rip)        # 169ce8 <job+0x28>
        *bp = bck;
   34cac:       66 0f 6f 05 4c 5a 13    movdqa 0x135a4c(%rip),%xmm0        # 16a700 <bck>
   34cb3:       00
   34cb4:       0f 11 00                movups %xmm0,(%rax)
   34cb7:       48 8b 05 52 5a 13 00    mov    0x135a52(%rip),%rax        # 16a710 <bck+0x10>
   34cbe:       48 89 43 10             mov    %rax,0x10(%rbx)
        bp->prev = bck.prev;
   34cc2:       48 8b 05 47 5a 13 00    mov    0x135a47(%rip),%rax        # 16a710 <bck+0x10>
   34cc9:       48 89 43 10             mov    %rax,0x10(%rbx)
        bck.count = 0;
   34ccd:       c7 05 29 5a 13 00 00    movl   $0x0,0x135a29(%rip)        # 16a700 <bck>
   34cd4:       00 00 00
        bck.list = 0;
   34cd7:       48 c7 05 26 5a 13 00    movq   $0x0,0x135a26(%rip)        # 16a708 <bck+0x8>
   34cde:       00 00 00 00
        bck.prev = bp;
   34ce2:       48 89 1d 27 5a 13 00    mov    %rbx,0x135a27(%rip)        # 16a710 <bck+0x10>
        job_unlock();
   34ce9:       8b 05 f9 4f 13 00       mov    0x134ff9(%rip),%eax        # 169ce8 <job+0x28>
   34cef:       83 e8 01                sub    $0x1,%eax
   34cf2:       89 05 f0 4f 13 00       mov    %eax,0x134ff0(%rip)        # 169ce8 <job+0x28>
   34cf8:       75 2b                   jne    34d25 <job_subsave+0x8e>
   34cfa:       8b 3d ec 4f 13 00       mov    0x134fec(%rip),%edi        # 169cec <job+0x2c>
   34d00:       85 ff                   test   %edi,%edi
   34d02:       74 21                   je     34d25 <job_subsave+0x8e>
   34d04:       c7 05 da 4f 13 00 01    movl   $0x1,0x134fda(%rip)        # 169ce8 <job+0x28>

When -O2 is used instead of -O1, the addl and sub instructions for incrementing and decrementing the lock are removed. GCC instead generates a broken mov instruction for job_lock and removes the initial sub instruction in job_unlock (this is also seen in Red Hat's bug report):

       job_lock();
       *bp = bck;
  37d7c:       66 0f 6f 05 7c 79 14    movdqa 0x14797c(%rip),%xmm0        # 17f700 <bck>
  37d83:       00
       struct back_save *bp = new_of(struct back_save,0);
  37d84:       49 89 c4                mov    %rax,%r12
       job_lock();
  37d87:       8b 05 5b 6f 14 00       mov    0x146f5b(%rip),%eax        # 17ece8 <job+0x28>
...
        job_unlock();
  37dc6:       89 05 1c 6f 14 00       mov    %eax,0x146f1c(%rip)        # 17ece8 <job+0x28>
  37dcc:       85 c0                   test   %eax,%eax
  37dce:       75 2b                   jne    37dfb <job_subsave+0x8b>

The original patch in #14 works around this bug by using the legacy __sync_fetch_and_add/sub GCC builtins. This forces GCC to generate instructions that change the lock with lock addl, lock xadd and lock subl:

       job_lock();
  37d9f:       f0 83 05 41 6f 14 00    lock addl $0x1,0x146f41(%rip)        # 17ece8 <job+0x28>
  37da6:       01
...
       job_unlock();
  37deb:       f0 0f c1 05 f5 6e 14    lock xadd %eax,0x146ef5(%rip)        # 17ece8 <job+0x28>
  37df2:       00
  37df3:       83 f8 01                cmp    $0x1,%eax
  37df6:       74 08                   je     37e00 <job_subsave+0x70>
...
  37e25:       74 11                   je     37e38 <job_subsave+0xa8>
  37e27:       f0 83 2d b9 6e 14 00    lock subl $0x1,0x146eb9(%rip)        # 17ece8 <job+0x28>

While this does work, it isn't portable. This patch implements a different workaround for
this compiler bug. If job_lock is put at the beginning of job_subsave, GCC will generate
the required addl and sub instructions:

       job_lock();
  37d67:       83 05 7a 5f 14 00 01    addl   $0x1,0x145f7a(%rip)        # 17dce8 <job+0x28>
...
        job_unlock();
  37dbb:       83 e8 01                sub    $0x1,%eax
  37dbe:       89 05 24 5f 14 00       mov    %eax,0x145f24(%rip)        # 17dce8 <job+0x28>

It is odd that moving a single line of code fixes this problem, although GCC should have generated these instructions from the original code anyway. I'll note that this isn't the only way to get these instructions to generate. The problem also seems to go away when inserting almost anything else inside of the code for job_subsave. This is just a simple workaround for a strange compiler bug.

Note: Getting the debug assembly output isn't very difficult. It can be done with the following set of commands for optimization level -O2:

$ rm -rf ./arch
$ bin/package make ast-ksh CCFLAGS='-O2 -g' 
$ objdump -S arch/*/bin/ksh | less

McDutchie · 2020-06-16T14:05:31Z

Thanks, @JohnoKing, for confirming that this is indeed a gcc compiler bug. And a bad one.

Attn: @aweeraman – what do you think about this?

Maybe Debian needs to hear about this much simpler workaround.

As previously reported in rhbz#1112306 (https://bugzilla.redhat.com/show_bug.cgi?id=1112306), ksh may crash when receiving SIGCHLD because GCC's optimizer will fail to generate `addl` and `sub` instructions to increment and decrement `job.in_critical` in the `job_subsave` function. This bug *does* occur in GCC 10 with `-O2`, but not `-O1`; it doesn't appear this bug has been fixed. As a reference, here is the relevant debug assembly output of `job_subsave` when KSH is compiled with `CCFLAGS` set to `-g -O1`: 0000000000034c97 <job_subsave>: void *job_subsave(void) { 34c97: 53 push %rbx struct back_save *bp = new_of(struct back_save,0); 34c98: bf 18 00 00 00 mov $0x18,%edi 34c9d: e8 34 4a 0a 00 callq d96d6 <_ast_malloc> 34ca2: 48 89 c3 mov %rax,%rbx job_lock(); 34ca5: 83 05 3c 50 13 00 01 addl $0x1,0x13503c(%rip) # 169ce8 <job+0x28> *bp = bck; 34cac: 66 0f 6f 05 4c 5a 13 movdqa 0x135a4c(%rip),%xmm0 # 16a700 <bck> 34cb3: 00 34cb4: 0f 11 00 movups %xmm0,(%rax) 34cb7: 48 8b 05 52 5a 13 00 mov 0x135a52(%rip),%rax # 16a710 <bck+0x10> 34cbe: 48 89 43 10 mov %rax,0x10(%rbx) bp->prev = bck.prev; 34cc2: 48 8b 05 47 5a 13 00 mov 0x135a47(%rip),%rax # 16a710 <bck+0x10> 34cc9: 48 89 43 10 mov %rax,0x10(%rbx) bck.count = 0; 34ccd: c7 05 29 5a 13 00 00 movl $0x0,0x135a29(%rip) # 16a700 <bck> 34cd4: 00 00 00 bck.list = 0; 34cd7: 48 c7 05 26 5a 13 00 movq $0x0,0x135a26(%rip) # 16a708 <bck+0x8> 34cde: 00 00 00 00 bck.prev = bp; 34ce2: 48 89 1d 27 5a 13 00 mov %rbx,0x135a27(%rip) # 16a710 <bck+0x10> job_unlock(); 34ce9: 8b 05 f9 4f 13 00 mov 0x134ff9(%rip),%eax # 169ce8 <job+0x28> 34cef: 83 e8 01 sub $0x1,%eax 34cf2: 89 05 f0 4f 13 00 mov %eax,0x134ff0(%rip) # 169ce8 <job+0x28> 34cf8: 75 2b jne 34d25 <job_subsave+0x8e> 34cfa: 8b 3d ec 4f 13 00 mov 0x134fec(%rip),%edi # 169cec <job+0x2c> 34d00: 85 ff test %edi,%edi 34d02: 74 21 je 34d25 <job_subsave+0x8e> 34d04: c7 05 da 4f 13 00 01 movl $0x1,0x134fda(%rip) # 169ce8 <job+0x28> When `-O2` is used instead of `-O1`, the `addl` and `sub` instructions for incrementing and decrementing the lock are removed. GCC instead generates a broken `mov` instruction for `job_lock` and removes the initial `sub` instruction in job_unlock (this is also seen in Red Hat's bug report): job_lock(); *bp = bck; 37d7c: 66 0f 6f 05 7c 79 14 movdqa 0x14797c(%rip),%xmm0 # 17f700 <bck> 37d83: 00 struct back_save *bp = new_of(struct back_save,0); 37d84: 49 89 c4 mov %rax,%r12 job_lock(); 37d87: 8b 05 5b 6f 14 00 mov 0x146f5b(%rip),%eax # 17ece8 <job+0x28> ... job_unlock(); 37dc6: 89 05 1c 6f 14 00 mov %eax,0x146f1c(%rip) # 17ece8 <job+0x28> 37dcc: 85 c0 test %eax,%eax 37dce: 75 2b jne 37dfb <job_subsave+0x8b> The original patch works around this bug by using the legacy `__sync_fetch_and_add/sub` GCC builtins. This forces GCC to generate instructions that change the lock with `lock addl`, `lock xadd` and `lock subl`: job_lock(); 37d9f: f0 83 05 41 6f 14 00 lock addl $0x1,0x146f41(%rip) # 17ece8 <job+0x28> 37da6: 01 ... job_unlock(); 37deb: f0 0f c1 05 f5 6e 14 lock xadd %eax,0x146ef5(%rip) # 17ece8 <job+0x28> 37df2: 00 37df3: 83 f8 01 cmp $0x1,%eax 37df6: 74 08 je 37e00 <job_subsave+0x70> ... 37e25: 74 11 je 37e38 <job_subsave+0xa8> 37e27: f0 83 2d b9 6e 14 00 lock subl $0x1,0x146eb9(%rip) # 17ece8 <job+0x28> While this does work, it isn't portable. This patch implements a different workaround for this compiler bug. If `job_lock` is put at the beginning of `job_subsave`, GCC will generate the required `addl` and `sub` instructions: job_lock(); 37d67: 83 05 7a 5f 14 00 01 addl $0x1,0x145f7a(%rip) # 17dce8 <job+0x28> ... job_unlock(); 37dbb: 83 e8 01 sub $0x1,%eax 37dbe: 89 05 24 5f 14 00 mov %eax,0x145f24(%rip) # 17dce8 <job+0x28> It is odd that moving a single line of code fixes this problem, although GCC _should_ have generated these instructions from the original code anyway. I'll note that this isn't the only way to get these instructions to generate. The problem also seems to go away when inserting almost anything else inside of the code for `job_subsave`. This is just a simple workaround for a strange compiler bug.

aweeraman · 2020-06-16T20:55:13Z

This is a very nice analysis @JohnoKing and for the elegant fix - much respect!
I take it you were able to recreate the segfault and test this before and after the fix? I'll do some testing as well.

JohnoKing · 2020-06-16T21:24:06Z

I unfortunately could not reproduce the crash as it only occurs rarely, so this fix is based on the assumption that Red Hat's original analysis for why this crash occurs is correct (link, read comments 5 and 6). This patch simply implements an alternate way that forces GCC to generate addl and sub instructions to increment and decrement job.in_critical. It should work as this bug is not known to occur with versions of ksh compiled with Clang, which always generates instructions for changing the lock.

McDutchie · 2020-06-16T21:26:42Z

Whether it crashes or not, if the compiler leaves out increments that it should be compiling then you're obviously not going to get correct functioning. So I don't really see it as relevant whether the segfault is reproducible or not; it is sufficient to prove that the generated assembly code is wrong.

I didn't trust this back in e3d7bf1 (which disabled it for interactive shells) and I trust it less now. In af6a32d/6b380572, this was also disabled for virtual subshells as it caused program flow corruption there. Now, on macOS 10.14.6, a crash occurs when repeatedly running a command with this optimisation: $ ksh -c 'for((i=0;i<100;i++));do print -n "$i ";(sleep 1&);done' 0 1 2 3 4 5 6 7 Illegal instruction Oddly enough it seems that I can only reproduce this crash on macOS -- not on Linux, OpenBSD, or Solaris. It could be a macOS bug, particularly given the odd message in the stack trace below. I've had enough, though. Out it comes. Things now work fine, the reproducer is fixed on macOS, and it didn't optimise much anyway. The double-fork issue discussed in e3d7bf1 remains. ________ For future reference, here's an lldb debugger session with a stack trace. It crashes on calling calloc() (via sh_calloc(), via sh_newof()) in jobsave_create(). This is not an invalid pointer problem as we're allocating new memory, so it does look like an OS bug. The "BUG IN CLIENT OF LIBPLATFORM" message is interesting. $ lldb -- arch/*/bin/ksh -c 'for((i=0;i<100;i++));do print -n "$i ";(sleep 1&);done' (lldb) target create "arch/darwin.i386-64/bin/ksh" Current executable set to 'arch/darwin.i386-64/bin/ksh' (x86_64). (lldb) settings set -- target.run-args "-c" "for((i=0;i<100;i++));do print -n \"$i \";(sleep 1&);done" (lldb) run error: shell expansion failed (reason: lldb-argdumper exited with error 2). consider launching with 'process launch'. (lldb) process launch Process 35038 launched: '/usr/local/src/ksh93/ksh/arch/darwin.i386-64/bin/ksh' (x86_64) 0 1 2 3 4 5 6 7 8 9 Process 35038 stopped * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0) frame #0: 0x00007fff70deb1c2 libsystem_platform.dylib`_os_unfair_lock_recursive_abort + 23 libsystem_platform.dylib`_os_unfair_lock_recursive_abort: -> 0x7fff70deb1c2 <+23>: ud2 libsystem_platform.dylib`_os_unfair_lock_unowned_abort: 0x7fff70deb1c4 <+0>: movl %edi, %eax 0x7fff70deb1c6 <+2>: leaq 0x1a8a(%rip), %rcx ; "BUG IN CLIENT OF LIBPLATFORM: Unlock of an os_unfair_lock not owned by current thread" 0x7fff70deb1cd <+9>: movq %rcx, 0x361cb16c(%rip) ; gCRAnnotations + 8 Target 0: (ksh) stopped. (lldb) bt * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0) * frame #0: 0x00007fff70deb1c2 libsystem_platform.dylib`_os_unfair_lock_recursive_abort + 23 frame #1: 0x00007fff70de7c9a libsystem_platform.dylib`_os_unfair_lock_lock_slow + 239 frame #2: 0x00007fff70daa3bd libsystem_malloc.dylib`tiny_malloc_should_clear + 188 frame #3: 0x00007fff70daa20f libsystem_malloc.dylib`szone_malloc_should_clear + 66 frame #4: 0x00007fff70dab444 libsystem_malloc.dylib`malloc_zone_calloc + 99 frame #5: 0x00007fff70dab3c4 libsystem_malloc.dylib`calloc + 30 frame #6: 0x000000010003fa5d ksh`sh_calloc(nmemb=1, size=16) at init.c:264:13 frame #7: 0x000000010004f8a6 ksh`jobsave_create(pid=35055) at jobs.c:272:8 frame #8: 0x000000010004ed42 ksh`job_reap(sig=20) at jobs.c:363:9 frame #9: 0x000000010004ff6f ksh`job_waitsafe(sig=20) at jobs.c:511:3 frame #10: 0x00007fff70de9b5d libsystem_platform.dylib`_sigtramp + 29 frame #11: 0x00007fff70d39ac4 libsystem_kernel.dylib`__fork + 12 frame #12: 0x00007fff70c57d80 libsystem_c.dylib`fork + 17 frame #13: 0x000000010009590d ksh`sh_exec(t=0x0000000101005d30, flags=4) at xec.c:1883:16 frame #14: 0x0000000100096013 ksh`sh_exec(t=0x0000000101005d30, flags=4) at xec.c:2019:4 frame #15: 0x0000000100096c4f ksh`sh_exec(t=0x0000000101005a40, flags=5) at xec.c:2213:9 frame #16: 0x0000000100096013 ksh`sh_exec(t=0x0000000101005a40, flags=5) at xec.c:2019:4 frame #17: 0x000000010001c23f ksh`exfile(iop=0x0000000100405750, fno=-1) at main.c:603:4 frame #18: 0x000000010001b23c ksh`sh_main(ac=3, av=0x00007ffeefbff4f0, userinit=0x0000000000000000) at main.c:365:2 frame #19: 0x0000000100000776 ksh`main(argc=3, argv=0x00007ffeefbff4f0) at pmain.c:45:9 frame #20: 0x00007fff70bfe3d5 libdyld.dylib`start + 1

The ASan crash in basic.sh when sourcing multiple files is caused by a bug that is similar to the crash fixed in 59a5672. This is the trace for the regression test crash (note that in order to see the trace, the 2>/dev/null redirect must be disabled): ==1899388==ERROR: AddressSanitizer: heap-use-after-free on address 0x6150000005b0 at pc 0x55a5e3f9432a bp 0x7ffeb91ea110 sp 0x7ffeb91ea100 WRITE of size 8 at 0x6150000005b0 thread T0 #0 0x55a5e3f94329 in funct /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:967 ksh93#1 0x55a5e3f96f77 in item /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:1349 ksh93#2 0x55a5e3f90c9f in term /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:642 ksh93#3 0x55a5e3f90ac1 in list /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:613 ksh93#4 0x55a5e3f90845 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:561 ksh93#5 0x55a5e3f909e0 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:586 ksh93#6 0x55a5e3f8fd5e in sh_parse /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:438 ksh93#7 0x55a5e3fc43c1 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:635 ksh93#8 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318 ksh93#9 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254 ksh93#10 0x55a5e3fd01d4 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1932 ksh93#11 0x55a5e3fc4544 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:651 ksh93#12 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318 ksh93#13 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254 ksh93#14 0x55a5e3ecc1cd in exfile /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:604 ksh93#15 0x55a5e3ec9e7f in sh_main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:369 ksh93#16 0x55a5e3ec801d in main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/pmain.c:41 ksh93#17 0x7f637b4db2cf (/usr/lib/libc.so.6+0x232cf) ksh93#18 0x7f637b4db389 in __libc_start_main (/usr/lib/libc.so.6+0x23389) ksh93#19 0x55a5e3ec7f24 in _start ../sysdeps/x86_64/start.S:115 Code in question: https://github.com/ksh93/ksh/blob/8d57369b0cb39074437dd82924b604155e30e1e0/src/cmd/ksh93/sh/parse.c#L963-L968 To avoid any more similar crashes, all of the fixes introduced in 7e317c5 that set slp->slptr to null have been improved with the fix in 59a5672.

The ASan crash in basic.sh when sourcing multiple files is caused by a bug that is similar to the crash fixed in 59a5672. This is the trace for the regression test crash (note that in order to see the trace, the 2>/dev/null redirect must be disabled): ==1899388==ERROR: AddressSanitizer: heap-use-after-free on address 0x6150000005b0 at pc 0x55a5e3f9432a bp 0x7ffeb91ea110 sp 0x7ffeb91ea100 WRITE of size 8 at 0x6150000005b0 thread T0 #0 0x55a5e3f94329 in funct /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:967 #1 0x55a5e3f96f77 in item /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:1349 #2 0x55a5e3f90c9f in term /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:642 #3 0x55a5e3f90ac1 in list /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:613 #4 0x55a5e3f90845 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:561 #5 0x55a5e3f909e0 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:586 #6 0x55a5e3f8fd5e in sh_parse /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:438 #7 0x55a5e3fc43c1 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:635 #8 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318 #9 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254 #10 0x55a5e3fd01d4 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1932 #11 0x55a5e3fc4544 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:651 #12 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318 #13 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254 #14 0x55a5e3ecc1cd in exfile /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:604 #15 0x55a5e3ec9e7f in sh_main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:369 #16 0x55a5e3ec801d in main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/pmain.c:41 #17 0x7f637b4db2cf (/usr/lib/libc.so.6+0x232cf) #18 0x7f637b4db389 in __libc_start_main (/usr/lib/libc.so.6+0x23389) #19 0x55a5e3ec7f24 in _start ../sysdeps/x86_64/start.S:115 Code in question: https://github.com/ksh93/ksh/blob/8d57369b0cb39074437dd82924b604155e30e1e0/src/cmd/ksh93/sh/parse.c#L963-L968 To avoid any more similar crashes, all of the fixes introduced in 7e317c5 that set slp->slptr to null have been improved with the fix in 59a5672.

The ASan crash in basic.sh when sourcing multiple files is caused by a bug that is similar to the crash fixed in f24040e. This is the trace for the regression test crash (note that in order to see the trace, the 2>/dev/null redirect must be disabled): ==1899388==ERROR: AddressSanitizer: heap-use-after-free on address 0x6150000005b0 at pc 0x55a5e3f9432a bp 0x7ffeb91ea110 sp 0x7ffeb91ea100 WRITE of size 8 at 0x6150000005b0 thread T0 #0 0x55a5e3f94329 in funct /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:967 #1 0x55a5e3f96f77 in item /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:1349 #2 0x55a5e3f90c9f in term /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:642 #3 0x55a5e3f90ac1 in list /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:613 #4 0x55a5e3f90845 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:561 #5 0x55a5e3f909e0 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:586 #6 0x55a5e3f8fd5e in sh_parse /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:438 #7 0x55a5e3fc43c1 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:635 #8 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318 #9 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254 #10 0x55a5e3fd01d4 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1932 #11 0x55a5e3fc4544 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:651 #12 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318 #13 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254 #14 0x55a5e3ecc1cd in exfile /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:604 #15 0x55a5e3ec9e7f in sh_main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:369 #16 0x55a5e3ec801d in main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/pmain.c:41 #17 0x7f637b4db2cf (/usr/lib/libc.so.6+0x232cf) #18 0x7f637b4db389 in __libc_start_main (/usr/lib/libc.so.6+0x23389) #19 0x55a5e3ec7f24 in _start ../sysdeps/x86_64/start.S:115 Code in question: https://github.com/ksh93/ksh/blob/8d57369b0cb39074437dd82924b604155e30e1e0/src/cmd/ksh93/sh/parse.c#L963-L968 To avoid any more similar crashes, all of the fixes introduced in 69d37d5 that set slp->slptr to null have been improved with the fix in f24040e.

The isaname, isaletter, isadigit, isexp and ismeta macros don't check if c is a negative value before accessing sh_lexstates. This can result in ASan crashing because of a buffer overflow in quoting2.sh when running in a multibyte locale: test quoting2(C.UTF-8) begins at 2022-09-23+14:03:12 ================================================================= ==262224==ERROR: AddressSanitizer: global-buffer-overflow on address 0x557b201a451f at pc 0x557b1fe5e6fc bp 0x7fffcf1ac700 sp 0x7fffcf1ac6f8 READ of size 1 at 0x557b201a451f thread T0 #0 0x557b1fe5e6fb in sh_fmtq /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/string.c:341:5 ksh93#1 0x557b1fe6098c in sh_fmtqf /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/string.c:473:10 ksh93#2 0x557b1ff08dc0 in extend /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/print.c:998:14 ksh93#3 0x557b2008a56c in sfvprintf /home/johno/GitRepos/KornShell/ksh/src/lib/libast/sfio/sfvprintf.c:531:8 ksh93#4 0x557b2005b7f7 in sfprintf /home/johno/GitRepos/KornShell/ksh/src/lib/libast/sfio/sfprintf.c:31:7 ksh93#5 0x557b1ff04272 in b_print /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/print.c:343:4 ksh93#6 0x557b1ff04ebf in b_printf /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/print.c:148:9 ksh93#7 0x557b1fe8d9a7 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1261:21 ksh93#8 0x557b1fe7a7cf in sh_subshell /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/subshell.c:652:4 ksh93#9 0x557b1fdedc0d in comsubst /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/macro.c:2207:9 ksh93#10 0x557b1fdefc79 in varsub /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/macro.c:1181:3 ksh93#11 0x557b1fde3bef in copyto /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/macro.c:620:21 ksh93#12 0x557b1fde0b07 in sh_mactrim /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/macro.c:169:2 ksh93#13 0x557b1fe05ab6 in nv_setlist /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/name.c:280:9 ksh93#14 0x557b1fe8a7e8 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1051:7 ksh93#15 0x557b1fe95b85 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1940:5 ksh93#16 0x557b1fe99ea6 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:2271:10 ksh93#17 0x557b1fd23b04 in exfile /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:604:4 ksh93#18 0x557b1fd1fe10 in sh_main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:369:2 ksh93#19 0x557b1fd1d585 in main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/pmain.c:41:9 ksh93#20 0x7f55d5b5028f (/usr/lib/libc.so.6+0x2328f) (BuildId: 26c81e7e05ebaf40bac3523b7d76be0cd71fad82) ksh93#21 0x7f55d5b50349 in __libc_start_main (/usr/lib/libc.so.6+0x23349) (BuildId: 26c81e7e05ebaf40bac3523b7d76be0cd71fad82) ksh93#22 0x557b1fc158d4 in _start /build/glibc/src/glibc/csu/../sysdeps/x86_64/start.S:115 src/cmd/ksh93/include/lexstates.h: - Check if c is negative before accessing sh_lexstates. Backported from ksh2020: att@a7013320. I'll note that later in ksh2020 these macros became functions: att@adc589de. I didn't backport that commit because it requires the C99 bool type to avoid compiler warnings.

JohnoKing force-pushed the fix-sigchld-crash branch from 5e84a05 to fb8cbf6 Compare June 16, 2020 15:18

McDutchie merged commit c258a04 into ksh93:master Jun 16, 2020

JohnoKing deleted the fix-sigchld-crash branch June 18, 2020 02:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement a portable fix for SIGCHLD crashes #18

Implement a portable fix for SIGCHLD crashes #18

JohnoKing commented Jun 16, 2020 •

edited

Loading

McDutchie commented Jun 16, 2020

aweeraman commented Jun 16, 2020

JohnoKing commented Jun 16, 2020

McDutchie commented Jun 16, 2020

Implement a portable fix for SIGCHLD crashes #18

Implement a portable fix for SIGCHLD crashes #18

Conversation

JohnoKing commented Jun 16, 2020 • edited Loading

McDutchie commented Jun 16, 2020

aweeraman commented Jun 16, 2020

JohnoKing commented Jun 16, 2020

McDutchie commented Jun 16, 2020

JohnoKing commented Jun 16, 2020 •

edited

Loading