segmentation fault in lj_vm_growstack_f #25

Closed
tokers opened this issue Aug 9, 2018 · 4 comments

@tokers

tokers commented Aug 9, 2018

Hello!

We encountered a segmentation fault in LuaJIT. The backtrace is:

(gdb) bt
#0  0x00007f0c1a6aade2 in lj_vm_growstack_f () from /usr/local/marco/luajit/lib/libluajit-5.1.so.2
#1  0x0000000000550c13 in ngx_http_lua_run_thread (L=0x41a00378, r=0x1767f80, ctx=0x13793f0, nrets=0)
    at /disk/ssd2/alex_workflow/marco/deps/lua-nginx-module-0.10.11h/src/ngx_http_lua_util.c:1013
#2  0x000000000057825c in ngx_http_lua_ssl_cert_by_chunk (L=0x41a00378, r=0x1767f80)
    at /disk/ssd2/alex_workflow/marco/deps/lua-nginx-module-0.10.11h/src/ngx_http_lua_ssl_certby.c:527
#3  0x0000000000577457 in ngx_http_lua_ssl_cert_handler_file (r=0x1767f80, lscf=0x12c2138, L=0x41a00378)
    at /disk/ssd2/alex_workflow/marco/deps/lua-nginx-module-0.10.11h/src/ngx_http_lua_ssl_certby.c:57
#4  0x0000000000577c2c in ngx_http_lua_ssl_cert_handler (ssl_conn=0x1765790, data=0x0)
    at /disk/ssd2/alex_workflow/marco/deps/lua-nginx-module-0.10.11h/src/ngx_http_lua_ssl_certby.c:315
#5  0x00007f0c1a1e544a in tls_post_process_client_hello (s=0x1765790, wst=WORK_MORE_B) at ssl/statem/statem_srvr.c:2179
#6  0x00007f0c1a1e2d2f in ossl_statem_server_post_process_message (s=0x1765790, wst=WORK_MORE_A) at ssl/statem/statem_srvr.c:1148
#7  0x00007f0c1a1cfe52 in read_state_machine (s=0x1765790) at ssl/statem/statem.c:660
#8  0x00007f0c1a1cf7a9 in state_machine (s=0x1765790, server=1) at ssl/statem/statem.c:428
#9  0x00007f0c1a1cf33b in ossl_statem_accept (s=0x1765790) at ssl/statem/statem.c:251
#10 0x00007f0c1a1b642d in ssl_do_handshake_intern (vargs=0x12c6730) at ssl/ssl_lib.c:3467
#11 0x00007f0c19d0770f in async_start_func () at crypto/async/async.c:154
#12 0x00007f0c199108f0 in __malloc_info (fp=0x7ffea8f29be0, options=<optimized out>) at malloc.c:5196
#13 0x0000000001f6b870 in ?? ()
#14 0x0000000000000000 in ?? ()

   0x7f0c1a6aadb3 <lj_vm_growstack_f>      lea    -0x8(%rdx,%rax,8),%eax
   0x7f0c1a6aadb7 <lj_vm_growstack_f+4>    movzbl -0x3d(%rbx),%ecx
   0x7f0c1a6aadbb <lj_vm_growstack_f+8>    add    $0x4,%ebx
   0x7f0c1a6aadbe <lj_vm_growstack_f+11>   mov    %edx,0x10(%rbp)
   0x7f0c1a6aadc1 <lj_vm_growstack_f+14>   mov    %eax,0x18(%rbp)
   0x7f0c1a6aadc4 <lj_vm_growstack_f+17>   mov    %ebx,0x1c(%rsp)
   0x7f0c1a6aadc8 <lj_vm_growstack_f+21>   mov    %ecx,%esi
   0x7f0c1a6aadca <lj_vm_growstack_f+23>   mov    %ebp,%edi
   0x7f0c1a6aadcc <lj_vm_growstack_f+25>   callq  0x7f0c1a6b4470 <lj_state_growstack>
   0x7f0c1a6aadd1 <lj_vm_growstack_f+30>   mov    0x10(%rbp),%edx
   0x7f0c1a6aadd4 <lj_vm_growstack_f+33>   mov    0x18(%rbp),%eax
   0x7f0c1a6aadd7 <lj_vm_growstack_f+36>   mov    -0x8(%rdx),%ebp
   0x7f0c1a6aadda <lj_vm_growstack_f+39>   sub    %edx,%eax
   0x7f0c1a6aaddc <lj_vm_growstack_f+41>   shr    $0x3,%eax
   0x7f0c1a6aaddf <lj_vm_growstack_f+44>   add    $0x1,%eax
 > 0x7f0c1a6aade2 <lj_vm_growstack_f+47>   mov    0x10(%rbp),%ebx
   0x7f0c1a6aade5 <lj_vm_growstack_f+50>   mov    (%rbx),%ecx
   0x7f0c1a6aade7 <lj_vm_growstack_f+52>   movzbl %cl,%ebp
   0x7f0c1a6aadea <lj_vm_growstack_f+55>   movzbl %ch,%ecx
   0x7f0c1a6aaded <lj_vm_growstack_f+58>   add    $0x4,%ebx
   0x7f0c1a6aadf0 <lj_vm_growstack_f+61>   jmpq   *(%r14,%rbp,8)

It seems that the value in %rbp was corrupted?

(gdb) p/x $rbp
$1 = 0x7009a593
(gdb) x 0x7009a593
0x7009a593:     Cannot access memory at address 0x7009a593
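
For readers following the listing, here is a rough C model of instructions +30 through +47, using a toy 32-bit address space instead of real memory. It is only a sketch: the names `state`, `model_after_growstack`, and `demo` are hypothetical, and reading offset 0x10/0x18 as "frame base"/"top" and the slot at base - 8 as an object address is an interpretation of the disassembly, not something confirmed in this thread. What the listing itself does show is that %ebp is reloaded at +36, so the value dereferenced at +47 comes from memory just below the frame base.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy 32-bit "address space": addresses are offsets into this array. */
static uint8_t mem[256];

static uint32_t load32(uint32_t addr)
{
    uint32_t v;
    assert(addr + sizeof v <= sizeof mem);  /* stay inside the toy memory */
    memcpy(&v, &mem[addr], sizeof v);
    return v;
}

static void store32(uint32_t addr, uint32_t v)
{
    assert(addr + sizeof v <= sizeof mem);
    memcpy(&mem[addr], &v, sizeof v);
}

typedef struct {
    uint32_t ebp;  /* value loaded from (base - 8) at +36 */
    uint32_t eax;  /* ((top - base) >> 3) + 1, computed at +39..+44 */
    uint32_t ebx;  /* value loaded from (ebp + 0x10) at +47 */
} regs_after;

/* `state` plays the role of %rbp right after the call returns. */
regs_after model_after_growstack(uint32_t state)
{
    regs_after r;
    uint32_t edx = load32(state + 0x10);  /* +30: mov 0x10(%rbp),%edx */
    uint32_t eax = load32(state + 0x18);  /* +33: mov 0x18(%rbp),%eax */
    r.ebp = load32(edx - 8);              /* +36: mov -0x8(%rdx),%ebp */
    eax -= edx;                           /* +39: sub %edx,%eax */
    eax >>= 3;                            /* +41: shr $0x3,%eax */
    eax += 1;                             /* +44: add $0x1,%eax */
    r.eax = eax;
    r.ebx = load32(r.ebp + 0x10);         /* +47: the faulting load */
    return r;
}

/* Hypothetical layout with made-up addresses, just to exercise the model. */
regs_after demo(void)
{
    memset(mem, 0, sizeof mem);
    store32(0x20 + 0x10, 0x88);    /* state+0x10: frame base */
    store32(0x20 + 0x18, 0x98);    /* state+0x18: base + 2 slots */
    store32(0x88 - 8, 0xc0);       /* slot below base: object address */
    store32(0xc0 + 0x10, 0x1234);  /* field at object+0x10 */
    return model_after_growstack(0x20);
}
```

In this model, a bad value in the slot below the frame base is enough to make the load at +47 fault, even if the state structure itself is intact.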

(gdb) info thread
  Id   Target Id         Frame
* 1    LWP 2883818       0x00007f0c1a6aade2 in lj_vm_growstack_f () from /usr/local/marco/luajit/lib/libluajit-5.1.so.2

We are using OpenSSL's asynchronous mode (with the dasync engine), which uses its own coroutines. I don't know whether this can affect LuaJIT.

The segmentation fault disappears after disabling SSL_MODE_ASYNC. In addition, the crash occurs less frequently when JIT is disabled.
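
To illustrate what "its own coroutines" can mean for code called from inside an async job (frame #11, async_start_func, in the backtrace), here is a minimal sketch using the POSIX ucontext API. It does not use OpenSSL at all; the assumption, not confirmed in this thread, is that OpenSSL's async jobs on Linux run on similar separately allocated stacks. The names `run_on_fiber`, `fiber_entry`, and `FIBER_STACK_SIZE` are hypothetical. The sketch only shows that a function entered through such a context runs on a different C stack than its caller.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <ucontext.h>

#define FIBER_STACK_SIZE (64 * 1024)  /* arbitrary size for this sketch */

static ucontext_t main_ctx, fiber_ctx;
static char *fiber_stack;
static int local_on_fiber_stack;

/* Runs on the fiber's stack; records whether a local lives inside it. */
static void fiber_entry(void)
{
    volatile int probe = 0;
    uintptr_t p = (uintptr_t)&probe;
    uintptr_t lo = (uintptr_t)fiber_stack;

    local_on_fiber_stack = (p >= lo && p < lo + FIBER_STACK_SIZE);
    /* Returning from here resumes main_ctx through uc_link. */
}

/* Returns 1 if fiber_entry's local was on the fiber stack, 0 if not,
 * and -1 on allocation or context-switch failure. */
int run_on_fiber(void)
{
    int rc;

    fiber_stack = malloc(FIBER_STACK_SIZE);
    if (fiber_stack == NULL)
        return -1;

    rc = getcontext(&fiber_ctx);
    assert(rc == 0);
    fiber_ctx.uc_stack.ss_sp = fiber_stack;
    fiber_ctx.uc_stack.ss_size = FIBER_STACK_SIZE;
    fiber_ctx.uc_link = &main_ctx;  /* where to continue when the fiber ends */
    makecontext(&fiber_ctx, fiber_entry, 0);

    local_on_fiber_stack = 0;
    if (swapcontext(&main_ctx, &fiber_ctx) != 0) {
        free(fiber_stack);
        return -1;
    }

    free(fiber_stack);
    return local_on_fiber_stack;
}
```

Whether this stack switch is related to the crash is not established here; the sketch is only meant to make the setup concrete.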

Our LuaJIT version is https://github.com/openresty/luajit2/releases/tag/v2.1-20171103 .
The Linux Kernel version is 4.9.0.

I also opened an issue here: openssl/openssl#6864.

Any ideas for a fix or workaround? Thanks!

@agentzh
Member

agentzh commented Aug 9, 2018

@tokers LuaJIT is not multi-thread safe in general. I know very little about your setup; maybe it's using multiple OS threads, I'm not sure. Your backtrace looks like stack corruption or heap corruption. Such things are usually not debuggable without a reproducible environment.

@tokers
Author

tokers commented Aug 10, 2018

@agentzh

There is only one OS thread in our service.

We recorded the crash with Mozilla rr, but only got a few clues.

   0x7f0c1a6aadbe <lj_vm_growstack_f+11>   mov    %edx,0x10(%rbp)
   0x7f0c1a6aadc1 <lj_vm_growstack_f+14>   mov    %eax,0x18(%rbp)
   0x7f0c1a6aadc4 <lj_vm_growstack_f+17>   mov    %ebx,0x1c(%rsp)
   0x7f0c1a6aadc8 <lj_vm_growstack_f+21>   mov    %ecx,%esi
   0x7f0c1a6aadca <lj_vm_growstack_f+23>   mov    %ebp,%edi
   0x7f0c1a6aadcc <lj_vm_growstack_f+25>   callq  0x7f0c1a6b4470 <lj_state_growstack>
   0x7f0c1a6aadd1 <lj_vm_growstack_f+30>   mov    0x10(%rbp),%edx
   0x7f0c1a6aadd4 <lj_vm_growstack_f+33>   mov    0x18(%rbp),%eax
   0x7f0c1a6aadd7 <lj_vm_growstack_f+36>   mov    -0x8(%rdx),%ebp
   0x7f0c1a6aadda <lj_vm_growstack_f+39>   sub    %edx,%eax
   0x7f0c1a6aaddc <lj_vm_growstack_f+41>   shr    $0x3,%eax
   0x7f0c1a6aaddf <lj_vm_growstack_f+44>   add    $0x1,%eax
 > 0x7f0c1a6aade2 <lj_vm_growstack_f+47>   mov    0x10(%rbp),%ebx

%edx was stored at %rbp + 0x10 before calling lj_state_growstack. After lj_state_growstack returned, the address %rbp + 0x10 was no longer accessible. We set a watchpoint at this address:

(rr) p/x $rbp + 0x10
$19 = 0x411d2730
(rr) watch *(int *) 0x411d2730
Hardware watchpoint 8: *(int *) 0x411d2730
(rr) continue
Continuing.

Hardware watchpoint 8: *(int *) 0x411d2730

Old value = 1092445696
New value = 1081358768
resizestack (L=0x411d2720, n=192) at lj_state.c:76

(rr) p L
$70 = (lua_State *) 0x411d2720
(rr) p &L->base
$71 = (TValue **) 0x411d2730
(rr) p delta
$72 = -11086928
(rr) p st
$82 = (TValue *) 0x4073fbd8
(rr) p oldst
$83 = (TValue *) 0x411d2828
(rr) p/x L->stack
$84 = {ptr32 = 0x4073fbd8}
(rr) p oldsize
$85 = 98
(rr) p realsize
$86 = 198
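
The numbers in this rr session can be cross-checked with a few lines of arithmetic. The sketch below assumes that the stack is moved as one block, that every pointer into it is adjusted by the same byte delta, and that one stack slot is 8 bytes; the helper names (`byte_delta`, `relocate`, `in_block`) are hypothetical and not taken from LuaJIT.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_SIZE 8u  /* assumed size of one stack slot in bytes */

/* Byte distance between the new and the old start of the stack block. */
int64_t byte_delta(uint32_t new_start, uint32_t old_start)
{
    return (int64_t)new_start - (int64_t)old_start;
}

/* Apply the same delta to a 32-bit pointer into the moved block. */
uint32_t relocate(uint32_t ptr, int64_t delta)
{
    int64_t moved = (int64_t)ptr + delta;
    assert(moved >= 0 && moved <= (int64_t)UINT32_MAX);
    return (uint32_t)moved;
}

/* 1 if ptr lies inside [start, start + nslots * SLOT_SIZE), else 0. */
int in_block(uint32_t ptr, uint32_t start, uint32_t nslots)
{
    uint64_t end = (uint64_t)start + (uint64_t)nslots * SLOT_SIZE;
    return ptr >= start && (uint64_t)ptr < end;
}
```

With the values printed above, `byte_delta(0x4073fbd8, 0x411d2828)` gives -11086928, matching `delta`, and `relocate(1092445696, -11086928)` gives 1081358768, matching the watchpoint's old and new values, so the relocation itself is self-consistent. On the other hand, `in_block(1092445696, 0x411d2828, 98)` returns 0: taken at face value, the pre-resize value at `&L->base` lies outside the old 98-slot block. That may be worth a closer look, although variables printed from an optimized build can be stale.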

BTW, here is the register state when 0x411d2730 was changed:

rax            0x0      0
rbx            0xc6     198
rcx            0xffffffffff56d3b0       -11086928
rdx            0x400cc3b8       1074578360
rsi            0x0      0
rdi            0x411d2818       1092429848
rbp            0xc6     0xc6
rsp            0x1eee210        0x1eee210
r8             0x4073fbd8       1081342936
r9             0x4      4
r10            0x4073fbd8       1081342936
r11            0x8      8
r12            0x411d2720       1092429600
r13            0x411d2828       1092429864
r14            0xc0     192
r15            0x40655c28       1080384552
rip            0x7f67d9374061   0x7f67d9374061 <resizestack+145>
eflags         0x203    [ CF IF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0
fs_base        0x7f67da076740   0x7f67da076740
gs_base        0x0      0x0

@tokers
Author

tokers commented Aug 10, 2018

@agentzh

By the way, someone replied to this problem on the LuaJIT mailing list: https://www.freelists.org/post/luajit/segmentation-fault-in-lj-vm-growstack-f,1.

@tokers
Author

tokers commented Aug 15, 2018

@agentzh
It seems that the problem was fixed after we added some tricks. Thanks a lot!
