server threads die with assertion error #6222

mxk1235 · 2013-09-13T05:04:39Z

using node 0.10.18 on Ubuntu 12.04 LTS.

../deps/uv/src/unix/stream.c:494: uv__server_io: Assertion `events == 1' failed

happens several times a second, effectively killing the threads. I found an old thread that is closed now because of no repro.

thanks in advance.

mxk1235 · 2013-09-13T05:24:05Z

to give more context, this is very closely related to this discussion joyent/libuv#826

we are using express 3.1.0 and cluster, which forks upto 7-10 servers that listen on the same port.

i would love to get a fix, but the discussion also mentions a workaround with a 5ms delay. any idea if that would work?

bnoordhuis · 2013-09-13T12:38:31Z

Can you try this?

$ strace -o trace.log -fe \!mmap,munmap,mprotect,brk,gettimeofday,futex node script.js

Please post the trace.log here.

mxk1235 · 2013-09-13T18:25:16Z

fwiw, i used this script http://sprunge.us/FPjU with trace. the output is quite large with our code because the socket close event does not occur for ~60 minutes.

https://s3.amazonaws.com/pierrerize-public/trace.log

bnoordhuis · 2013-09-19T11:21:19Z

Thanks. Can you post the output of uname -a on the affected system? From reading the trace, it looks like the process that aborts:

receives a file descriptor (socket handle) from another process
adds it to the epoll set with mode=EPOLLIN
then calls epoll_wait() which returns mode=EPOLLOUT

It seems the file descriptor is supposed to be a listen socket but isn't. A test case would help because I can't exclude the possibility yet that there is a bug in your code somewhere.

mxk1235 · 2013-09-20T17:43:38Z

mike@mike-test-1:~$ uname -a
Linux mike-test-1 3.2.0-51-virtual #77-Ubuntu SMP Wed Jul 24 20:38:32 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

bnoordhuis · 2013-09-20T22:03:29Z

Thanks. I received a couple of other reports and I'm reasonably sure by now that it's caused by stale events for a file descriptor that's been replaced by a file descriptor from another process. No ETA for a fix yet.

mxk1235 · 2013-09-20T22:19:07Z

thanks for looking at it. from what i understand, it looks connected to the way cluster listens to sockets. does it make sense to have a delay when restarting the thread as a workaround? or perhaps there is an alternative to cluster that we can use.

bnoordhuis · 2013-09-20T22:22:24Z

If by 'thread' you mean 'worker process', then there's no real way to work around it currently (apart from not using the cluster module, of course.)

mxk1235 · 2013-09-20T22:24:16Z

yes. thank you!

mxk1235 · 2013-09-30T23:12:05Z

hi, saw that this is added to 0.10 milestone. any other ETA info?

bnoordhuis · 2013-10-01T06:53:57Z

Not really, I'm afraid. I'm on leave currently, I only work on node sporadically. Unless someone else picks this up (unlikely), it'll have to wait until I get back and have time to look into it.

laino · 2013-10-11T20:50:31Z

I'm running into this too. For some weird reason it only happens on one of two exactly the same way configured servers (even after re-installing node and npm) and I can't get it to work on the first one.

Is there any easy (or dirty) fix for this issue that doesn't involve not using the cluster module?
It's quite quite annoying as node crashes immediately after starting for me.

Thanks.

anacronw · 2013-10-19T00:18:13Z

I am also running into this issue. Just adding to the list if it helps :)

general stack is: cluster + express + redis

uname -a output (we are using CentOS):
2.6.32-358.11.1.el6.x86_64 #1 SMP Wed Jun 12 03:34:52 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

The symptom is 100% cpu utilization on master. Running strace shows a bunch of file descriptor errors.

In my case, nothing really crashes and the application runs OK actually. Its just noticeable when using top.

Thanks for confirming the issue @bnoordhuis, I can turn off clustering in my case.

laino · 2013-10-19T08:31:12Z

For me it was fixed by listening to the ports immediately instead of waiting for a redis connection etc. Quite ugly as the server can't really process requests the moment he comes online now.

vpArth · 2013-10-22T08:51:07Z

This issue repeated for me when nodemon restarts application on file changes
application is simple cluster + restify workers

indutny · 2013-10-25T22:58:50Z

Should be fixed by joyent/libuv#971

mxk1235 · 2013-10-25T23:31:59Z

awesome! will try.

vpArth · 2013-10-28T03:59:38Z

Good
By a point, this can be fixed by new worker forking on exit event.
Implemented as restart option in the node-cluster-app

trevnorris · 2013-10-28T23:32:37Z

@mxk1235 can you confirm that the issue has been fixed?

indutny · 2013-11-12T11:48:44Z

Fixed in joyent/libuv@bbccafb and joyent/libuv@f50ccd5. Should get into the next v0.10 and v0.11 release.

mmastrac · 2013-11-18T20:14:57Z

I'm still seeing this with 0.10.22 with our setup. What we're seeing is that when we run grunt as a task in Gradle, node dies with an assertion error when trying to launch a process in Karma:

grunt: ../deps/uv/src/unix/stream.c:494: uv__server_io: Assertion `events == 1' failed.

Oddly enough, it only happens if grunt runs a shell command before Karma attempts to run the browser, and only if grunt is run under Gradle.

I've run strace over it -- it looks like epoll_ctl starts throwing these soon after the shell execute finishes (search for rm -Rf):

29687 <... epoll_ctl resumed> ) = -1 EBADF (Bad file descriptor)

https://www.dropbox.com/s/geiat4cwb91sqt7/trace.log.gz

When run outside of Gradle (where it works), you can see the epoll running after the rm -Rf is successful:

https://www.dropbox.com/s/jq7osljb8x5j0kd/trace2.log.gz

mmastrac · 2013-11-18T21:08:56Z

This may be related to #6271, as the strace log was showing the same EPOLL_CTL_DEL behaviour.

indutny · 2013-11-20T22:21:48Z

Reopening, will look into it soon.

marckysharky · 2013-11-21T11:31:26Z

Also have this issue when running Grunt via a Jenkins Job.

I have uploaded a truncated trace, via the following strace:

strace -o trace.log -fe \!mmap,munmap,mprotect,brk,gettimeofday,futex grunt

https://s3.amazonaws.com/grunt.errors/trace.log

System:

uname -a
Linux {truncated} 3.2.0-52-virtual #78-Ubuntu SMP Fri Jul 26 16:45:00 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

node --version
v0.10.22

npm --version
1.3.14

Node is via the chris-lea ppa (http://ppa.launchpad.net/chris-lea/node.js/ubuntu/pool/main/n/nodejs/)

darthtrevino · 2014-03-07T22:04:29Z

I've been seeing this issue in the last couple of weeks when using our Jenkins instance on Cloudbees:

uname -a
Linux localhost 3.7.10-101.fc17.x86_64 #1 SMP Wed Feb 27 19:14:22 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

dmitry-vsl · 2014-03-11T19:32:00Z

I also have this issue when running grunt-contrib-connect from jenkins.

apaleslimghost · 2014-03-19T15:25:43Z

Thanks @mmastrac, been trying to diagnose this for a day or two, removing a shell command fixes it.

crashbell · 2014-06-12T06:03:04Z

I got the same issue with Node v0.10.20

uname -a
Linux Witurn 3.2.0-27-generic #43-Ubuntu SMP Fri Jul 6 14:25:57 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

node --version
v0.10.20

npm --version
1.3.10

Should I upgrade Node version?

indutny · 2014-06-12T14:45:13Z

Yes, I would suggest to update node first.

tsukasagenesis · 2014-06-28T16:07:49Z

Same problem here:
Linux 73118 3.13.0-30-generic #54-Ubuntu SMP Mon Jun 9 22:45:01 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
node v0.10.29 from ppa.launchpad.net_chris-lea_node.js_ubuntu_dists_trusty_main_binary-amd64_Package

seem reuse the bad file_descriptor, see strace :

read(8, "\1\0\0\0\0\0\0\0", 1024)       = 8
epoll_wait(5, {{EPOLLIN, {u32=25, u64=25}}}, 1024, 88) = 1
read(25, "", 65536)                     = 0
futex(0x7fbe8c0008c8, FUTEX_WAKE_PRIVATE, 1) = 1
epoll_ctl(5, EPOLL_CTL_DEL, 25, {EPOLLRDNORM|EPOLLWRNORM|EPOLLHUP|0x1b3d820, {u32=0, u64=122681170965037056}}) = 0
close(25)                               = 0
futex(0xe13d84, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0xe13d80, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0xe13d40, FUTEX_WAKE_PRIVATE, 1)  = 1
write(1, "fd: 24\n", 7)                 = 7
write(1, "11byebye\n", 9)               = 9
epoll_wait(5, {{EPOLLIN, {u32=11, u64=11}}, {EPOLLIN, {u32=8, u64=8}}}, 1024, 76) = 2
accept4(11, 0, NULL, SOCK_CLOEXEC|SOCK_NONBLOCK) = 24
accept4(11, 0, NULL, SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
read(8, "\1\0\0\0\0\0\0\0", 1024)       = 8
write(1, "11close2\n", 9)               = 9
futex(0xe13d84, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0xe13d80, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0xe13d40, FUTEX_WAKE_PRIVATE, 1)  = 1
epoll_ctl(5, EPOLL_CTL_ADD, 24, {EPOLLIN, {u32=24, u64=24}}) = -1 EBADF (Bad file descriptor)
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
tgkill(18944, 18944, SIGABRT)           = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=18944, si_uid=0} ---
+++ killed by SIGABRT (core dumped) +++

ljwagerfield · 2015-04-03T12:49:15Z

I also receive the following error when running grunt connect on Ubuntu 14.04 (x64) with node v0.10.25, but only when running from crontab. The same user is invoking the process in both crontab (which fails) and the manual execution (which succeeds).

grunt: ../deps/uv/src/unix/stream.c:494: uv__server_io: Assertionevents == 1' failed.`

chicagobuss · 2015-04-29T15:39:35Z

For people having problems with running these commands from jenkins or cron, try adding the --norc flag to your bash script - it worked for us and we were seeing the exact same behavior (worked when running manually, failing on cron and jenkins).

jasnell · 2015-06-03T16:48:12Z

@indutny ... was there a resolution on this one?

indutny · 2015-06-03T17:06:48Z

I suppose it is resolved now, didn't see any report except this.

indutny · 2015-06-03T17:06:56Z

But no formal resolution either.

jasnell · 2015-06-03T17:08:14Z

Ok, closing then. Can reopen if new information is received.

bostrom · 2015-09-29T14:51:33Z

Any news on this one?

I'm getting this error when jenkins executes grunt karma on a Ubuntu slave. Adding #!/bin/bash --norc or just bash --norc to the beginning of my "Execute shell" build step doesn't work.

I've tried searching the net for answers, but I got nothing.

Slave:

uname -a
Linux jenkins-slave 3.16.0-38-generic #52~14.04.1-Ubuntu SMP Fri May 8 09:43:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

node --version
v0.10.25

npm --version
1.3.10

tsukasagenesis · 2015-10-16T11:46:48Z

just never use FD directly, it's not correctly supported and if you close one he can still try to use it and crash. that all for me.

ghost assigned bnoordhuis Sep 20, 2013

indutny closed this as completed Nov 12, 2013

indutny reopened this Nov 20, 2013

mxk1235 unassigned bnoordhuis Mar 7, 2014

jasnell closed this as completed Jun 3, 2015

server threads die with assertion error #6222

server threads die with assertion error #6222

Comments

mxk1235 commented Sep 13, 2013

mxk1235 commented Sep 13, 2013

bnoordhuis commented Sep 13, 2013

mxk1235 commented Sep 13, 2013

bnoordhuis commented Sep 19, 2013

mxk1235 commented Sep 20, 2013

bnoordhuis commented Sep 20, 2013

mxk1235 commented Sep 20, 2013

bnoordhuis commented Sep 20, 2013

mxk1235 commented Sep 20, 2013

mxk1235 commented Sep 30, 2013

bnoordhuis commented Oct 1, 2013

laino commented Oct 11, 2013

anacronw commented Oct 19, 2013

laino commented Oct 19, 2013

vpArth commented Oct 22, 2013

indutny commented Oct 25, 2013

mxk1235 commented Oct 25, 2013

vpArth commented Oct 28, 2013

trevnorris commented Oct 28, 2013

indutny commented Nov 12, 2013

mmastrac commented Nov 18, 2013

mmastrac commented Nov 18, 2013

indutny commented Nov 20, 2013

marckysharky commented Nov 21, 2013

darthtrevino commented Mar 7, 2014

dmitry-vsl commented Mar 11, 2014

apaleslimghost commented Mar 19, 2014

crashbell commented Jun 12, 2014

indutny commented Jun 12, 2014

tsukasagenesis commented Jun 28, 2014

ljwagerfield commented Apr 3, 2015

chicagobuss commented Apr 29, 2015

jasnell commented Jun 3, 2015

indutny commented Jun 3, 2015

indutny commented Jun 3, 2015

jasnell commented Jun 3, 2015

bostrom commented Sep 29, 2015

tsukasagenesis commented Oct 16, 2015