
Segfaults on Ubuntu 14.04 #1380

Closed
max-degterev opened this issue Jun 22, 2015 · 44 comments

@max-degterev

Hi!

We used to use upstart to run our Node apps and decided to try to move to pm2 instead. Unfortunately on our production machine it crashes at least once a day.

kern.log:
Jun 22 15:06:02 Ubuntu-1404-trusty-64-minimal kernel: [569531.959543] PM2 v0.14.1: Go[19539]: segfault at 3fffffffa0 ip 0000000000902e38 sp 00007fff99f29290 error 4 in nodejs[400000+b60000]

We tried adding an upstart job that calls pm2 resurrect in case pm2 dies, but apparently that doesn't work either. We couldn't really verify that one, though.

We have node v0.12.4.

Is anyone else experiencing the same issue? It's a real shame: I love the Keymetrics interface, but I can't use it without pm2 on the server.

@jshkurti
Contributor

The only thing I found : nodejs/node-v0.x-archive#2376

@jshkurti
Contributor

This may happen while using PM2 but I honestly think it has nothing to do with PM2 :)

@jshkurti
Contributor

Give it a try with npm i -g Unitech/PM2#development though.

@max-degterev
Author

Different symptoms: here it just crashes all the Node apps, and the kernel log does say PM2 v0.14.1.

@max-degterev
Author

Unitech/PM2#development seems to be stable. Any chance of getting it published to npm soon?

@jshkurti
Contributor

Published @0.14.2 ;)

@NikitaKrasavtsev

Hello.
We are experiencing a similar issue on Ubuntu v14.04.2, pm2 v0.14.3 & node v0.12.3:
Jul 6 12:51:41 kernel: [8099284.062085] traps: PM2 v0.14.3: Go[3068] general protection ip:902e38 sp:7fff13dab650 error:0 in nodejs[400000+b60000]

@jhansen-tt

I am also seeing this issue on Ubuntu 14.04 w/nodejs 0.12.6 and PM2 0.14.3, and it is not due to the hung-task kernel issue that has been referenced previously. This is the only thing that shows up in our kernel log:

[ 8023.396041] PM2 v0.14.3: Go[29558]: segfault at 8090000404c ip 00000000009037f8 sp 00007fff7e3d9a80 error 4 in nodejs[400000+b61000]

I have to do '/etc/init.d/pm2-init.sh start' to get things running again.

If this is a nodejs bug, I am happy to file a bug with them; I just want to help fix the issue however I can. If PM2 doesn't have any native code, then it is definitely a nodejs bug.

@max-degterev
Author

Jul  8 12:35:55 serenity kernel: [1473951.960661] PM2 v0.14.2: Go[22640]: segfault at ffffffffffffffa0 ip 0000000000902e38 sp 00007fff76291c20 error 5 in nodejs[400000+b60000]

We'll have to move back to upstart. Pity.

Good luck.

@jshkurti
Contributor

jshkurti commented Jul 8, 2015

Is there any error in ~/.pm2/pm2.log, so we could at least see when that happens in the JS code?

@membrive

membrive commented Jul 8, 2015

I don't see any errors in ~/.pm2/pm2.log; it looks normal.

Yesterday I reinstalled node.js from source (./configure, make and make install), along with pm2, and so far pm2 hasn't crashed! I'm still waiting...

Has anyone with this problem tried node.js installed from source?

@jhansen-tt

There is nothing in my pm2.log at the time of the segfault. Anything else I can look at? What is the best way to start pm2 with gdb?

@max-degterev
Author

pm2.log:

2015-07-08 00:30:09: [PM2][WORKER] Process 6 restarted because it exceeds --max-memory-restart value
2015-07-08 00:30:09: Starting execution sequence in -cluster mode- for app name:ghost id:6
2015-07-08 00:30:09: App name:ghost id:6 online
2015-07-08 00:30:10: -softReload- New worker listening
2015-07-08 00:30:18: Stopping app:ghost id:_old_6

(null):0
(null)

RangeError: Maximum call stack size exceeded
2015-07-08 00:30:18: Process with pid 18054 killed
2015-07-08 11:58:17: Starting execution sequence in -cluster mode- for app name:loki id:7
2015-07-08 11:58:17: App name:loki id:7 online
2015-07-08 11:58:18: -reload- New worker listening
2015-07-08 11:58:18: Stopping app:loki id:_old_7
2015-07-08 11:58:18: Process with pid 19207 killed
2015-07-08 12:52:00: [PM2][WORKER] Started with refreshing interval: 30000
2015-07-08 12:52:00: [[[[ PM2/God daemon launched ]]]]
2015-07-08 12:52:00: BUS system [READY] on port /root/.pm2/pub.sock
2015-07-08 12:52:00: RPC interface [READY] on port /root/.pm2/rpc.sock
2015-07-08 12:52:12: Starting execution sequence in -cluster mode- for app name:groot id:0
2015-07-08 12:52:12: App name:groot id:0 online
2015-07-08 12:52:12: Starting execution sequence in -cluster mode- for app name:groot id:1
2015-07-08 12:52:12: App name:groot id:1 online
2015-07-08 12:52:12: Starting execution sequence in -fork mode- for app name:groot-queue id:2
2015-07-08 12:52:12: App name:groot-queue id:2 online
2015-07-08 12:52:12: Starting execution sequence in -cluster mode- for app name:jarvis id:3
2015-07-08 12:52:12: App name:jarvis id:3 online
2015-07-08 12:52:12: Starting execution sequence in -cluster mode- for app name:jarvis id:4
2015-07-08 12:52:12: App name:jarvis id:4 online
2015-07-08 12:52:12: Starting execution sequence in -cluster mode- for app name:beast id:5
2015-07-08 12:52:13: App name:beast id:5 online
2015-07-08 12:52:13: Starting execution sequence in -cluster mode- for app name:ghost id:6
2015-07-08 12:52:13: App name:ghost id:6 online
2015-07-08 12:52:13: Starting execution sequence in -cluster mode- for app name:loki id:7
2015-07-08 12:52:13: App name:loki id:7 online
2015-07-08 12:52:13: Starting execution sequence in -fork mode- for app name:mongul id:8
2015-07-08 12:52:13: App name:mongul id:8 online
2015-07-08 12:52:14: App name:mongul id:8 exited with code 1
2015-07-08 12:52:14: Starting execution sequence in -fork mode- for app name:mongul id:8
2015-07-08 12:52:14: App name:mongul id:8 online
2015-07-08 12:52:14: App name:mongul id:8 exited with code 1
2015-07-08 12:52:14: Starting execution sequence in -fork mode- for app name:mongul id:8
2015-07-08 12:52:14: App name:mongul id:8 online
2015-07-08 12:52:15: App name:mongul id:8 exited with code 1
2015-07-08 12:52:15: Starting execution sequence in -fork mode- for app name:mongul id:8
2015-07-08 12:52:15: App name:mongul id:8 online
2015-07-08 12:52:15: App name:mongul id:8 exited with code 1
2015-07-08 12:52:15: Starting execution sequence in -fork mode- for app name:mongul id:8
2015-07-08 12:52:15: App name:mongul id:8 online
2015-07-08 12:52:16: App name:mongul id:8 exited with code 1
2015-07-08 12:52:16: Starting execution sequence in -fork mode- for app name:mongul id:8
2015-07-08 12:52:16: App name:mongul id:8 online
2015-07-08 12:52:16: App name:mongul id:8 exited with code 1
2015-07-08 12:52:16: Starting execution sequence in -fork mode- for app name:mongul id:8
2015-07-08 12:52:16: App name:mongul id:8 online
2015-07-08 12:52:16: App name:mongul id:8 exited with code 1
2015-07-08 12:52:16: Starting execution sequence in -fork mode- for app name:mongul id:8
2015-07-08 12:52:16: App name:mongul id:8 online
2015-07-08 12:52:17: App name:mongul id:8 exited with code 1
2015-07-08 12:52:17: Starting execution sequence in -fork mode- for app name:mongul id:8
2015-07-08 12:52:17: App name:mongul id:8 online
2015-07-08 12:52:17: App name:mongul id:8 exited with code 1
2015-07-08 12:52:17: Starting execution sequence in -fork mode- for app name:mongul id:8
2015-07-08 12:52:17: App name:mongul id:8 online
2015-07-08 12:52:17: App name:mongul id:8 exited with code 1
2015-07-08 12:52:17: Starting execution sequence in -fork mode- for app name:mongul id:8
2015-07-08 12:52:17: App name:mongul id:8 online
2015-07-08 12:52:17: App name:mongul id:8 exited with code 1
2015-07-08 12:52:17: Starting execution sequence in -fork mode- for app name:mongul id:8
2015-07-08 12:52:17: App name:mongul id:8 online
2015-07-08 12:52:18: App name:mongul id:8 exited with code 1
2015-07-08 12:52:18: Starting execution sequence in -fork mode- for app name:mongul id:8
2015-07-08 12:52:18: App name:mongul id:8 online
2015-07-08 12:52:18: App name:mongul id:8 exited with code 1
2015-07-08 12:52:18: Starting execution sequence in -fork mode- for app name:mongul id:8
2015-07-08 12:52:18: App name:mongul id:8 online
2015-07-08 12:52:18: App name:mongul id:8 exited with code 1
2015-07-08 12:52:18: Starting execution sequence in -fork mode- for app name:mongul id:8
2015-07-08 12:52:18: App name:mongul id:8 online
2015-07-08 12:52:19: App name:mongul id:8 exited with code 1
2015-07-08 12:52:19: Script /var/www/mongul/app.js had too many unstable restarts (15). Stopped. "errored"
2015-07-08 13:20:55: Stopping app:groot-queue id:2
2015-07-08 13:20:55: App name:groot-queue id:2 exited with code SIGINT
2015-07-08 13:20:55: Process with pid 15624 killed
2015-07-08 13:20:55: Starting execution sequence in -fork mode- for app name:groot-queue id:2
2015-07-08 13:20:55: App name:groot-queue id:2 online
2015-07-08 13:20:59: Starting execution sequence in -cluster mode- for app name:groot id:0
2015-07-08 13:20:59: App name:groot id:0 online
2015-07-08 13:21:01: -reload- New worker listening
2015-07-08 13:21:01: Stopping app:groot id:_old_0
2015-07-08 13:21:01: Process with pid 15586 killed
2015-07-08 13:21:01: Starting execution sequence in -cluster mode- for app name:groot id:1
2015-07-08 13:21:02: App name:groot id:1 online
2015-07-08 13:21:04: -reload- New worker listening
2015-07-08 13:21:04: Stopping app:groot id:_old_1
2015-07-08 13:21:04: Process with pid 15597 killed
2015-07-08 13:35:31: Stopping app:groot-queue id:2
2015-07-08 13:35:31: App name:groot-queue id:2 exited with code SIGINT
2015-07-08 13:35:31: Process with pid 25163 killed
2015-07-08 13:35:31: Starting execution sequence in -fork mode- for app name:groot-queue id:2
2015-07-08 13:35:31: App name:groot-queue id:2 online
2015-07-08 13:35:36: Starting execution sequence in -cluster mode- for app name:groot id:0
2015-07-08 13:35:37: App name:groot id:0 online
2015-07-08 13:35:39: -reload- New worker listening
2015-07-08 13:35:39: Stopping app:groot id:_old_0
2015-07-08 13:35:39: Process with pid 25238 killed
2015-07-08 13:35:39: Starting execution sequence in -cluster mode- for app name:groot id:1
2015-07-08 13:35:39: App name:groot id:1 online
2015-07-08 13:35:42: -reload- New worker listening
2015-07-08 13:35:42: Stopping app:groot id:_old_1
2015-07-08 13:35:42: Process with pid 25268 killed
2015-07-08 14:11:17: Stopping app:groot-queue id:2
2015-07-08 14:11:17: App name:groot-queue id:2 exited with code SIGINT
2015-07-08 14:11:17: Process with pid 30006 killed
2015-07-08 14:11:17: Starting execution sequence in -fork mode- for app name:groot-queue id:2
2015-07-08 14:11:17: App name:groot-queue id:2 online
2015-07-08 14:11:22: Starting execution sequence in -cluster mode- for app name:groot id:0
2015-07-08 14:11:22: App name:groot id:0 online
2015-07-08 14:11:25: -reload- New worker listening
2015-07-08 14:11:25: Stopping app:groot id:_old_0
2015-07-08 14:11:25: Process with pid 30227 killed
2015-07-08 14:11:25: Starting execution sequence in -cluster mode- for app name:groot id:1
2015-07-08 14:11:25: App name:groot id:1 online
2015-07-08 14:11:28: -reload- New worker listening
2015-07-08 14:11:28: Stopping app:groot id:_old_1
2015-07-08 14:11:28: Process with pid 30256 killed
2015-07-08 14:12:37: App name:ghost id:6 disconnected
2015-07-08 14:12:37: App name:ghost id:6 exited with code 0
2015-07-08 14:12:37: Starting execution sequence in -cluster mode- for app name:ghost id:6
2015-07-08 14:12:37: App name:ghost id:6 online
2015-07-08 14:23:13: Starting execution sequence in -cluster mode- for app name:jarvis id:3
2015-07-08 14:23:13: App name:jarvis id:3 online
2015-07-08 14:23:13: -reload- New worker listening
2015-07-08 14:23:13: Stopping app:jarvis id:_old_3
2015-07-08 14:23:13: Process with pid 15629 killed
2015-07-08 14:23:13: Starting execution sequence in -cluster mode- for app name:jarvis id:4
2015-07-08 14:23:13: App name:jarvis id:4 online
2015-07-08 14:23:14: -reload- New worker listening
2015-07-08 14:23:14: Stopping app:jarvis id:_old_4
2015-07-08 14:23:14: Process with pid 15666 killed

@soyuka
Collaborator

soyuka commented Jul 8, 2015

Lots of restarts there ^^.

This isn't good: RangeError: Maximum call stack size exceeded. Do others have this kind of error too?

@jshkurti
Contributor

jshkurti commented Jul 8, 2015

@suprMax According to the first post, the segfault happened Jun 22 15:06:02. Your pm2.log shows 2015-07-08.

@soyuka
Collaborator

soyuka commented Jul 8, 2015

@jhansen-tt Last time I had to debug a segfault I used https://github.com/ddopson/node-segfault-handler. I don't know if it's possible with gdb.

@jshkurti indeed.

@jhansen-tt

@soyuka I don't get the RangeError, or anything in my pm2.log, when it happens; just the segfault notification in the kernel log. Right now I use crontab to check whether the PID in ~/.pm2/pm2.pid is still running, and if not, I run /etc/init.d/pm2 restart...
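The cron workaround just described can be sketched in Node roughly as follows. This is a minimal sketch under stated assumptions, not part of PM2: it assumes (as in the comment) that the daemon's PID is in ~/.pm2/pm2.pid and that `/etc/init.d/pm2 restart` brings it back; the file name `pm2-watchdog.js` and the `--run` switch are hypothetical. The liveness check sends signal 0, which only tests whether the PID exists.

```javascript
// Hypothetical watchdog sketch (pm2-watchdog.js); not part of PM2.
// Assumes the PM2 daemon PID is stored in ~/.pm2/pm2.pid and that
// "/etc/init.d/pm2 restart" restarts the daemon, as described above.
const fs = require('fs');
const os = require('os');
const path = require('path');
const { execFileSync } = require('child_process');

// Signal 0 performs the existence/permission check without sending a signal.
function isPidAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user.
    return err.code === 'EPERM';
  }
}

// Returns the PID stored in pidFile, or null if it is missing or invalid.
function readPid(pidFile) {
  try {
    const pid = parseInt(fs.readFileSync(pidFile, 'utf8').trim(), 10);
    return Number.isInteger(pid) && pid > 0 ? pid : null;
  } catch (err) {
    return null;
  }
}

// Calls restart() when the daemon recorded in pidFile is not running.
// Returns true if a restart was triggered.
function checkDaemon(pidFile, restart) {
  const pid = readPid(pidFile);
  if (pid !== null && isPidAlive(pid)) return false;
  restart();
  return true;
}

// Only act when explicitly asked to, e.g. from a crontab entry.
if (process.argv.includes('--run')) {
  const pidFile = path.join(os.homedir(), '.pm2', 'pm2.pid');
  checkDaemon(pidFile, () =>
    execFileSync('/etc/init.d/pm2', ['restart'], { stdio: 'inherit' }));
}
```

A crontab line such as `* * * * * node /path/to/pm2-watchdog.js --run` (path is a placeholder) would run the check every minute. Signal 0 only proves that *some* process holds that PID, so a recycled PID could make a dead daemon look alive.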

@jshkurti
Contributor

jshkurti commented Jul 8, 2015

Any chance a newer version of Node could fix this? (perhaps Node v0.12.6)

@jhansen-tt

@jshkurti I am already on 0.12.6 and I see it.

@soyuka
Collaborator

soyuka commented Jul 8, 2015

@jhansen-tt It would be nice to have a way to reproduce this, or a better stack trace :/. I'm using node 0.12.3 with pm2 0.14.1 with 20+ days of uptime (Debian 7). +1 to @jshkurti; or try going back to node 0.12.3 to see if this still happens (though I just saw that others were using node 0.12.3).

@jhansen-tt

@soyuka I am just running an express server and a couple of python scripts with PM2... Nothing fancy.

@soyuka
Collaborator

soyuka commented Jul 8, 2015

Same here @jhansen-tt, but with fancier things, and I have absolutely no issues. On another server I run pm2 0.14.3 with node 0.12.5, one Ghost blog and another Express app, with no segfaults (Debian 8).
I'll give Ubuntu a try tonight and let you know.

@max-degterev
Author

@jshkurti I only have the log for today because it happened again today, at ~12:35:55. There's nothing in the log around that time, though.

@wreckah FYI

@parisholley

I ran into a segfault today, and now I can't get my processes to launch...

@parisholley

Running

pm2 startOrGracefulReload ecosystem.json --env production

produces no output in the logs and just hangs.

When I ran

pm2 start ecosystem.json --env production

I got the following in ~/.pm2/pm2.log:

2015-07-09 00:32:21: Starting execution sequence in -cluster mode- for app name:crawl-pull id:12
2015-07-09 00:32:21: [PM2] Error caught by domain:
AssertionError: false == true
    at SharedHandle.add (cluster.js:97:3)
    at queryServer (cluster.js:480:12)
    at Worker.onmessage (cluster.js:438:7)
    at ChildProcess.<anonymous> (cluster.js:692:8)
    at ChildProcess.emit (events.js:129:20)
    at handleMessage (child_process.js:324:10)
    at Pipe.channel.onread (child_process.js:352:11)

@parisholley

Looks like my issue is related to #1204, bah...

@Unitech Unitech reopened this Jul 11, 2015
@jhansen-tt

I think this issue needs to be broken out into two:

  • Obviously node.js is segfaulting, so that is probably node.js's issue, and not necessarily a PM2 bug. I've updated to node.js 0.12.7, and I'm trying to reproduce this again.
  • When PM2 crashes in this way, the dump.pm2 file gets emptied (to an empty object "{}"), and "pm2 resurrect" doesn't restore my saved tasks the next time I start pm2. As a workaround, I've made dump.pm2 owned by root, but PM2 should be fixed so that a crash does not cause dump.pm2 to be saved as an empty object. This is easy to reproduce by simply kill -9'ing the pm2 process.
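The guard the second point asks for (never let a crash replace a populated dump.pm2 with an empty list) might look roughly like this. It is a hypothetical sketch, not PM2's actual save logic: it assumes the dump is plain JSON and treats `{}`, `[]` or `null` as "empty"; the function names and the `force` option are illustrative.

```javascript
// Hypothetical sketch of a safer dump writer; not PM2's real implementation.
const fs = require('fs');
const path = require('path');

// True for values that describe "no processes": null, [] or {}.
function isEmptyDump(value) {
  if (value === null || value === undefined) return true;
  if (Array.isArray(value)) return value.length === 0;
  return typeof value === 'object' && Object.keys(value).length === 0;
}

// Writes `processes` to dumpFile. Refuses to replace a non-empty dump with
// an empty one unless `force` is set. Returns true if the file was written.
function saveDump(dumpFile, processes, { force = false } = {}) {
  if (isEmptyDump(processes) && !force && fs.existsSync(dumpFile)) {
    try {
      const previous = JSON.parse(fs.readFileSync(dumpFile, 'utf8'));
      if (!isEmptyDump(previous)) return false; // keep the last good list
    } catch (err) {
      // Previous dump is unreadable: fall through and overwrite it.
    }
  }
  // Write to a temp file in the same directory, then rename it over the
  // target. rename() replaces the file atomically on POSIX filesystems, so a
  // process killed mid-write leaves the previous dump intact.
  const tmp = path.join(path.dirname(dumpFile),
    '.' + path.basename(dumpFile) + '.' + process.pid + '.tmp');
  fs.writeFileSync(tmp, JSON.stringify(processes));
  fs.renameSync(tmp, dumpFile);
  return true;
}
```

The `force` flag keeps a deliberate "clear my process list" possible, while an accidental empty save after a crash is simply skipped.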

@oscar608

Re #1433 ("PM2 stop frequently and stop the app"): has pm2 v0.14.3 resolved this problem?

@membrive

No, the problem is not solved in v0.14.3.

@jorge-d
Contributor

jorge-d commented Jul 16, 2015

Can you try using the development branch?

@membrive

@jorge-d I will try when I have some free time, I hope this weekend. Thank you for the work.

@soyuka
Collaborator

soyuka commented Jul 17, 2015

@membrive Could you try to remove those two lines from Satan:

  node_args.push('--expose-gc'); // Allows manual GC in the code
  node_args.push('--gc-global'); // Does full GC (smaller memory footprint)

And see if the segfault still occurs?
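To confirm which of those two V8 flags a process was actually started with, before and after removing the lines, a small check like the one below could be run inside whichever process receives those `node_args`. It is a minimal sketch with hypothetical helper names; it only inspects its own `process.execArgv`, and relies on the fact that `global.gc` is defined only under `--expose-gc`.

```javascript
// Hypothetical helpers; not part of PM2.

// V8 treats dashes and underscores in flag names the same way
// (--expose-gc and --expose_gc), so normalize before comparing.
function normalizeFlag(arg) {
  const eq = arg.indexOf('=');
  const name = eq === -1 ? arg : arg.slice(0, eq);
  return name.replace(/_/g, '-');
}

// Reports whether the two flags pushed onto node_args are active.
function gcFlags(execArgv = process.execArgv) {
  const names = execArgv.map(normalizeFlag);
  return {
    exposeGc: names.includes('--expose-gc'),
    gcGlobal: names.includes('--gc-global'),
  };
}

// global.gc only exists when the process was started with --expose-gc.
function tryManualGc() {
  if (typeof global.gc === 'function') {
    global.gc();
    return true;
  }
  return false;
}

console.log(gcFlags(), 'manual GC ran:', tryManualGc());
```

Running it as `node --expose-gc check-gc.js` (file name hypothetical) versus plain `node check-gc.js` shows the difference: only the first run can trigger a manual collection.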

@membrive

In the first 24 hours with the development PM2 (with the changes made by @jorge-d) I have not seen segfaults, but I am testing in a virtual machine without traffic, so I think we need more testing.

Should I try with the development PM2 (@jorge-d changes) or with @soyuka changes? Or both?

@klinquist

I'm waiting with bated breath too - this is causing me a lot of headaches.

@membrive

After 2 days, no crashes with the development branch...

@klinquist

I updated several servers that had segfaults on Friday night, and I can also report no crashes yet.

@soyuka
Collaborator

soyuka commented Jul 22, 2015

@klinquist @membrive Still working? We'll publish it soon, we just need some more confirmations!

@klinquist

Yes still working! Not a single segfault on the 4 servers I updated.

@Unitech
Owner

Unitech commented Jul 22, 2015

AWESOME :) merging to master and publishing
Thanks so much for reporting

@membrive

I'm not running the development PM2 in production (with load), but I'm running it in four virtual machines (without load), and there have been no segfaults in 4 days! I don't think this feedback is conclusive; it's just for your information.

@Unitech
Owner

Unitech commented Jul 22, 2015

PM2 0.14.4 published, with forced GC removed!

Please upgrade with the following (it keeps your current process list):

$ npm install pm2 -g
$ pm2 update

@fernandoneto

I'm getting the same problem.
kern.log

kernel: [677735.752265] PM2 v0.14.2: Go[1315]: segfault at 4ffffffa0 ip 000000000099e0d8 sp 00007fff5e747af0 error 4 in node[400000+fca000]

pm2.log

2015-07-28 17:05:47: Stopping app:example2 id:_old_0

(null):0
(null)

RangeError: Maximum call stack size exceeded
2015-07-28 17:05:47: Process with pid 29318 still not killed, retrying...
2015-07-28 17:05:47: Process with pid 29318 killed
2015-07-28 17:05:47: Starting execution sequence in -cluster mode- for app name:example id:1
2015-07-28 17:05:47: App name:example id:1 online
2015-07-28 17:05:50: Stopping app:example id:_old_1
2015-07-28 17:05:50: Process with pid 29343 killed
2015-07-28 17:05:50: Starting execution sequence in -cluster mode- for app name:example id:2
2015-07-28 17:05:50: App name:example id:2 online
2015-07-28 17:05:53: Stopping app:example id:_old_2
2015-07-28 17:05:53: Process with pid 29368 killed
2015-07-30 09:25:58: [PM2][WORKER] Started with refreshing interval: 30000
2015-07-30 09:25:58: [[[[ PM2/God daemon launched ]]]]
2015-07-30 09:25:58: BUS system [READY] on port /home/user/.pm2/pub.sock
2015-07-30 09:25:58: RPC interface [READY] on port /home/user/.pm2/rpc.sock
2015-07-30 09:30:08: Starting execution sequence in -cluster mode- for app name:example2 id:0
2015-07-30 09:30:08: App name:example2 id:0 online
2015-07-30 09:30:08: Starting execution sequence in -cluster mode- for app name:example id:1
2015-07-30 09:30:08: App name:example id:1 online
2015-07-30 09:30:08: Starting execution sequence in -cluster mode- for app name:example id:2
2015-07-30 09:30:08: App name:example id:2 online

@jshkurti
Contributor

That's because you're using PM2 v0.14.2.
Please upgrade to 0.14.5 :)

@fernandoneto

@jshkurti Already updated, thanks.
