SEGFAULT in node 4.8.4 on linux #14228
Comments
I'm also trying to get a core dump to dig a bit deeper. |
Can you try this?
$ gdb /path/to/node
> info symbol 0xaed6c1 |
I've also got a couple of core dumps now that we collected over night:
|
What does |
For the first dump:
|
Thanks. It's failing on the IsUndefined() or IsTheHole() check on this line in objects.cc (warning: big file), presumably Just to be sure, what does |
Yep, that address isn't a thing:
P.S.: Thanks a lot for the help in looking into this! :) We've seen this issue in multiple applications now and - afaik - they don't share any native modules. But I still don't have any way of reproducing it reliably (other than running a bunch of instances and waiting 1-2 hours). |
No problem, happy to help. Can you post the output of |
core.21565
|
core.26514
|
core.26538
|
core.26551
|
Looks like they are all the same issue: pointers that look like valid heap object pointers but aren't. Something puts the heap in an inconsistent state, but it's hard to say what. |
I was afraid that was the answer. Is there any value in rolling out a debug build of node to some canary hosts or is it unlikely to provide more insights? Otherwise I could try to at least narrow it down a bit (somewhere in |
Sounds like a similar issue has been reported for v4.8.0: #11606 Commonality: Both use express. But that's hardly saying anything for node web apps. |
We get such bug reports frequently, but nine times out of ten they're caused by a native add-on. A debug build would help; those usually catch bugs closer to the source. |
Just checked - the app(s) in question don't contain any We'll roll out a debug build and see what we get from that. |
core.9407 (v4.8.4debug)
|
core.9893 (v4.8.4debug)
|
Added some info extracted from debug build core dumps. Let me know if you'd want me to look up anything else in there. It took some time getting a working custom build into our infrastructure so I'm only ~90% sure it will perfectly reproduce the issue (e.g. it links against different c++ runtime library versions). But the stack looks fairly familiar ( |
@jkrems would you be able to bisect by testing some of the other Semver-Minor releases and see if we can narrow this down a bit? |
@MylesBorins Will definitely try. The problem is mostly that I don't have good/fast test cases. My only way to reproduce right now is "roll out to canary hosts, wait hours, see what happens". Which is rather time consuming...
|
Alright, 5 hours in and v4.6.1 doesn't show any segfaults. So afaict the issue starts already in v4.6.2. I will keep monitoring for the rest of the day but so far this is where I'm at:
Note: We're currently working around this by more aggressively pushing teams to adopt node 6. But I'll try to keep some node 4 running somewhere until we either give up or figure out what's going on here. |
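For readers following the bisection, the search over releases can be sketched roughly as below. This is a minimal illustration, not the tooling used in this thread: `first_bad`, `RELEASES`, and the `is_bad` callback are hypothetical names, the release list is an illustrative subset of the 4.x line, and `is_bad` stands in for the slow manual step of rolling a version out to canary hosts and waiting for a crash. It assumes the bug, once introduced, is present in every later release.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest version for which is_bad() is true.

    Assumptions (not from the thread): versions are ordered oldest to
    newest, versions[0] is known good, versions[-1] is known bad, and
    every release after the first bad one is also bad.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # crash seen: first bad release is at mid or earlier
        else:
            lo = mid  # no crash seen: first bad release is after mid
    return versions[hi]


# Illustrative subset of releases between the known-good and known-bad ends.
RELEASES = ["v4.6.1", "v4.6.2", "v4.7.0", "v4.8.0", "v4.8.4"]

# Stand-in for "deploy to canaries and wait": here we simply pretend that
# everything from v4.6.2 onwards crashes, matching the observation above.
KNOWN_BAD = set(RELEASES[1:])

print(first_bad(RELEASES, lambda v: v in KNOWN_BAD))
```

Because each check takes hours and a crash-free run is only weak evidence, a "good" verdict deserves more scepticism than a "bad" one. Binary search at least keeps the number of such checks to roughly log2 of the number of releases.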
One of the devs on that first lucky service ( |
That sounds very plausible. @matthewloring @ofrobots Ideas? This seems to have been introduced by #7689. |
@jkrems are you able to test with Node 6.x at all? It would be quite useful to know if the crash exists there or not. /cc @mlippautz @hannespayer any ideas about the crashes in GC and whether the back-ported bug fix might have some issues on V8 4.5? |
We rolled out the same stack on node 6 (by now including the very service where we saw these crashes initially). None of the services on node 6 (most on 6.11.1 now) are seeing segfaults. |
The Meteor project is running into this exact same problem on Node.js 4.8.4, and I've bisected it down to the commit that @jkrems had identified above (2d07fd7), which landed in 4.6.2. Though it's not 100% reproducible, it's a relatively easy numbers game and happens at least once out of every 4 runs in our CI.
If you check this CircleCI build history you can see me toggling back and forth between Node 4.6.1 and Node 4.6.2, and the problem comes and goes (each entry on that page represents four containers running the same build; at least one of the four containers fails each time, thus failing the entire build). I can't put my finger on exactly what we're doing that triggers the problem, but it's seemingly a garbage collection bug and thus prone to varied behavior.
Looking a bit deeper, the commit that introduced the issue (again, 2d07fd7) was intended to be a backport of v8/v8@e093a04. However, that commit was reverted (automatically, I think?) via v8/v8@5f5a328 because "
@jeisinger, you haven't been looped into this yet, but as the original PR author and someone who has worked on this code a fair bit, any ideas? |
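To make the "numbers game" a little more concrete, here is a rough sketch of how many consecutive clean runs it takes before a crash-free result on a given Node version means much. It is an illustration under loud assumptions, not a measurement from this thread: each run is treated as independent, the per-run failure probability `p_fail` on an affected version is taken to be about 0.25 (a loose reading of "at least once out of every 4 runs"), and `clean_runs_needed` is a hypothetical helper name. On an affected version the chance of n clean runs in a row is (1 - p_fail)^n, so we look for the smallest n that pushes this below a chosen threshold `alpha`.

```python
import math


def clean_runs_needed(p_fail: float, alpha: float = 0.05) -> int:
    """Smallest n such that (1 - p_fail) ** n <= alpha.

    p_fail: assumed probability that one run fails on an affected version.
    alpha:  acceptable chance of wrongly calling an affected version good.
    Runs are assumed independent, which real CI runs may not be.
    """
    if not (0.0 < p_fail < 1.0) or not (0.0 < alpha < 1.0):
        raise ValueError("p_fail and alpha must be strictly between 0 and 1")
    return math.ceil(math.log(alpha) / math.log(1.0 - p_fail))


# With the assumed 25% per-run failure rate and a 5% threshold:
print(clean_runs_needed(0.25))  # 11, since 0.75**11 is about 0.042
```

Under these assumptions a single green build on 4.6.1 says little on its own, which is why toggling back and forth repeatedly, as described above, is the more convincing experiment.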
For what it's worth, when branched off the Same success story if I apply the same order of changes onto Thoughts? |
@abernix could you please submit a PR that reverts 2d07fd7 and applies abernix/node@7841772 (please follow the commit guidelines for V8 changes)? We can then get that reviewed and landed, and cut another 4.x release. |
Original commit messages:

v8/v8@09db540
Reland of Rehash and clear deleted entries in weak collections during GC
BUG=v8:4909
[email protected],[email protected]
LOG=n
Review URL: https://codereview.chromium.org/1890123002
Cr-Commit-Position: refs/heads/master@{#35538}

v8/v8@686558d
Fix comment about when we rehash ObjectHashTables before growing them
[email protected]
BUG=
Review-Url: https://codereview.chromium.org/1918403003
Cr-Commit-Position: refs/heads/master@{#35853}

Refs: https://crbug.com/v8/4909
Refs: nodejs#6180
Refs: nodejs#7689
Refs: nodejs#6398
Fixes: nodejs#14228
@MylesBorins Done: #14829. Note that the PR deviated slightly from my previous suggestion to apply abernix@7841772, but I think I've explained why (see Ultimately, the Either way, I've verified that the original repro/issue in #6180 is still fixed. |
Original commit messages:

v8/v8@09db540
Reland of Rehash and clear deleted entries in weak collections during GC
BUG=v8:4909
[email protected],[email protected]
LOG=n
Review URL: https://codereview.chromium.org/1890123002
Cr-Commit-Position: refs/heads/master@{#35538}

v8/v8@686558d
Fix comment about when we rehash ObjectHashTables before growing them
[email protected]
BUG=
Review-Url: https://codereview.chromium.org/1918403003
Cr-Commit-Position: refs/heads/master@{#35853}

Refs: https://crbug.com/v8/4909
Refs: #6180
Refs: #7689
Refs: #6398
Fixes: #14228
PR-URL: #14829
Reviewed-By: Ben Noordhuis <[email protected]>
Fixed by #14829. |
Linux <hostname> 2.6.32-504.el6.x86_64 #1 SMP Wed Oct 15 04:27:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Opening this early but so far I have neither a good reproduction nor any hint of the root cause.
After trying to roll out node v4.8.4 on some of our hosts, we are seeing occasional process crashes:
This happened using the official binary distribution (https://nodejs.org/dist/v4.8.4/v4.8.4-linux-x64.tar.gz).
This didn't happen with node v4.6.1 (a rollback to this version made the crashes go away).