Encountering segfault #6604

Closed
tylersmalley opened this issue May 5, 2016 · 15 comments
Labels
v8 engine Issues and PRs related to the V8 dependency.

Comments

@tylersmalley

  • Version: 4.3.2
  • Platform: CentOS 7

We're encountering a segfault when running our application:
kernel: [1500729.221978] node[10053]: segfault at 3f7137666df8 ip 0000000000bcebae sp 00007ffc90283310 error 4 in node[400000+1347000]
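
For readers decoding the kernel line above: on x86, the number after "error" is the page-fault error code, and the bracketed pair after "node" is the base address and size of the mapping that contains the faulting instruction. A minimal sketch of that decoding follows. The struct and function names are hypothetical, and the bit layout assumed is the standard x86 one (bit 0 = protection violation, bit 1 = write, bit 2 = user mode, bit 4 = instruction fetch).

```cpp
#include <cassert>
#include <cstdint>

// Decoded view of the x86 page-fault error code that the Linux kernel
// prints as "error N" in a segfault log line. Names are illustrative.
struct PageFaultInfo {
  bool protection_violation;  // bit 0: set = access denied, clear = page not present
  bool write_access;          // bit 1: set = write, clear = read
  bool user_mode;             // bit 2: set = fault raised in user mode
  bool instruction_fetch;     // bit 4: set = fault on an instruction fetch
};

inline PageFaultInfo DecodePageFaultError(unsigned error_code) {
  PageFaultInfo info;
  info.protection_violation = (error_code & 0x1u) != 0;
  info.write_access = (error_code & 0x2u) != 0;
  info.user_mode = (error_code & 0x4u) != 0;
  info.instruction_fetch = (error_code & 0x10u) != 0;
  return info;
}

// Offset of the faulting instruction from the start of the mapping that the
// kernel reports as "name[base+size]". The ip must lie inside that mapping.
inline std::uint64_t OffsetInMapping(std::uint64_t ip, std::uint64_t base,
                                     std::uint64_t size) {
  assert(ip >= base && ip - base < size);
  return ip - base;
}
```

With the values from the log, error 4 decodes to a user-mode read of a page that is not mapped, and ip 0xbcebae lies 0x7cebae bytes into the node mapping.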

The crash occurs at random, every couple of days. We are not sure whether it is associated with the GC or the JSON parser.

*** Error in './node': double free or corruption (!prev): 0x0000000004d823d0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7d1fd)[0x7f30253a61fd]
./node(_ZN2v88internal4Heap20FreeDeadArrayBuffersEb+0xc6)[0xab0126]
./node(_ZN2v88internal20MarkCompactCollector11SweepSpacesEv+0x15f)[0xad8f2f]
./node(_ZN2v88internal20MarkCompactCollector14CollectGarbageEv+0x48)[0xae2c98]
./node(_ZN2v88internal4Heap11MarkCompactEv+0x60)[0xa99550]
./node(_ZN2v88internal4Heap24PerformGarbageCollectionENS0_16GarbageCollectorENS_15GCCallbackFlagsE+0x318)[0xab0e68]
./node(_ZN2v88internal4Heap14CollectGarbageENS0_16GarbageCollectorEPKcS4_NS_15GCCallbackFlagsE+0x239)[0xab1409]
./node(_ZN2v88internal4Heap15HandleGCRequestEv+0xa1)[0xab1e11]
./node(_ZN2v88internal10JsonParserILb1EE14ParseJsonValueEv+0x9f)[0x8e2a2f]
./node(_ZN2v88internal10JsonParserILb1EE15ParseJsonObjectEv+0x181)[0x8e1b91]
./node(_ZN2v88internal10JsonParserILb1EE14ParseJsonValueEv+0x158)[0x8e2ae8]
./node(_ZN2v88internal10JsonParserILb1EE15ParseJsonObjectEv+0x181)[0x8e1b91]
./node(_ZN2v88internal10JsonParserILb1EE14ParseJsonValueEv+0x158)[0x8e2ae8]
./node(_ZN2v88internal10JsonParserILb1EE15ParseJsonObjectEv+0x181)[0x8e1b91]
./node(_ZN2v88internal10JsonParserILb1EE14ParseJsonValueEv+0x158)[0x8e2ae8]
./node(_ZN2v88internal10JsonParserILb1EE15ParseJsonObjectEv+0x181)[0x8e1b91]
./node(_ZN2v88internal10JsonParserILb1EE14ParseJsonValueEv+0x158)[0x8e2ae8]
./node(_ZN2v88internal10JsonParserILb1EE9ParseJsonEv+0x41)[0x8e2e01]
./node(_ZN2v88internal17Runtime_ParseJsonEiPPNS0_6ObjectEPNS0_7IsolateE+0x314)[0xc89c14]
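
The frames above use Itanium C++ ABI mangled names. As a reading aid, here is a small sketch that demangles them with `abi::__cxa_demangle`, assuming a GCC or Clang toolchain that ships `<cxxabi.h>`. The `Demangle` and `CheckFirstFrame` helpers are hypothetical names, not part of node or V8.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Returns the demangled form of an Itanium-ABI symbol name, or the input
// unchanged if the name cannot be demangled.
inline std::string Demangle(const std::string& mangled) {
  int status = 0;
  // __cxa_demangle allocates the result with malloc; the caller frees it.
  char* demangled =
      abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
  if (status != 0 || demangled == nullptr) {
    return mangled;
  }
  std::string result(demangled);
  std::free(demangled);
  return result;
}

// Self-check against the first ./node frame of the backtrace.
inline void CheckFirstFrame() {
  assert(Demangle("_ZN2v88internal4Heap20FreeDeadArrayBuffersEb") ==
         "v8::internal::Heap::FreeDeadArrayBuffers(bool)");
}
```

Read this way, the glibc abort is raised from Heap::FreeDeadArrayBuffers, which is reached from a garbage collection requested while JsonParser was parsing.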

And inspecting the core dump:

> ::jsstack
native: v8::internal::ObjectVisitor::VisitCodeEntry+0xe
native: v8::internal::SlotsBuffer::UpdateSlots+0x107
native: v8::internal::MarkCompactCollector::EvacuateNewSpaceAndCandidate...
native: v8::internal::MarkCompactCollector::SweepSpaces+0x153
native: v8::internal::MarkCompactCollector::CollectGarbage+0x48
native: v8::internal::Heap::MarkCompact+0x60
native: v8::internal::Heap::PerformGarbageCollection+0x318
native: v8::internal::Heap::CollectGarbage+0x239
native: v8::internal::Heap::CollectAllAvailableGarbage+0x82
native: v8::internal::Factory::NewJSObject+0xb8
native: v8::internal::JsonParser<1>::ParseJsonObject+0x4c
native: v8::internal::JsonParser<1>::ParseJsonValue+0x158
native: v8::internal::JsonParser<1>::ParseJsonObject+0x181
native: v8::internal::JsonParser<1>::ParseJsonValue+0x158
native: v8::internal::JsonParser<1>::ParseJsonObject+0x181
native: v8::internal::JsonParser<1>::ParseJsonValue+0x158
native: v8::internal::JsonParser<1>::ParseJsonObject+0x181
native: v8::internal::JsonParser<1>::ParseJsonValue+0x158
native: v8::internal::JsonParser<1>::ParseJsonObject+0x181
native: v8::internal::JsonParser<1>::ParseJsonValue+0x158
native: v8::internal::JsonParser<1>::ParseJson+0x41
native: v8::internal::Runtime_ParseJson+0x314
        (1 internal frame elided)
js:     parse
        (1 internal frame elided)
js:     <anonymous> (as Json.deserialize)
js:     respond
js:     checkRespForFailure
js:     <anonymous> (as <anon>)
        (1 internal frame elided)
        (1 internal frame elided)
js:     wrapper
js:     emitNone
js:     emit
js:     endReadableNT
js:     nextTickCallbackWith2Args
js:     _tickDomainCallback
        (1 internal frame elided)
        (1 internal frame elided)
native: v8::internal::Execution::Call+0x14f
native: v8::Function::Call+0xff
native: v8::Function::Call+0x41
native: node::AsyncWrap::MakeCallback+0x22e
native: node::StreamBase::EmitData+0xcc
native: node::TLSWrap::OnReadSelf+0x41
native: node::TLSWrap::ClearOut+0xd7
native: node::TLSWrap::OnReadImpl+0xbe
native: node::StreamWrap::OnRead+0x73
native: uv__read+0x20f
native: uv__stream_io+0x290
native: uv__io_poll+0x3d5
native: uv_run+0x156
native: node::Start+0x438
@addaleax added the "v8 engine" label (Issues and PRs related to the V8 dependency) on May 5, 2016
@MylesBorins
Contributor

MylesBorins commented May 5, 2016

@tylersmalley are you still getting the same issues on v4.4.3?

@vkurchatkin
Contributor

It could be a bug in a native addon that frees a buffer owned by an ArrayBuffer.
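
To illustrate the failure mode suggested here, the following is a toy model only. It does not use the V8 or node addon APIs, and `TrackingAllocator` and `SimulateAddonDoubleFree` are hypothetical names. It assumes a sequence in which an addon frees a backing store it does not own and the collector later frees the same block, and it shows how that second free can be detected rather than left to corrupt the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_set>

// Toy allocator that remembers which blocks are live, so that a second free
// of the same block is reported instead of being passed on to free().
class TrackingAllocator {
 public:
  void* Allocate(std::size_t size) {
    void* p = std::malloc(size);
    if (p != nullptr) live_.insert(p);
    return p;
  }

  // Returns true if the block was live and has now been released, false if
  // it was unknown or already freed (that is, a double free).
  bool Free(void* p) {
    if (live_.erase(p) == 0) return false;
    std::free(p);
    return true;
  }

 private:
  std::unordered_set<void*> live_;
};

// Simulates the suspected sequence. Returns true if the first free succeeded
// and the second one was caught as a double free.
inline bool SimulateAddonDoubleFree() {
  TrackingAllocator allocator;
  void* backing_store = allocator.Allocate(64);
  assert(backing_store != nullptr);
  const bool addon_free_ok = allocator.Free(backing_store);  // addon frees memory it does not own
  const bool gc_free_ok = allocator.Free(backing_store);     // owner frees the same block later
  return addon_free_ok && !gc_free_ok;
}
```

In a real process there is no such bookkeeping, so the second free reaches glibc, which is one way to end up with a "double free or corruption" abort like the one in the report.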

@mscdex
Contributor

mscdex commented May 5, 2016

That stack trace looks familiar (the GC parts), I seem to remember it being fixed at some point?

EDIT: Maybe I was wrong. Ref #6301 and maybe #3715?

@tylersmalley
Author

@thealphanerd, we have encountered different issues when running on 4.4.3: the application's memory usage almost doubles (TLS related). I am working to resolve those and will then test.

@mscdex, #6301 appears to be the same issue. The application is running with --max-old-space-size set.

@MylesBorins
Contributor

MylesBorins commented May 5, 2016

@tylersmalley I'd love to hear more about the usage problems... specifically if it involves something we landed that could be reverted.

v4.4.4 is about to drop and will come with the latest OpenSSL updates, which you will likely want.

@ppf2

ppf2 commented May 6, 2016

+1 on addressing this ticket (and/or #6301). Seeing the node process crash in production every week is concerning.

@mscdex
Contributor

mscdex commented May 6, 2016

@tylersmalley @ppf2 Is it possible you could test with node v6.1.0 and let us know if you still encounter the segfault there? Assuming this is a v8 issue, that might help to narrow it down a little bit.

@tylersmalley
Author

@mscdex, I am working to get our app to run on v6.1.0, but currently it does not. This is primarily due to graceful-fs being unsupported until v4. I am updating all the dependencies which rely on this package.

@bnoordhuis
Member

@tylersmalley Anything to report? If not, can we close this issue?

@tylersmalley
Author

Hey @bnoordhuis, we were only recently able to push forward with the changes required to migrate to Node 6, and that is my current focus. We should be able to begin testing next week.

@tylersmalley
Author

@bnoordhuis - Bumping to Node 6.3 resolved this issue for us.

@bnoordhuis
Member

Thanks for the update, I'll close out the issue.

If anyone has ideas on how to reproduce (or better yet: fix) the issue, please shout.

@jorangreef
Contributor

@bnoordhuis,

We are also getting a segfault with large values of max-old-space-size plus a large heap.

We use max-old-space-size=32768 for a heap of 32GB and as soon as we grow the heap the process crashes.

This would work fine with Node v0.10.45 but segfaults with Node v7.0.0 and interim versions.

I noticed your integer overflow fix to V8's heap.cc:

max_old_generation_size_ = static_cast<intptr_t>(max_old_space_size) * MB;

Could this have something to do with it? Or is there still an integer overflow somewhere else? Is max_old_generation_size_ too small to hold 32768*1024*1024?
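
The arithmetic behind that last question can be checked directly. The sketch below assumes a 64-bit build, where intptr_t is 64 bits wide, and uses std::int64_t as a stand-in for it. `OldSpaceLimitBytes`, `FitsIn`, and `CheckReportedSetting` are hypothetical helpers, not V8 code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

constexpr std::int64_t kMB = 1024 * 1024;

// Computes the old-space limit in bytes. The operand is widened to 64 bits
// before multiplying, in the same spirit as the static_cast quoted above.
constexpr std::int64_t OldSpaceLimitBytes(int max_old_space_size_mb) {
  return static_cast<std::int64_t>(max_old_space_size_mb) * kMB;
}

// True if value is representable in the signed integer type T.
template <typename T>
constexpr bool FitsIn(std::int64_t value) {
  return value >= std::numeric_limits<T>::min() &&
         value <= std::numeric_limits<T>::max();
}

// Checks the setting from the report: max-old-space-size=32768.
inline void CheckReportedSetting() {
  const std::int64_t bytes = OldSpaceLimitBytes(32768);
  assert(bytes == 34359738368LL);          // 2^35 bytes
  assert(!FitsIn<std::int32_t>(bytes));    // would overflow a 32-bit int
  assert(FitsIn<std::int64_t>(bytes));     // fits in a 64-bit intptr_t
}
```

So 32768*1024*1024 is 2^35, which does not fit in a 32-bit int but fits comfortably in a 64-bit intptr_t. Under that assumption the quoted line itself should not overflow; this says nothing about other places where the value might be narrowed.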

@jorangreef
Contributor

Here is a very small script to reproduce it: https://gist.github.com/jorangreef/ccf6f4dabc87caa1889dbf2fca2df031

We are running with a modified heap.cc, where CollectAllGarbage() and CollectAllAvailableGarbage() are empty functions. This was advised by Vyacheslav Egorov a few years ago because we have millions of persistent objects and these were triggering individual GC pauses of a minute or more.

Running node test.js with this modified node binary, and without max-old-space-size, is enough to get a segfault.

Could it be something to do with incremental GC now being more involved and perhaps a double-free, where this would not usually happen if the CollectAll... functions were functional?

@bnoordhuis
Member

We are running with a modified heap.cc, where CollectAllGarbage() and CollectAllAvailableGarbage() are empty functions.

That may have been fine four years ago but it sounds like a recipe for disaster with the current garbage collector. If you run into issues with an unpatched node binary, I'm happy to look into them.

8 participants