Heavy use of TLSSocket + tls.connect crashes with SIGSEGV/SIGABRT #17475

Closed
pimterry opened this issue Dec 5, 2017 · 7 comments
Labels: confirmed-bug, tls

@pimterry
Member

pimterry commented Dec 5, 2017

I've filed a repo with a full repro and details here: https://github.com/pimterry/node-tls-crash.

The specific code that's crashing is https://github.com/pimterry/node-tls-crash/blob/master/proxy.js

To summarize:

  • The repro is a minimal HTTPS-intercepting proxy: it uses new TLSSocket to handle incoming HTTP CONNECT sockets, uses tls.connect to create upstream connections, and pipes between the two.
  • Node aborts in under a minute with any serious web use (e.g. opening https://cnn.com in a browser a few times), with one of a variety of pointer errors, seemingly always in CRYPTO_free.
  • I can reproduce this in v8.9.1, v6.12.0 and v9.2.0
  • There are no native modules used here at all
  • I've attached two example core dumps as releases on the repro repo: https://github.com/pimterry/node-tls-crash/releases

I've pulled this out of a larger project, and tried to shrink the repro down as much as possible. It's pretty small and standalone, but still not tiny tiny, as I haven't found a way to reproduce this without a real working browser session. Happy to shrink it further if you have any suggestions for doing so.

@addaleax added the tls and confirmed-bug labels Dec 5, 2017
@addaleax
Member

addaleax commented Dec 5, 2017

Thanks, this is all around a very nice bug report. I could reproduce with your repo!

@addaleax
Member

addaleax commented Dec 6, 2017

Debugging this seems pretty hard but here’s some hint at the root cause:

valgrind + node debug build output
==16891== Invalid write of size 8
==16891==    at 0x23CAF88: node::TLSWrap::clear_stream() (tls_wrap.h:80)
==16891==    by 0x23C9E05: node::TLSWrap::OnDestructImpl(void*) (tls_wrap.cc:700)
==16891==    by 0x228883D: node::StreamResource::~StreamResource() (stream_base.h:166)
==16891==    by 0x2288A39: node::StreamBase::~StreamBase() (stream_base.h:264)
==16891==    by 0x23C7E44: node::TLSWrap::~TLSWrap() (tls_wrap.cc:94)
==16891==    by 0x23C7E95: node::TLSWrap::~TLSWrap() (tls_wrap.cc:106)
==16891==    by 0x23CCBE9: std::default_delete<node::TLSWrap>::operator()(node::TLSWrap*) const (unique_ptr.h:76)
==16891==    by 0x23CCAEC: std::unique_ptr<node::TLSWrap, std::default_delete<node::TLSWrap> >::~unique_ptr() (unique_ptr.h:239)
==16891==    by 0x23CBAEA: void node::BaseObject::WeakCallback<node::TLSWrap>(v8::WeakCallbackInfo<node::TLSWrap> const&) (base_object-inl.h:68)
==16891==    by 0x1C7F616: v8::internal::GlobalHandles::PendingPhantomCallback::Invoke(v8::internal::Isolate*) (global-handles.cc:853)
==16891==    by 0x1C7F3B3: v8::internal::GlobalHandles::DispatchPendingPhantomCallbacks(bool) (global-handles.cc:818)
==16891==    by 0x1C7F71E: v8::internal::GlobalHandles::PostGarbageCollectionProcessing(v8::internal::GarbageCollector, v8::GCCallbackFlags) (global-handles.cc:874)
==16891==  Address 0x9d5c388 is 344 bytes inside a block of size 464 free'd
==16891==    at 0x4C2F24B: operator delete(void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16891==    by 0x23C7EA1: node::TLSWrap::~TLSWrap() (tls_wrap.cc:106)
==16891==    by 0x23CCBE9: std::default_delete<node::TLSWrap>::operator()(node::TLSWrap*) const (unique_ptr.h:76)
==16891==    by 0x23CCAEC: std::unique_ptr<node::TLSWrap, std::default_delete<node::TLSWrap> >::~unique_ptr() (unique_ptr.h:239)
==16891==    by 0x23CBAEA: void node::BaseObject::WeakCallback<node::TLSWrap>(v8::WeakCallbackInfo<node::TLSWrap> const&) (base_object-inl.h:68)
==16891==    by 0x1C7F616: v8::internal::GlobalHandles::PendingPhantomCallback::Invoke(v8::internal::Isolate*) (global-handles.cc:853)
==16891==    by 0x1C7F3B3: v8::internal::GlobalHandles::DispatchPendingPhantomCallbacks(bool) (global-handles.cc:818)
==16891==    by 0x1C7F71E: v8::internal::GlobalHandles::PostGarbageCollectionProcessing(v8::internal::GarbageCollector, v8::GCCallbackFlags) (global-handles.cc:874)
==16891==    by 0x1CAAC8E: v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) (heap.cc:1549)
==16891==    by 0x1CA95B5: v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) (heap.cc:1169)
==16891==    by 0x1C43BEF: v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationSpace) (factory.cc:90)
==16891==    by 0x2053B3D: v8::internal::__RT_impl_Runtime_AllocateInNewSpace(v8::internal::Arguments, v8::internal::Isolate*) (runtime-internal.cc:322)
==16891==  Block was alloc'd at
==16891==    at 0x4C2E0EF: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16891==    by 0x23C84A4: node::TLSWrap::Wrap(v8::FunctionCallbackInfo<v8::Value> const&) (tls_wrap.cc:204)
==16891==    by 0x169FAF1: v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) (api-arguments.cc:25)
==16891==    by 0x175EE34: v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) (builtins-api.cc:112)
==16891==    by 0x175CE9F: v8::internal::Builtin_Impl_HandleApiCall(v8::internal::BuiltinArguments, v8::internal::Isolate*) (builtins-api.cc:142)
==16891==    by 0x175CC30: v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) (in /home/sqrt/src/node/out/Debug/node)
==16891==    by 0x1CFA3F3843C3: ???
==16891==    by 0x1CFA3F4715FF: ???
==16891==    by 0x1CFA3F4715FF: ???
==16891==    by 0x1CFA3F389269: ???
==16891==    by 0x1CFA3F4DF96A: ???
==16891==    by 0x1CFA3F4715FF: ???
==16891== 
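One reading of this trace: a TLSWrap is freed by its GC weak callback (tls_wrap.cc:106) while the stream underneath it still holds a destruct callback whose context points at it. When that stream is destroyed later, ~StreamResource runs the callback and OnDestructImpl calls clear_stream() on freed memory. The sketch below models that ordering under simplifying assumptions: `Stream`, `Wrapper`, `g_live_wrappers` and `RunUnsafeOrder()` are hypothetical, not Node's classes. Node's callback receives a raw `void*` context (see `OnDestructImpl(void*)` above); here the callback captures an integer id instead, so the stale call can be counted rather than actually touching freed memory.

```cpp
#include <cassert>
#include <functional>
#include <map>

struct Wrapper;

// Live wrappers keyed by id, plus a count of callbacks that found their wrapper gone.
static std::map<int, Wrapper*> g_live_wrappers;
static int g_dangling_calls = 0;

// Hypothetical stand-in for a stream resource that notifies its owner when it dies.
struct Stream {
  std::function<void()> on_destruct;
  ~Stream() {
    if (on_destruct) on_destruct();
  }
};

// Hypothetical stand-in for TLSWrap, WITHOUT the fix.
struct Wrapper {
  int id_;
  Stream* stream_;

  Wrapper(int id, Stream* stream) : id_(id), stream_(stream) {
    assert(stream_ != nullptr);
    g_live_wrappers[id_] = this;
    // Capturing an id instead of a raw pointer lets the sketch detect the
    // stale call without dereferencing freed memory.
    stream_->on_destruct = [id] {
      auto it = g_live_wrappers.find(id);
      if (it == g_live_wrappers.end()) {
        ++g_dangling_calls;  // real code: the invalid write in clear_stream()
        return;
      }
      it->second->stream_ = nullptr;  // the clear_stream() step
    };
  }

  // Bug being modeled: the callback installed on stream_ is left behind.
  ~Wrapper() { g_live_wrappers.erase(id_); }
};

// Crash order: the wrapper is freed first (as by a GC weak callback), then its stream.
int RunUnsafeOrder() {
  g_dangling_calls = 0;
  Stream* stream = new Stream();
  Wrapper* wrapper = new Wrapper(1, stream);
  delete wrapper;  // no unregistration happens here
  delete stream;   // ~Stream fires the callback the dead wrapper installed
  return g_dangling_calls;
}
```

Calling `RunUnsafeOrder()` returns 1: the single callback that fired found its wrapper already gone, which in the real code is the invalid write valgrind reports.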

@addaleax
Member

addaleax commented Dec 6, 2017

Edit: Disregard the first patch below; it was misguided, as I confused this->set_destruct_cb() with stream_->set_destruct_cb().

@pimterry This is a potential fix, it would be cool if you could verify that?

diff --git a/src/tls_wrap.cc b/src/tls_wrap.cc
index 3b899ea12d50..7babba40e24c 100644
--- a/src/tls_wrap.cc
+++ b/src/tls_wrap.cc
@@ -101,6 +101,7 @@ TLSWrap::~TLSWrap() {
 #ifdef SSL_CTRL_SET_TLSEXT_SERVERNAME_CB
   sni_context_.Reset();
 #endif  // SSL_CTRL_SET_TLSEXT_SERVERNAME_CB
+  set_destruct_cb({ nullptr, nullptr });
 }
 
 

(I’m not sure how this would cause this trouble in practice, but having a base class destructor call back into subclass methods is 👎 in C++, and this at least greatly reduces the crash frequency.)

Could you try this?

diff --git a/src/tls_wrap.cc b/src/tls_wrap.cc
index 3b899ea12d50..4367420c0239 100644
--- a/src/tls_wrap.cc
+++ b/src/tls_wrap.cc
@@ -101,6 +101,14 @@ TLSWrap::~TLSWrap() {
 #ifdef SSL_CTRL_SET_TLSEXT_SERVERNAME_CB
   sni_context_.Reset();
 #endif  // SSL_CTRL_SET_TLSEXT_SERVERNAME_CB
+  if (stream_ != nullptr) {
+    stream_->set_destruct_cb({ nullptr, nullptr });
+    stream_->set_after_write_cb({ nullptr, nullptr });
+    stream_->set_alloc_cb({ nullptr, nullptr });
+    stream_->set_read_cb({ nullptr, nullptr });
+    stream_->set_destruct_cb({ nullptr, nullptr });
+    stream_->Unconsume();
+  }
 }
 
 

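The idea in this patch is that the wrapper's destructor unregisters every callback it installed on the underlying stream, as long as that stream is still alive. A simplified C++ sketch of the same idea follows, using hypothetical `Stream`/`Wrapper` types whose `std::function` members stand in for the slots set through `set_destruct_cb()`, `set_read_cb()` and friends; it does not model `Unconsume()` or the alloc/after-write slots.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for the transport stream and its callback slots.
struct Stream {
  std::function<void()> on_destruct;
  std::function<void()> on_read;

  ~Stream() {
    if (on_destruct) on_destruct();  // tell the owner this stream is going away
  }
};

// Hypothetical stand-in for the TLS wrapper layered on top of a Stream.
struct Wrapper {
  Stream* stream_;
  int reads_ = 0;

  explicit Wrapper(Stream* stream) : stream_(stream) {
    assert(stream_ != nullptr);
    // Both callbacks capture `this`, so they must not outlive the wrapper.
    stream_->on_destruct = [this] { stream_ = nullptr; };  // like clear_stream()
    stream_->on_read = [this] { ++reads_; };
  }
  Wrapper(const Wrapper&) = delete;
  Wrapper& operator=(const Wrapper&) = delete;

  // The fix: while the stream is still alive, unset every callback that
  // captured `this` before the wrapper's memory is released.
  ~Wrapper() {
    if (stream_ != nullptr) {
      stream_->on_destruct = nullptr;
      stream_->on_read = nullptr;
    }
  }
};

// Crash order: wrapper destroyed first, stream later.
bool WrapperFirstLeavesNoCallbacks() {
  Stream* stream = new Stream();
  Wrapper* wrapper = new Wrapper(stream);
  stream->on_read();  // callbacks work while both objects are alive
  bool counted = wrapper->reads_ == 1;
  delete wrapper;     // unregisters from the stream
  bool cleared = !stream->on_destruct && !stream->on_read;
  delete stream;      // nothing left that points into freed memory
  return counted && cleared;
}

// Opposite order: the stream's destruct callback detaches the wrapper.
bool StreamFirstDetachesWrapper() {
  Stream* stream = new Stream();
  Wrapper wrapper(stream);
  delete stream;  // on_destruct sets wrapper.stream_ = nullptr
  return wrapper.stream_ == nullptr;  // so ~Wrapper will not touch the stream
}
```

Both destruction orders are covered: if the wrapper dies first it clears the stream's slots, and if the stream dies first its destruct callback nulls the wrapper's `stream_`, so the destructor skips it. The `stream_ != nullptr` check in the diff appears to rely on that second path, since OnDestructImpl calls clear_stream() in the trace above.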
I’m going to open a PR with that in a few minutes, it’s a correct change anyway (I think).

@pimterry
Member Author

pimterry commented Dec 6, 2017

@addaleax I've done some testing, it's obviously hard to tell if it totally resolves the issue, but I haven't seen any crashes in 10 mins or so of heavy use, so it's certainly a huge improvement!

Thanks for this. I'll ping here if I see the issue again on this build, but if I don't hit it within a day or two I think this can be considered fixed. Will this get backported to node 6 & 8 too?

addaleax added a commit to addaleax/node that referenced this issue Dec 12, 2017
When the TLS stream is destroyed for whatever reason,
we should unset all callbacks on the underlying transport
stream.

Fixes: nodejs#17475
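The commit message states the rule in general terms: however the TLS stream ends up destroyed, it must unset its callbacks on the transport stream. The commit does this explicitly in ~TLSWrap. A hypothetical alternative, sketched here and not taken from Node, is an RAII guard that owns one registration and clears it in its own destructor, so no destruction path can skip the cleanup. `Slot`, `ScopedCallback` and the two check functions are illustrative names.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical callback slot, like one of the stream's set_*_cb() targets.
using Slot = std::function<void()>;

// Owns a single callback registration and clears it when destroyed, so every
// destruction path of the owning object also unregisters the callback.
class ScopedCallback {
 public:
  ScopedCallback(Slot* slot, Slot cb) : slot_(slot) {
    assert(slot_ != nullptr);
    *slot_ = std::move(cb);
  }
  ~ScopedCallback() { Reset(); }

  ScopedCallback(const ScopedCallback&) = delete;
  ScopedCallback& operator=(const ScopedCallback&) = delete;

  // Unset the callback now; safe to call more than once.
  void Reset() {
    if (slot_ != nullptr) {
      *slot_ = nullptr;
      slot_ = nullptr;
    }
  }

  // Forget the slot without touching it, for when the stream that owns the
  // slot is destroyed first.
  void Release() { slot_ = nullptr; }

 private:
  Slot* slot_;
};

bool SlotClearedWhenGuardDies() {
  Slot read_cb;
  int reads = 0;
  {
    ScopedCallback guard(&read_cb, [&reads] { ++reads; });
    read_cb();  // callback is live while the guard exists
  }             // guard destroyed here: slot reset
  return reads == 1 && !read_cb;
}

bool ReleasedGuardLeavesSlotAlone() {
  Slot slot;
  {
    ScopedCallback guard(&slot, [] {});
    guard.Release();  // e.g. the stream reported its own destruction
  }
  return static_cast<bool>(slot);  // untouched by the released guard
}
```

`Release()` covers the opposite order: if the stream goes away first, the guard must forget the slot rather than write into it later, which is the same concern the `stream_ != nullptr` check addresses in the actual patch.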
@odinho

odinho commented Dec 19, 2017

Hi @addaleax, thanks for fixing this! We're also hitting it quite a lot. Would you be so kind as to open a pull request with your landed commit against the v8.x-staging branch, since I understand that is the process?

@addaleax
Member

@odinho The commit applies to v8.x without merge conflicts, so it will get picked up anyway and doesn’t need a separate PR.

I am not sure what the schedule for the next v8.x release is. /cc @nodejs/lts @gibfahn

@gibfahn
Member

gibfahn commented Dec 20, 2017

Answered in #17478 (comment) so we can keep the discussion in the PR. @odinho, @pimterry, or anyone else in this thread, please feel free to reply there.

MylesBorins pushed a commit that referenced this issue Jan 8, 2018
When the TLS stream is destroyed for whatever reason,
we should unset all callbacks on the underlying transport
stream.

PR-URL: #17478
Fixes: #17475
Reviewed-By: Fedor Indutny
Reviewed-By: Jan Krems
Reviewed-By: Matteo Collina
Reviewed-By: James M Snell
MylesBorins pushed a commit that referenced this issue Jan 22, 2018
MylesBorins pushed a commit that referenced this issue Jan 22, 2018
MylesBorins pushed a commit that referenced this issue Feb 11, 2018
MylesBorins pushed a commit that referenced this issue Feb 12, 2018
MylesBorins pushed a commit that referenced this issue Feb 13, 2018