Segfault in node::StreamBase::GetFD #2847

Closed
ctizen opened this issue Sep 13, 2015 · 34 comments
Labels
c++ Issues and PRs that require attention from people who are familiar with C++.

ctizen commented Sep 13, 2015

Hello,

While load testing my application, I noticed a segfault. My environment and conditions are:

  • VPS under OpenVZ, Ubuntu 14.04
  • Node v4.0.0
  • Using PM2 v0.14.7 (cluster mode, 4 processes in cluster)
  • A big application (1000+ files, total codebase ~2 MB without node_modules)
  • All node_modules are built against the current version of Node
  • In PM2's config I've set a memory limit of 500 MB per process.

So, under load (about 50-100 rps), when a process runs out of memory it is killed and restarted by PM2, but sometimes after that the parent PM2 process itself fails with a "Segmentation fault".

I've investigated this for a while (compiled a debug build, ran it under gdb, etc.) and traced the segfault to src/stream_base-inl.h:

template <class Base>
void StreamBase::GetFD(Local<String> key,
                       const PropertyCallbackInfo<Value>& args) {
  StreamBase* wrap = Unwrap<Base>(args.Holder());

  if (!wrap->IsAlive()) // SEGFAULT. 
    return args.GetReturnValue().Set(UV_EINVAL);

  args.GetReturnValue().Set(wrap->GetFD());
}
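
(For context: this native getter is what runs whenever JavaScript reads the fd property of a stream handle. A minimal sketch of that path, using a hypothetical server rather than code from my application:)

var net = require('net');

var server = net.createServer();
server.listen(0, function() {
  // Reading `fd` on the internal handle goes through the native accessor
  // and ends up in the StreamBase::GetFD shown above.
  console.log(server._handle.fd);
  server.close();
});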

I assumed that the wrap pointer somehow becomes NULL. I didn't dig deeper, so my guess was that the Unwrap function failed to do its job and just returned NULL, and the code of GetFD didn't handle that case. So I tried this small fix:

-  if (!wrap->IsAlive()) 
+  if (!wrap || !wrap->IsAlive())
     return args.GetReturnValue().Set(UV_EINVAL);

And it seems to work so far: not a single failure after an hour of load testing. I'm going to leave it running for a few days to be sure.

Does this solution seem correct? Should I open a PR for it?

mscdex added the c++ label (Issues and PRs that require attention from people who are familiar with C++) on Sep 13, 2015

mscdex commented Sep 13, 2015

/cc @indutny

indutny commented Sep 13, 2015

@heilage-nsk may I ask you to publish a backtrace (or several) of the crash? It is pretty odd to see a NULL there. Thank you!

indutny commented Sep 13, 2015

Oh, one more thing. Is there any way to reproduce it?

ctizen commented Sep 14, 2015

@indutny
Here is what I managed to get from gdb:

Program received signal SIGSEGV, Segmentation fault.
0x000000000152cbdb in void node::StreamBase::GetFD<node::StreamWrap>(v8::Local<v8::String>, v8::PropertyCallbackInfo<v8::Value> const&) () at ../src/stream_base-inl.h:75
75    if (!wrap->IsAlive())
   0x000000000152cbd7 <_ZN4node10StreamBase5GetFDINS_10StreamWrapEEEvN2v85LocalINS3_6StringEEERKNS3_20PropertyCallbackInfoINS3_5ValueEEE+57>:   48 8b 45 e8 mov    rax,QWORD PTR [rbp-0x18]
=> 0x000000000152cbdb <_ZN4node10StreamBase5GetFDINS_10StreamWrapEEEvN2v85LocalINS3_6StringEEERKNS3_20PropertyCallbackInfoINS3_5ValueEEE+61>:   48 8b 00    mov    rax,QWORD PTR [rax]
   0x000000000152cbde <_ZN4node10StreamBase5GetFDINS_10StreamWrapEEEvN2v85LocalINS3_6StringEEERKNS3_20PropertyCallbackInfoINS3_5ValueEEE+64>:   48 83 c0 40 add    rax,0x40
   0x000000000152cbe2 <_ZN4node10StreamBase5GetFDINS_10StreamWrapEEEvN2v85LocalINS3_6StringEEERKNS3_20PropertyCallbackInfoINS3_5ValueEEE+68>:   48 8b 00    mov    rax,QWORD PTR [rax]
   0x000000000152cbe5 <_ZN4node10StreamBase5GetFDINS_10StreamWrapEEEvN2v85LocalINS3_6StringEEERKNS3_20PropertyCallbackInfoINS3_5ValueEEE+71>:   48 8b 55 e8 mov    rdx,QWORD PTR [rbp-0x18]
   0x000000000152cbe9 <_ZN4node10StreamBase5GetFDINS_10StreamWrapEEEvN2v85LocalINS3_6StringEEERKNS3_20PropertyCallbackInfoINS3_5ValueEEE+75>:   48 89 d7    mov    rdi,rdx
   0x000000000152cbec <_ZN4node10StreamBase5GetFDINS_10StreamWrapEEEvN2v85LocalINS3_6StringEEERKNS3_20PropertyCallbackInfoINS3_5ValueEEE+78>:   ff d0   call   rax
   0x000000000152cbee <_ZN4node10StreamBase5GetFDINS_10StreamWrapEEEvN2v85LocalINS3_6StringEEERKNS3_20PropertyCallbackInfoINS3_5ValueEEE+80>:   83 f0 01    xor    eax,0x1
   0x000000000152cbf1 <_ZN4node10StreamBase5GetFDINS_10StreamWrapEEEvN2v85LocalINS3_6StringEEERKNS3_20PropertyCallbackInfoINS3_5ValueEEE+83>:   84 c0   test   al,al
   0x000000000152cbf3 <_ZN4node10StreamBase5GetFDINS_10StreamWrapEEEvN2v85LocalINS3_6StringEEERKNS3_20PropertyCallbackInfoINS3_5ValueEEE+85>:   74 23   je     0x152cc18 <_ZN4node10StreamBase5GetFDINS_10StreamWrapEEEvN2v85LocalINS3_6StringEEERKNS3_20PropertyCallbackInfoINS3_5ValueEEE+122>
(gdb) bt
#0  0x000000000152cbdb in void node::StreamBase::GetFD<node::StreamWrap>(v8::Local<v8::String>, v8::PropertyCallbackInfo<v8::Value> const&) () at ../src/stream_base-inl.h:75
#1  0x0000237dfc3569b6 in ?? ()
#2  0x00007fffffffd7b0 in ?? ()
#3  0x0000237dfc3568c1 in ?? ()
#4  0x00007fffffffd780 in ?? ()
#5  0x00007fffffffd858 in ?? ()
#6  0x0000237dfc51899b in ?? ()
#7  0x00001ff6c0b9e919 in ?? ()
#8  0x00001487c81e50c9 in ?? ()
#9  0x0000000001e16df0 in ?? ()
#10 0x00001ff6c0b04131 in ?? ()
#11 0x00001ff6c0b04131 in ?? ()
#12 0x00001ff6c0baa171 in ?? ()
#13 0x00001487c81e50c9 in ?? ()
#14 0x0000150e070c3f01 in ?? ()
#15 0x00000275e774d399 in ?? ()
#16 0x0000000100000000 in ?? ()
#17 0x0000001600000000 in ?? ()
#18 0x0000150e070d6551 in ?? ()
#19 0x0000000100000000 in ?? ()
#20 0x00001487c81e50c9 in ?? ()
#21 0x00001ff6c0b9e919 in ?? ()
#22 0x00001ff6c0b04131 in ?? ()
#23 0x00001ff6c0b04131 in ?? ()
#24 0x00001ff6c0b04131 in ?? ()
#25 0x0000150e070d6509 in ?? ()
#26 0x00001ff6c0b9e919 in ?? ()
#27 0x00001487c8129839 in ?? ()
#28 0x00001487c8121bd9 in ?? ()
#29 0x00007fffffffd908 in ?? ()
#30 0x0000237dfc518a58 in ?? ()
#31 0x00001487c81e50c9 in ?? ()
#32 0x0000150e070c3f01 in ?? ()
#33 0x00001ff6c0b04131 in ?? ()
#34 0x00001ff6c0b04131 in ?? ()
#35 0x00001487c8129839 in ?? ()
#36 0x00001ff6c0b9ed89 in ?? ()
#37 0x0000150e070d5de9 in ?? ()
#38 0x0000001200000000 in ?? ()
#39 0x0000002600000000 in ?? ()
#40 0x0000150e070d5f21 in ?? ()
#41 0x0000000100000000 in ?? ()
#42 0x0000186f62165c31 in ?? ()
#43 0x00001ff6c0b9ed89 in ?? ()
#44 0x00001ff6c0b04131 in ?? ()
#45 0x00001ff6c0b04131 in ?? ()
#46 0x00001ff6c0b04131 in ?? ()
#47 0x0000150e070d5de9 in ?? ()
#48 0x00001ff6c0b9ed89 in ?? ()
#49 0x00001487c8129839 in ?? ()
#50 0x00001487c8121bd9 in ?? ()
#51 0x00007fffffffd9b8 in ?? ()
#52 0x0000237dfc518a58 in ?? ()
#53 0x0000186f62165c31 in ?? ()
#54 0x0000150e070c3f01 in ?? ()
#55 0x00001ff6c0b04131 in ?? ()
#56 0x00001ff6c0b04131 in ?? ()
#57 0x00001487c8129839 in ?? ()
#58 0x00001ff6c0b9d2d1 in ?? ()
#59 0x0000150e070d59f9 in ?? ()
#60 0x0000000700000000 in ?? ()
#61 0x0000001a00000000 in ?? ()
#62 0x0000150e070d5bb1 in ?? ()
#63 0x0000000100000000 in ?? ()
#64 0x0000186f6210fdf9 in ?? ()
#65 0x00001ff6c0b9d2d1 in ?? ()
#66 0x00001ff6c0b04131 in ?? ()
#67 0x00001ff6c0b04131 in ?? ()
#68 0x00001ff6c0b04131 in ?? ()
#69 0x0000150e070d59f9 in ?? ()
#70 0x00001ff6c0b9d2d1 in ?? ()
#71 0x00001487c8129839 in ?? ()
#72 0x00001487c8121bd9 in ?? ()
#73 0x00007fffffffda68 in ?? ()
#74 0x0000237dfc518a58 in ?? ()
#75 0x0000186f6210fdf9 in ?? ()
#76 0x0000150e070c3f01 in ?? ()
#77 0x00001ff6c0b04131 in ?? ()
#78 0x00001ff6c0b04131 in ?? ()
#79 0x00001487c8129839 in ?? ()
#80 0x0000117e4316c169 in ?? ()
#81 0x0000150e070c3fb9 in ?? ()
#82 0x0000000400000000 in ?? ()
#83 0x0000000500000000 in ?? ()
#84 0x0000150e070c4081 in ?? ()
#85 0x0000000100000000 in ?? ()
#86 0x0000186f621f5a89 in ?? ()
#87 0x0000117e4316c169 in ?? ()
#88 0x00001ff6c0b04131 in ?? ()
#89 0x00001ff6c0b04131 in ?? ()
#90 0x00001ff6c0b04131 in ?? ()
#91 0x0000150e070c3fb9 in ?? ()
#92 0x0000117e4316c169 in ?? ()
#93 0x00001487c8129839 in ?? ()
#94 0x00001487c8121bd9 in ?? ()
#95 0x00007fffffffdac0 in ?? ()
#96 0x0000237dfc62f382 in ?? ()
#97 0x0000186f621f5a89 in ?? ()
#98 0x0000150e070c3f01 in ?? ()
#99 0x00001ff6c0b04131 in ?? ()
#100 0x00001ff6c0b04131 in ?? ()
#101 0x00001ff6c0b04131 in ?? ()
#102 0x00001487c8112089 in ?? ()
#103 0x000015c25d9ffb49 in ?? ()
#104 0x000015c25d9ffc91 in ?? ()
#105 0x000015c25d9ffb49 in ?? ()
#106 0x00007fffffffdaf8 in ?? ()
#107 0x0000237dfc590f25 in ?? ()
#108 0x0000186f621f5a89 in ?? ()
#109 0x000034620f0c74a1 in ?? ()
#110 0x000015c25d9ffc91 in ?? ()
#111 0x00003fc95e581dc9 in ?? ()
#112 0x0000152a146f69f1 in ?? ()
#113 0x00007fffffffdb58 in ?? ()
#114 0x0000237dfc5908f1 in ?? ()
#115 0x0000186f621f5a89 in ?? ()
#116 0x000015c25d9ffa69 in ?? ()
#117 0x00003fc95e581dc9 in ?? ()
#118 0x000034620f08ec39 in ?? ()
#119 0x00001ff6c0b04131 in ?? ()
#120 0x00001ff6c0b04131 in ?? ()
#121 0x00001ff6c0b04131 in ?? ()
#122 0x00001ff6c0b04131 in ?? ()
#123 0x000034620f08ec39 in ?? ()
#124 0x0000186f621bfbe9 in ?? ()
#125 0x00007fffffffdba0 in ?? ()
#126 0x0000237dfc6b8819 in ?? ()
#127 0x0000186f621ea3a9 in ?? ()
#128 0x000034620f08ec39 in ?? ()
#129 0x00001ff6c0b04111 in ?? ()
#130 0x00001ff6c0b04131 in ?? ()
#131 0x0000150e070ab421 in ?? ()
#132 0x0000150e070b5599 in ?? ()
#133 0x0000150e070b5561 in ?? ()
#134 0x00007fffffffdbd8 in ?? ()
#135 0x0000237dfc3194b7 in ?? ()
#136 0x00001ff6c0b04111 in ?? ()
#137 0x00001ff6c0b04131 in ?? ()
#138 0x0000000200000000 in ?? ()
#139 0x0000150e070b5599 in ?? ()
#140 0x0000000a00000000 in ?? ()
#141 0x00007fffffffdc18 in ?? ()
#142 0x0000237dfc6b6e0c in ?? ()
#143 0x00001ff6c0b043f1 in ?? ()
#144 0x00001ff6c0b04111 in ?? ()
#145 0x00001ff6c0b04131 in ?? ()
#146 0x0000150e070b5599 in ?? ()
#147 0x0000150e070b5679 in ?? ()
#148 0x0000150e070b5621 in ?? ()
#149 0x00007fffffffdc48 in ?? ()
#150 0x0000237dfc635874 in ?? ()
#151 0x0000150e070b5789 in ?? ()
#152 0x00001ff6c0b975c1 in ?? ()
#153 0x0000150e070b5741 in ?? ()
#154 0x0000150e070b56c1 in ?? ()
#155 0x00007fffffffdcc0 in ?? ()
#156 0x0000237dfc4ad281 in ?? ()
#157 0x0000150e070b5789 in ?? ()
#158 0x0000150e070b5741 in ?? ()
#159 0x00003a4c496a9a31 in ?? ()
#160 0x00001ff6c0b04131 in ?? ()
#161 0x0000150e07004311 in ?? ()
#162 0x00001ff6c0b043f1 in ?? ()
#163 0x0000150e070b5789 in ?? ()
#164 0x0000006500000000 in ?? ()
#165 0x000728b700000000 in ?? ()
#166 0x0000150e070b5941 in ?? ()
#167 0x0000006400000000 in ?? ()
#168 0x00003a4c496fc051 in ?? ()
#169 0x00003a4c496a9a31 in ?? ()
#170 0x00007fffffffdcf8 in ?? ()
#171 0x0000237dfc31a09d in ?? ()
#172 0x0000150e070b5941 in ?? ()
#173 0x00003a4c496fc051 in ?? ()
#174 0x0000237dfc319fa1 in ?? ()
#175 0x0000000800000000 in ?? ()
#176 0x0000000000000000 in ?? ()
(gdb) gcore core.dump
Couldn't get registers: No such process.
(gdb) info threads
  Id   Target Id         Frame 
  10   Thread 0x7fffe61fc700 (LWP 1965) "node" (running)
  9    Thread 0x7fffe6bfd700 (LWP 1964) "node" (running)
  8    Thread 0x7fffe75fe700 (LWP 1963) "node" (running)
  7    Thread 0x7fffe7fff700 (LWP 1962) "node" (running)
  5    Thread 0x7ffff4d5f700 (LWP 1937) "V8 WorkerThread" (running)
  4    Thread 0x7ffff5760700 (LWP 1936) "V8 WorkerThread" (running)
  3    Thread 0x7ffff6161700 (LWP 1935) "V8 WorkerThread" (running)
  2    Thread 0x7ffff6b62700 (LWP 1934) "V8 WorkerThread" (running)
* 1    Thread 0x7ffff7fee740 (LWP 1931) "node" 0x000000000152cbdb in void node::StreamBase::GetFD<node::StreamWrap>(v8::Local<v8::String>, v8::PropertyCallbackInfo<v8::Value> const&) () at ../src/stream_base-inl.h:75

Almost no useful info in the backtrace, and I couldn't generate a core dump either :( What else can I do to gather more info?

I'm not sure if one can reproduce this segfault easily, but I'll try to create a test case.

ctizen commented Sep 14, 2015

Oh. And my naive fix didn't help :) Seems like it's not that easy.

indutny commented Sep 14, 2015

@heilage-nsk thank you! How did you collect this information? It looks like a debug-mode build, but the stack trace has lots of gaps. It can't get straight from JS to the C++ code; there should be several v8 API C++ helpers on the stack.

A couple more questions:

  • How have you built the debug-mode node?
  • Have you tried to turn on core dumps with ulimit -c unlimited?

Thanks!

ctizen commented Sep 14, 2015

@indutny
I compiled and ran node/pm2 as follows:

./configure --prefix=/usr --gdb --debug
make
gdb --args out/Debug/node /usr/bin/pm2 /var/www/project-startup.json --no-daemon

I tried ulimit -c unlimited with no success. Since I already had some debugging problems on this machine, I think I should try another VM and/or gdb version (currently 7.4, which is pretty old). I will report back when I get some results.

indutny commented Sep 14, 2015

@heilage-nsk thank you very much for this. I appreciate your help.

ctizen commented Sep 16, 2015

@indutny
I just tried to reproduce this on a fresh Ubuntu 14.04.2 install with gdb 7.7. It still fails, the backtrace still shows no useful information, and there is still no core dump, but at least I was able to inspect the value of the local variable:

Program received signal SIGSEGV, Segmentation fault.
0x000000000154cc09 in node::StreamBase::GetFD<node::StreamWrap> (key=..., args=...) at ../src/stream_base-inl.h:75
75    if (!wrap->IsAlive())
(gdb) info locals
wrap = 0x0

I tried to set a conditional breakpoint before this line, but when wrap == 0 the breakpoint did not trigger for some reason. I don't have much experience debugging native (and especially multithreaded) apps under *nix, so this seems weird to me.

So I reapplied my naive fix and re-ran the load test under gdb, and it still works fine after several hours. For now I'm going to dig into the Node.js sources a bit deeper to understand what could go wrong here; any ideas are appreciated :)

indutny commented Sep 17, 2015

@heilage-nsk I think I have an idea of why this could be happening; working on a test case.

In a few words: handle.close() releases the structure and nulls the internal field on the next tick. The previous code accounted for this properly, but with TLSWrap it may be broken.

Are you using tls/https in your application?

ctizen commented Sep 17, 2015

@indutny
Well, we use https, but it's terminated by nginx; I don't think proxy_pass-ing requests to Node.js can invoke its TLS internals. Moreover, the error seems to happen in the node::StreamWrap instantiation of the GetFD template function.
Important note: the segfault seems to happen only when a process is being killed by SIGTERM or SIGINT, like here: https://github.com/Unitech/PM2/blob/master/lib/God/Methods.js#L236 - in all other cases it works fine.

indutny commented Sep 17, 2015

@heilage-nsk no https requests either?

ctizen commented Sep 17, 2015

@indutny
Retried with an http-only URL list - it still fails (without any fixes applied).

indutny commented Sep 21, 2015

The error is pretty easy to reproduce, but how it could happen in a natural environment is less evident to me at the moment:

var net = require('net');

var socket = net.connect(1443);
var handle = socket._handle;   // keep a reference to the internal handle
socket.destroy();
socket.on('close', function() {
  setImmediate(function() {
    console.log('hm..');
    // By now the native wrap behind `handle` has been released, so this
    // property read hits the unguarded Unwrap() in StreamBase::GetFD.
    console.log(handle.fd);
  });
});

Investigating

indutny commented Sep 21, 2015

Perhaps it is related to the cluster module.

indutny commented Sep 22, 2015

@heilage-nsk do you know if it happens in the master or in a worker process?

indutny commented Sep 22, 2015

Actually, I have one patch that may reveal some useful information about this crash:

diff --git a/src/stream_base-inl.h b/src/stream_base-inl.h
index 81114a2..114ba39 100644
--- a/src/stream_base-inl.h
+++ b/src/stream_base-inl.h
@@ -71,6 +71,17 @@ template <class Base>
 void StreamBase::GetFD(Local<String> key,
                        const PropertyCallbackInfo<Value>& args) {
   StreamBase* wrap = Unwrap<Base>(args.Holder());
+  if (wrap == nullptr) {
+    Environment* env = Environment::GetCurrent(args.GetIsolate());
+    Local<Object> global = env->context()->Global();
+    Local<Object> console = global->Get(
+        String::NewFromUtf8(env->isolate(), "console")).As<Object>();
+    Local<v8::Function> trace = console->Get(
+        String::NewFromUtf8(env->isolate(), "trace")).As<v8::Function>();
+
+    v8::MaybeLocal<Value> res =
+        trace->Call(env->context(), console, 0, nullptr);
+  }

   if (!wrap->IsAlive())
     return args.GetReturnValue().Set(UV_EINVAL);

@heilage-nsk may I ask you to give it a try? It should just print a stack trace before crashing on the nullptr wrap.

ctizen commented Sep 22, 2015

@indutny
Thank you!

The trace shows this:

2015-09-22 10:50:00: 25515 pid can not be killed { [Error: kill ESRCH] code: 'ESRCH', errno: 'ESRCH', syscall: 'kill' }
2015-09-22 10:50:00: Trace
    at safeDeepClone (/usr/lib/node_modules/pm2/node_modules/safe-clone-deep/src/index.js:50:27)
    at safeDeepClone (/usr/lib/node_modules/pm2/node_modules/safe-clone-deep/src/index.js:53:22)
    at safeDeepClone (/usr/lib/node_modules/pm2/node_modules/safe-clone-deep/src/index.js:53:22)
    at safeDeepClone (/usr/lib/node_modules/pm2/node_modules/safe-clone-deep/src/index.js:53:22)
    at cloneWrap (/usr/lib/node_modules/pm2/node_modules/safe-clone-deep/src/index.js:65:10)
    at Object.module.exports.clone (/usr/lib/node_modules/pm2/lib/Utility.js:39:12)
    at Object.Common.deepCopy.Common.serialize.Common.clone (/usr/lib/node_modules/pm2/lib/Common.js:165:18)
    at Object.getFormatedProcesses (/usr/lib/node_modules/pm2/lib/God/Methods.js:53:21)
    at /usr/lib/node_modules/pm2/lib/God/ActionMethods.js:280:31
    at Object.God.killProcess (/usr/lib/node_modules/pm2/lib/God/Methods.js:247:14)
Segmentation fault (core dumped)

So it also looks like a bug in PM2's process management code. I tried to run the app with just the cluster module and it doesn't fail (though it has some other bug, not related to the current issue :) ).

I've done some more debugging and basic tracing and ended up in v8's ReadField function. In deps/v8/include/v8.h (around line 7091) I modified the function to this state:

template <typename T>
V8_INLINE static T ReadField(const internal::Object* ptr, int offset) {
  const uint8_t* addr =
      reinterpret_cast<const uint8_t*>(ptr) + offset - kHeapObjectTag;
  const T* retval = reinterpret_cast<const T*>(addr);
  if (!*retval) {
    // Debug print (this also needs #include <iostream> at the top of v8.h):
    std::cout << "Ouch! ReadField got NULL with offset = " << offset
              << " and kHeap = " << kHeapObjectTag << std::endl;
  }
  return *retval;
}

The output was:

....
2015-09-22 19:23:00: Process with pid 14620 killed
Ouch! ReadField got NULL with offset = 24 and kHeap = 1
Ouch! ReadField got NULL with offset = 24 and kHeap = 1
Ouch! ReadField got NULL with offset = 24 and kHeap = 1
2015-09-22 19:23:00: Trace
    at safeDeepClone (/usr/lib/node_modules/pm2/node_modules/safe-clone-deep/src/index.js:50:27)
....
Segmentation fault (core dumped)

To me it looks like some (internal?) fields of the object were already cleaned up when the process attempted to exit, but the object itself was still there. Maybe this will help somehow.

It's very strange behavior, but I think Node should not crash in this case; maybe it should throw an exception or just do nothing?
For now I'm going to examine PM2's code, and I will report here if I find anything suspicious.
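
To illustrate how a deep clone even reaches that native getter, here is a simplified stand-in for safe-clone-deep (not its actual code, just the general shape of a recursive property walk):

function cloneLike(obj, seen) {
  seen = seen || new WeakSet();
  if (obj === null || typeof obj !== 'object' || seen.has(obj)) return obj;
  seen.add(obj);
  var out = Array.isArray(obj) ? [] : {};
  for (var key in obj) {
    // Reading obj[key] triggers getters. When the walk reaches a ChildProcess
    // whose IPC channel has already been closed, the `fd` getter on that
    // handle is the native StreamBase::GetFD where the crash happens.
    out[key] = cloneLike(obj[key], seen);
  }
  return out;
}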

indutny commented Sep 22, 2015

@heilage-nsk I have a possible fix:

diff --git a/lib/internal/child_process.js b/lib/internal/child_process.js
index 328a67d..890f80a 100644
--- a/lib/internal/child_process.js
+++ b/lib/internal/child_process.js
@@ -446,9 +446,8 @@ function setupChannel(target, channel) {

     } else {
       this.buffering = false;
-      target.disconnect();
       channel.onread = nop;
-      channel.close();
+      target._disconnect();
       maybeClose(target);
     }
   };

May I ask you to give it a try? Thanks!

indutny commented Sep 22, 2015

Updated the patch.

indutny commented Sep 22, 2015

Wait, there are some test failures with this patch. Please do not apply just yet.

indutny commented Sep 22, 2015

Ok, this patch is better:

diff --git a/lib/internal/child_process.js b/lib/internal/child_process.js
index 328a67d..bea139b 100644
--- a/lib/internal/child_process.js
+++ b/lib/internal/child_process.js
@@ -449,6 +449,7 @@ function setupChannel(target, channel) {
       target.disconnect();
       channel.onread = nop;
       channel.close();
+      target._channel = null;
       maybeClose(target);
     }
   };

Please give it a try ;)
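
Roughly, this is the scenario the patch is aimed at (a hedged sketch, not a test from the repo; ./worker.js is a placeholder script that just stays alive):

var fork = require('child_process').fork;

var child = fork('./worker.js');

child.on('exit', function() {
  setImmediate(function() {
    // When the worker dies, the parent's end of the IPC channel hits the
    // error/EOF branch patched above and gets closed. Without the patch,
    // child._channel keeps pointing at that closed handle, so a later
    // property walk (like PM2's deep clone) can call the native fd getter
    // on a dead wrap; with the patch the reference is nulled out instead.
    console.log(child._channel);
  });
});

child.kill('SIGTERM');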

ctizen commented Sep 23, 2015

@indutny
Thank you!
It hasn't failed for 10 minutes already!
I'll leave it running for some hours or days to be sure, but it seems to be fixed. Without this patch it crashed within 1-2 minutes.

indutny commented Sep 23, 2015

This is a good result, I guess? ;)

ctizen commented Sep 23, 2015

Absolutely :)
Still works fine.
If it's still fine tomorrow, I think I'll close this issue and wait for this patch to land upstream :)

indutny commented Sep 23, 2015

Thanks!

indutny commented Sep 24, 2015

@heilage-nsk heya! Any news? ;)

ctizen commented Sep 24, 2015

@indutny
Seems OK! Thank you!
Waiting for this patch to land upstream :)

ctizen closed this as completed on Sep 24, 2015
Fishrock123 pushed a commit that referenced this issue Mar 2, 2016
The test is still failing sometimes because when trying to establish the
second connection, the server is already closed. Bring back the code
that handled this case and was removed in the last refactoring of the
test. Also ignore the errors that might happen when sending the second
handle to the worker because it may already have exited.

PR-URL: #5422
Reviewed-By: Rich Trott <[email protected]>
MylesBorins pushed a commit that referenced this issue Mar 2, 2016
The test would sometimes time out on some platforms. Take the initial
version of the test and instead of sending 100 handles, just send 2.
It still fails when run with the code before the change was introduced
and passes afterwards.

Remove test from parallel.status.

PR-URL: #5121
Reviewed-By: Brian White <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>
Fixes: #3635
MylesBorins pushed a commit that referenced this issue Mar 2, 2016
Windows is still sometimes failing with ECONNRESET. Bring back the
handling of this error as was initially introduced in PR #4442.

PR-URL: #5179
Reviewed-By: Rich Trott <[email protected]>
Fixes: #3635
MylesBorins pushed a commit that referenced this issue Mar 17, 2016
The test is still failing sometimes because when trying to establish the
second connection, the server is already closed. Bring back the code
that handled this case and was removed in the last refactoring of the
test. Also ignore the errors that might happen when sending the second
handle to the worker because it may already have exited.

PR-URL: #5422
Reviewed-By: Rich Trott <[email protected]>
MylesBorins pushed a commit that referenced this issue Mar 21, 2016
The test is still failing sometimes because when trying to establish the
second connection, the server is already closed. Bring back the code
that handled this case and was removed in the last refactoring of the
test. Also ignore the errors that might happen when sending the second
handle to the worker because it may already have exited.

PR-URL: #5422
Reviewed-By: Rich Trott <[email protected]>
scovetta pushed a commit to scovetta/node that referenced this issue Apr 2, 2016
Windows would die with ECONNRESET most times when running
this particular test. This commit makes handling these errors
more tolerable.

PR-URL: nodejs#4442
Reviewed-By: Fedor Indutny <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
santigimeno added a commit to santigimeno/node that referenced this issue Oct 8, 2016
It's not guaranteed that the first socket that tries to connect is the
first that succeeds so the rest of assumptions made in the test are not
correct.
Fix it by making sure the second socket does not try to connect until
the first has succeeded.
The IPC channel can already be closed when sending the second socket. It
should be allowed.
Also, don't start sending messages until the worker is online.

Fixes: nodejs#8950
PR-URL: nodejs#8954
Reviewed-By: Michael Dawson <[email protected]>
Reviewed-By: Brian White <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
jasnell pushed a commit that referenced this issue Oct 10, 2016
It's not guaranteed that the first socket that tries to connect is the
first that succeeds so the rest of assumptions made in the test are not
correct.
Fix it by making sure the second socket does not try to connect until
the first has succeeded.
The IPC channel can already be closed when sending the second socket. It
should be allowed.
Also, don't start sending messages until the worker is online.

Fixes: #8950
PR-URL: #8954
Reviewed-By: Michael Dawson <[email protected]>
Reviewed-By: Brian White <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Fishrock123 pushed a commit that referenced this issue Oct 11, 2016
It's not guaranteed that the first socket that tries to connect is the
first that succeeds so the rest of assumptions made in the test are not
correct.
Fix it by making sure the second socket does not try to connect until
the first has succeeded.
The IPC channel can already be closed when sending the second socket. It
should be allowed.
Also, don't start sending messages until the worker is online.

Fixes: #8950
PR-URL: #8954
Reviewed-By: Michael Dawson <[email protected]>
Reviewed-By: Brian White <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Trott added a commit to Trott/io.js that referenced this issue Nov 18, 2016
Throw an error if either `err.message` or `err.code` is wrong.
Previously, it was only throwing an error if one or the other was
happening.
MylesBorins pushed a commit that referenced this issue Nov 22, 2016
It's not guaranteed that the first socket that tries to connect is the
first that succeeds so the rest of assumptions made in the test are not
correct.
Fix it by making sure the second socket does not try to connect until
the first has succeeded.
The IPC channel can already be closed when sending the second socket. It
should be allowed.
Also, don't start sending messages until the worker is online.

Fixes: #8950
PR-URL: #8954
Reviewed-By: Michael Dawson <[email protected]>
Reviewed-By: Brian White <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
ryzokuken added a commit to ryzokuken/node that referenced this issue Mar 28, 2018
Rename test-dgram-regress-4496 to test-dgram-typeerror-buffer to test-dgram-send-invalid-msg-type
Rename test-http-regr-nodejsgh-2821 to test-http-request-large-payload
Rename test-child-process-fork-regr-nodejsgh-2847 to test-child-process-fork-closed-channel-segfault
Rename test-http-pipeline-regr-2639 to test-http-pipeline-serverresponse-assertionerror
Rename test-http-pipeline-regr-3332 to test-http-pipeline-requests-connection-leak
Rename test-http-pipeline-regr-3508 to test-http-pipeline-socket-parser-typeerror

Refs: nodejs#19105
Refs: https://github.com/nodejs/node/blob/master/doc/guides/writing-tests.md#test-structure