Crash deserializing IPC message using advanced serialization #34797
Workaround for nodejs/node#34797
@nodejs/workers
* Use advanced serialization (when available) for test worker communication. Use advanced IPC on 12.17.0; this is currently our future minimal Node.js 12 version.
* Buffer IPC sends to avoid crashes. Workaround for nodejs/node#34797
* Tweak reporter integration tests. Maybe it's the IPC changes, but they started to fail on Windows.
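For illustration, a workaround along those lines can be sketched as below. This is a hypothetical queue, not the project's actual code; `bufferedSend` and `flush` are made-up names.

```js
'use strict';
// Sketch of "buffer IPC sends": queue outgoing messages and keep at most one
// write in flight, handing the next message to process.send() only after the
// previous send's callback has fired. (Hypothetical helper, runs in a forked
// child process with an IPC channel.)
const queue = [];
let sending = false;

function bufferedSend(message) {
  queue.push(message);
  if (!sending) flush();
}

function flush() {
  if (queue.length === 0) {
    sending = false;
    return;
  }
  sending = true;
  process.send(queue.shift(), () => flush());
}
```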
Fwiw, I can't seem to reproduce this locally. (And it's more of a case for @nodejs/child_process, if we go by teams.)
Yeah, sorry. I initially thought it was maybe a general issue with our implementation of the serialization API, and AFAIK there's no team for that. Now that I'm looking again at the issue, I wonder if it may be related to the use of different, incompatible Node.js/V8 versions between parent and child processes.
I think that would lead to more consistent failures? My best guess would be that we somehow mess up the message boundaries and start parsing in the middle of another message, but I'm not really sure how that would happen…
I'm using
When I was digging into the JS code that drives this, there were a bunch of array buffers, so maybe the copying of data off the channel is corrupting a shared buffer? (I'm just thinking out loud; not really sure what the code is doing.)
@novemberborn There aren't any SharedArrayBuffer instances involved, if that's what you're referring to. The format for messages is relatively simple: it's 4 bytes that contain the length of the message, plus the rest of the message. We use that first field to determine the message boundaries, so if that calculation goes wrong at some point, that might be the cause of a crash like this (the code does look okay to me, though). Another possibility would be that data is indeed corrupted during transfer, but I'm not sure how that would happen either. If you're up for building Node.js to debug this, you could probably print the contents of the
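To make that framing concrete, here is a simplified sketch of the parsing logic described above. It is not Node's actual implementation; the 32-bit big-endian header is an assumption.

```js
'use strict';
// Simplified sketch of length-prefixed framing: each message is a 4-byte
// size header followed by the payload. A single wrong size field
// desynchronizes every message that follows, which matches the suspected
// failure mode in this thread.
function* parseMessages(buffered) {
  let offset = 0;
  while (buffered.length - offset >= 4) {
    const size = buffered.readUInt32BE(offset);
    if (buffered.length - offset - 4 < size) break; // wait for more data
    yield buffered.subarray(offset + 4, offset + 4 + size);
    offset += 4 + size;
  }
}

// Frame a single 'hello' message and parse it back.
const header = Buffer.alloc(4);
header.writeUInt32BE(5);
for (const msg of parseMessages(Buffer.concat([header, Buffer.from('hello')])))
  console.log(msg.toString()); // 'hello'
```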
I'm on x64 Linux, so it's definitely possible that this is platform-specific, yes.
I can reproduce the described issue on macOS 10.15.6.
FWIW, it seems to be fixed on master.
@addaleax @lpinca I also thought it was fixed on master, but then it randomly crashed. Edit: got it to crash on current master after like 3000 runs =)
Hmm, no crash on master with 10000 runs. I guess this will be hard to debug.
Increasing the
This makes v16.1.0 reliably crash on my Linux machines (x64 and rpi/armv7). It works on Windows, though.

```js
const assert = require('assert'),
  cluster = require('cluster');

const SIZE = 7 * 1024 * 1024;

if (cluster.isPrimary) {
  cluster.setupPrimary({ serialization: 'advanced' });
  const worker = cluster.fork();
  worker.send(new Uint8Array(SIZE).fill(0x00));
  worker.send(new Uint8Array(SIZE).fill(0x01));
} else {
  process.on('message', data => {
    console.log(Buffer.from(data));
    assert(data.length === SIZE, 'invalid size');
    for (let i = 1; i < SIZE; i++)
      assert(data[0] === data[i], 'invalid byte at offset ' + i);
  });
}
```

Either the worker receives the first message, but it's corrupt: the beginning and the size are correct, but after a couple of bytes the second message starts (including the size header and the serialization header).

```
AssertionError [ERR_ASSERTION]: invalid byte at offset 219252
    at process.<anonymous> (/home/dobe/Work/nodejs-adv-ser-fix/easy-crash.js:21:36)
    at process.emit (node:events:377:35)
    at emit (node:internal/child_process:928:12)
    at processTicksAndRejections (node:internal/process/task_queues:84:21)
```

The invalid byte offset on the raspberry pi is different. Or no message gets received at all (but no infinite loop).

I tried digging into it, and when I check the arguments for `req.oncomplete` (node/lib/internal/child_process.js, lines 834 to 838 in 4af15df):

```js
req.oncomplete = (...args) => {
  console.trace('req.oncomplete', args);
  control.unrefCounted();
  if (typeof callback === 'function')
    callback(null);
};
```

node delivers an error code:

```
Trace: req.oncomplete [
  -14,
  Pipe {
    buffering: false,
    pendingHandle: null,
    sockets: { got: {}, send: {} },
    [Symbol(kMessageBuffer)]: <Buffer >
  },
  undefined
]
    at WriteWrap.req.oncomplete (node:internal/child_process:842:19)
```

Which I guess means `EFAULT`.

Some observations I made:

I'm sort of lost with the system call stuff and node, but I think once the error occurs the channel is unusable, so handling it in
Hello! We ran into the same issue, and I played around a bit with this. To me it seems to be a buffer overflow / flush timing issue. If you change the size of the messages to be smaller, you will run into the same problem if you submit multiple messages in a short time period. For example, running the following script on my machine generates the following output (the offset where the error happens changes by a couple of thousand each run). If I change the sleep timer to a different value, the offset again changes by a couple of thousand, so essentially it feels like it depends on how fast your machine is at actually transferring the message and the size of the message. If you transfer more data than can be transferred to the child process, then new messages start overwriting old ones.
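The script itself is not reproduced above; for illustration, a minimal sketch in the same spirit could look like the following. It is not the original script, and the message size, count, and sleep interval are all made-up values.

```js
'use strict';
// Hypothetical repro sketch: many smaller messages sent in quick succession
// over an 'advanced'-serialization IPC channel.
const cluster = require('cluster');

const SIZE = 256 * 1024;
const COUNT = 100;
const SLEEP_MS = 0; // raising this changes where (or whether) corruption appears

if (cluster.isPrimary) {
  cluster.setupPrimary({ serialization: 'advanced' });
  const worker = cluster.fork();
  worker.on('message', data => {
    // Every byte of a message should be identical; a mismatch means the
    // stream desynchronized and we are reading into the next message.
    for (let i = 1; i < data.length; i++) {
      if (data[i] !== data[0])
        throw new Error('invalid byte at offset ' + i);
    }
  });
} else {
  let sent = 0;
  const timer = setInterval(() => {
    process.send(new Uint8Array(SIZE).fill(sent & 0xff));
    if (++sent === COUNT) {
      clearInterval(timer);
      process.disconnect();
    }
  }, SLEEP_MS);
}
```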
Do the same thing we do for other streams, and retain a reference to the Buffer that was sent over IPC while the write request is active, so that it doesn't get garbage collected while the data is still in flight.

(This is a good example of why it's a bad idea that we're not re-using the general streams implementation for IPC and instead maintain separate usage of the low-level I/O primitives.)

Fixes: nodejs#34797
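In user-land terms, the pattern this fix applies looks roughly like the sketch below. The names and the fake channel are hypothetical; the real change lives in Node's internals.

```js
'use strict';
// Illustration of the fix's idea, not the actual Node.js diff: an async
// write must keep a JS-visible reference to its buffer until completion,
// because a raw pointer held only on the native side is invisible to the
// garbage collector.
const fakeChannel = {
  write(req, buffer) {
    // Pretend the OS consumes `buffer` asynchronously.
    setImmediate(() => req.oncomplete(0));
  },
};

function startWrite(channel, buffer, callback) {
  const req = {
    buffer, // retain the data while the write request is active
    oncomplete(status) {
      this.buffer = null; // release once the data is no longer in flight
      callback(status);
    },
  };
  channel.write(req, buffer);
  return req;
}

startWrite(fakeChannel, Buffer.alloc(1024), status => {
  console.log('write finished with status', status);
});
```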
@peeter-tomberg Thanks, the reproduction here was really valuable. I've opened #38728 which should address this.
Awesome! Will this be applied to older versions of Node as well? We're experiencing this on Node 12 LTS.
+1, we're using Electron 11 / Node 12; it would be great to have this backported!
@peeter-tomberg @KishanBagaria It should follow the regular release schedule, so, yes, that should not be an issue.
Do the same thing we do for other streams, and retain a reference to the Buffer that was sent over IPC while the write request is active, so that it doesn't get garbage collected while the data is still in flight. (This is a good example of why it's a bad idea that we're not re-using the general streams implementation for IPC and instead maintain separate usage of the low-level I/O primitives.)

Fixes: #34797
PR-URL: #38728
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Minwoo Jung <[email protected]>
What steps will reproduce the bug?

I start a child process using `fork()` and `{ serialization: 'advanced' }`. In the child process I synchronously call `process.send()` until it returns `false`. I keep `process.channel` referenced until all my `send()` callbacks have been called. Every so often, the main process crashes. Other times it exits gracefully, though it does not log all received messages, presumably because the child process exits without flushing its IPC buffer. That's manageable and probably not a bug.

`main.js`:

`child.js`:

And then run `node main.js`.
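The two scripts are not reproduced above; going by the description, a sketch of what they plausibly looked like follows. These are hypothetical reconstructions, and the payload size and shape are assumptions.

```js
// main.js (sketch): fork the child with advanced serialization and count
// the messages that arrive.
'use strict';
const { fork } = require('child_process');

const child = fork('./child.js', { serialization: 'advanced' });
let received = 0;
child.on('message', () => received++);
child.on('exit', () => console.log('received', received, 'messages'));
```

```js
// child.js (sketch): synchronously call process.send() until it returns
// false, keeping process.channel referenced until every callback has fired.
'use strict';
process.channel.ref();

let pending = 0;
function done() {
  if (--pending === 0) process.channel.unref(); // let the child exit
}

let keepGoing = true;
while (keepGoing) {
  pending++;
  keepGoing = process.send({ payload: 'x'.repeat(1024) }, done);
}
```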
.How often does it reproduce? Is there a required condition?
It's intermittent.
What is the expected behavior?
The main process does not crash.
What do you see instead?
The main process crashes with the following error:
Additional information
If I modify `child.js` to schedule sends using `setImmediate()`, there is no crash, and the main process receives all messages:
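A sketch of that change, again as a hypothetical reconstruction (the message count and payload are assumptions):

```js
// child.js variant (sketch): schedule each send on its own setImmediate tick
// instead of a tight synchronous loop, giving the event loop a chance to
// flush the IPC channel between sends.
'use strict';
const TOTAL = 10000; // made-up message count

let sent = 0;
(function sendNext() {
  if (sent++ === TOTAL) return;
  process.send({ payload: 'x'.repeat(1024) });
  setImmediate(sendNext);
})();
```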