
stdout/stderr buffering considerations #6379

Closed
ChALkeR opened this issue Apr 25, 2016 · 62 comments
Labels
libuv Issues and PRs related to the libuv dependency or the uv binding. net Issues and PRs related to the net subsystem. process Issues and PRs related to the process subsystem. stdio Issues and PRs related to stdio.

Comments

@ChALkeR
Member

ChALkeR commented Apr 25, 2016

I tried to discuss this some time ago on IRC, but postponed it for quite a long time. I also started discussing this in #1741, but I would like to extract the more specific discussion into a separate issue.

I may be missing some details, but I will try to give a quick overview here.

Several issues here:

  1. Many calls to console.log (e.g. calling it in a loop) could chew up all the memory and die — Why does node/io.js hang when printing to the console inside a loop? #1741, Silly program or memory leak? #2970, Strange memory leaks #3171, memory leaks of console.log #18013.
  2. console.log has different behavior while printing to a terminal and being redirected to a file. — Why does node/io.js hang when printing to the console inside a loop? #1741 (comment).
  3. Output is sometimes truncated — A libuv upgrade in Node v6 breaks the popular david package #6297, there were other ones as far as I remember.
  4. The behaviour seems to differ across platforms.

As I understand it, the output has an implicit write buffer of unlimited size (because writes are non-blocking).

One approach to fixing this would be to:

  1. Introduce an explicit cyclic write buffer.
  2. Make writes to that cyclic buffer blocking.
  3. Make writes from the buffer to the actual output non blocking.
  4. When the cyclic buffer reaches its maximum size (e.g. 10 MiB), block further writes to the buffer until a corresponding part of it is freed.
  5. On (normal) exit, make sure the buffer is flushed.

For almost all cases, except for the ones that are currently broken, this would behave as a non-blocking buffer (because writes to the buffer are considerably faster than writes from the buffer to file/terminal).

For cases when the data is being piped to the output too quickly and when the output file/terminal does not manage to output it at the same rate — the write would turn into a blocking operation. It would also be blocking at the exit until all the data is written.

Another approach would be to monitor (and limit) the size of data that is contained in the implicit buffer coming from the async queue, and make the operations block when that limit is reached.

@ChALkeR ChALkeR added the discuss Issues opened for discussions and feedbacks. label Apr 25, 2016
@ChALkeR
Member Author

ChALkeR commented Apr 25, 2016

process.stdout._handle.setBlocking(true); as proposed in #1741 (comment) and #6297 (comment) would considerably hurt the performance. Exposing that isn't a good solution to any of this.

@ChALkeR ChALkeR changed the title stdio buffering considerations stdout/stderr buffering considerations Apr 25, 2016
@Fishrock123 Fishrock123 added net Issues and PRs related to the net subsystem. process Issues and PRs related to the process subsystem. libuv Issues and PRs related to the libuv dependency or the uv binding. labels Apr 25, 2016
@kzc

kzc commented Apr 25, 2016

process.stdout._handle.setBlocking(true);
would considerably hurt the performance

Developers are a smart bunch. With the proposed setBlocking API they can decide for themselves if they want stdout/stderr to block on writes. The default could still remain non-blocking. There's no harm in giving people the option. Having CLI applications block on writes for stdout/stderr is quite desirable. This is not unlike all the Sync calls in the fs module. In a high-performance server scenario you don't have to use them, but they are there if you need them.

@ChALkeR
Member Author

ChALkeR commented Apr 26, 2016

There's no harm in giving people the option.

There is: every option comes at a cost on both sides. That option would need to be supported, and it would merely toggle the behaviour between two states that are far from ideal in many cases.

As for the option here, the said cyclic buffer's maximum size could perhaps be configured using a command-line argument, with a buffer size of 0 turning it into a fully blocking API.

@kzc

kzc commented Apr 26, 2016

toggle the behaviour between two states that are far from being ideal in many cases

The proposed API process.stdout._handle.setBlocking(Boolean) could be documented to only be guaranteed to work before any output is sent to the stream.

On UNIX it is a trivial system call on a fd. It's pretty much the same thing on Windows as far as I recall.

Edit: See #6297 (comment)

@mscdex
Contributor

mscdex commented Apr 26, 2016

The proposed API process.stdout._handle.setBlocking(Boolean) could be documented to only be guaranteed to work before any output is sent to the stream.

We should not be documenting underscore-prefixed properties. This would need to be a separate, true public API.

@sindresorhus

sindresorhus commented Apr 27, 2016

In CLI tools, I don't want stdout/stderr to be blocking; rather, I want to be able to explicitly exit the process and have stdout/stderr flushed before exiting. Has there been any discussion about introducing a way to explicitly exit the process gracefully, meaning flushing stdout/stderr and then exiting? Maybe something like a process.gracefulExit() method? In ES2015 modules, we can no longer return in the top scope, so short-circuiting will then be effectively impossible without a nesting mess.
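For illustration, such a graceful exit can be approximated today with write callbacks; `flushAndExit` is a hypothetical helper name, not an existing or proposed core API:

```javascript
// Hypothetical helper (name and shape are illustrative): wait until
// everything already queued on the given streams has been written,
// then exit with the given code.
function flushAndExit(code = 0, streams = [process.stdout, process.stderr]) {
  let pending = streams.length;
  for (const stream of streams) {
    // The callback of an empty write runs only after all chunks queued
    // before it on this stream have been flushed.
    stream.write('', () => {
      if (--pending === 0) process.exit(code);
    });
  }
}
```

This only helps when the process actually reaches the call; it does not cover a bare process.exit() elsewhere or abnormal termination.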

@kzc

kzc commented Apr 27, 2016

The exit module is a drop-in replacement for process.exit() that effectively waits for stdout/stderr to be drained. Other streams can optionally be drained as well. Something like it really ought to be rolled into node core under a different name.

Edit: upon further analysis, the exit module is no longer recommended.

@Qix-

Qix- commented Apr 27, 2016

I think there's a slight discrepancy in interpretation here.

Are we talking about console blocking, or console buffering? Although similar, these are two different considerations.

Further, I'm not entirely sure what the problem is you're trying to fix, @ChALkeR. Could you elaborate on a problem that is currently being faced? I'm not groking what it is that's broken here.


One thing to note is that on most systems (usually UNIX systems) stdout is buffered to begin with, whereas stderr is usually not (or sometimes just line buffered).

However, an explicit fflush(stdout) or fflush(stderr) is not necessary when a process closes, because these buffers are handled by the C runtime and are flushed when main() returns. If a buffer grows too large during execution, the runtime automatically flushes it.

This is why the following program still outputs while the loop is running, and on most terminal emulators the same reason why it looks like it's 'jittery'.

#include <stdio.h>

int main(void) {
    for (int i = 10000000; i > 0; i--) {
        printf("This is a relatively not-short line of text. :) %d\n", i);
    }
    return 0;
}

Blocking, on the other hand, doesn't make much sense. What do you want to block? Are you suggesting we fflush(stdout) every time we console.log()? Because if so, that is a terrible idea.

Developers are a smart bunch. With the proposed setBlocking API they can decide for themselves if they want stdout/stderr to block on writes.

I respectfully disagree. There is absolutely no point in blocking on stdout/stderr. It's very rarely done in the native world and when it is it's usually for concurrency problems (two FILE* handles being written to by two threads that cannot otherwise be locked; i.e. TTY descriptors). Let the system handle it.

The above program without fflush(stdout) on every iteration produced the following average:

real 6.32s
sys  0.54s
405 involuntary context switches

and with fflush(stdout) on each iteration:

real 7.48s
sys  0.56s
1607  involuntary context switches

That's an 18.3% time increase on this microbenchmark alone. I'm in the 'time is relative and isn't a great indicator of actual performance' camp, so I included the involuntary context switch count (voluntary context switches would be the same between the two runs, obviously).

The results show that fflush(stdout) upon every message encouraged the system to switch out of the process 1202 more times - that's 296% higher than ye olde system buffering.

Further, the feature requested by @sindresorhus is part of how executables already behave: at least on UNIX systems, all buffered file descriptors are flushed prior to/during close()-ing or fclose()-ing, so flushing before a process exits is a needless step, and exposing a new function for it would be completely unnecessary.


I'm still lost as to what is trying to be fixed here... I wouldn't mind seeing a process.stdout.flush() method, but it'd be abused in upstream dependencies and I'd find myself writing some pretty annoyed issues in that case. I've never run into a problem with Node's use of standard I/O...

Also, just to reiterate - please don't even start with a .setBlocking() method. That's asking for trouble. The second a dependency makes standard I/O blocking then everything else is affected and then I have to answer to my technical lead about why my simple little program is switching out of context a bazillion times when it didn't before.

@ChALkeR
Member Author

ChALkeR commented Apr 27, 2016

@Qix-

Further, I'm not entirely sure what the problem is you're trying to fix, @ChALkeR. Could you elaborate on a problem that is currently being faced? I'm not groking what it is that's broken here.

See the issues I linked above.

Blocking, on the other hand, doesn't make much sense. What do you want to block? Are you suggesting we fflush(stdout) every time we console.log()? Because if so, that is a terrible idea.

No, of course I don't. console.log is not the same as printf in your example.
It's an async operation. I don't propose fflush, I propose to somehow make sure that we don't fill up the async queue with console.logs.

@Qix-

Qix- commented Apr 27, 2016

@ChALkeR are we actually seeing this happen?

var str = new Array(8000).join('.');

while (true) {
    console.log(str);
}

works just fine. In what realistic scenario are we seeing the async queue fill up? The three linked issues are toy programs that are meant to break Node, and would break in most languages, in just about all situations, on all systems (resources are finite, obviously).

The way I personally see it, there are tradeoffs to having coroutine-like execution paradigms and having slightly higher memory usage is a perfectly acceptable tradeoff for the asynchronicity Node provides. It doesn't handle extreme cases because those cases are well outside any normal or even above normal scenario.

The first case, which mentions a break in execution to flush the async buffer, is perfectly reasonable and expected.


Side note: thanks for clarifying :) that makes more sense. I was hoping it wasn't seriously proposed we fflush() all the time. Just had to make sure to kill that idea before anyone ran with it.

@ChALkeR
Member Author

ChALkeR commented Apr 27, 2016

@Qix-

  1. What's your platform?
  2. Are you sure that you are not redirecting the output anywhere?

@Qix-

Qix- commented Apr 27, 2016

@ChALkeR

  1. OS X and linux (tested on both)
  2. Positive.

@ChALkeR
Member Author

ChALkeR commented Apr 27, 2016

@Qix- Are you saying that memory usage eventually stabilizes when you run this? How much memory does it consume for you?

@Qix-

Qix- commented Apr 27, 2016

@ChALkeR ah no, memory usage goes off the charts. Again, though, is this happening in practical environments outside of stress tests?

@ChALkeR
Member Author

ChALkeR commented Apr 27, 2016

@Qix- Yes, it does. Users have reported it several times already, see links above.

@Qix-

Qix- commented Apr 27, 2016

@ChALkeR The three links you mentioned (#1741, #2970, #3171) are not practical environments outside of stress tests.

Truncated output (#6297 et al) is unrelated to the ones mentioned above.

@ChALkeR
Member Author

ChALkeR commented Apr 27, 2016

@Qix- #3171, for example, doesn't look like just a stress test to me.

@Qix-

Qix- commented Apr 27, 2016

@ChALkeR

for (var i = 0; i < 1000000000; i++) {
    value = nextPrime(value);
    console.log(value);
}

This is definitely a stress test. This isn't a practical or realistic use of Node.js - an async scripting wrapper for IO-heavy applications.


For instance,

function loggit() {
    console.log('hello');
    setImmediate(loggit);
}

loggit()

tops out at 15M memory consumption. I've tweaked #3171's code to reflect the changes here, which now hovers around 45M (instead of inflating to over a gig in about a minute).

Of course a stupidly huge while/for loop with a ton of non-blocking IO (not to mention a computationally expensive nextPrime function) is going to cause problems. Don't do it. It's a simple misunderstanding of how Node works, and if it were up to me I'd brush it off as an error on the developer's part.

The second we start baby-sitting bad design like all three of the examples you mentioned and start ad-hoc adding methods like .setBlocking(true) is when we start stepping on the toes of dependents down-stream, which begins to break everything.

@kzc

kzc commented Apr 27, 2016

Are we talking about console blocking, or console buffering? Albeit similar, these are two different considerations.

Either blocking tty writes or graceful drain of stdout/stderr upon process.exit() would fix the problem at hand with truncated tty output.

I think both APIs should be part of node core. Devs should have the option to have stdout/stderr block if they should so choose, or have stdout/stderr gracefully flushed if they find that more appropriate.

Logging to the console 10000000 times in a tight loop is not real life behavior. In reality performance oriented servers rarely log to the console - and when they do it's important that the information is output. But if you did want to log to the console 1e7 times (and run out of memory due to another known node issue) then you would not enable blocking on stdout - so it's a red herring.

@Qix-

Qix- commented Apr 27, 2016

stdout/stderr should flush on program exit anyway. It's the fact node uses the exit() call that messes quite a few things up. That's actually an easy enough fix with Node specifically.

My problem with .setBlocking() is that it affects the process globally, not just specific pieces of code or files. That means a dependency could mess up how my code functions without really understanding what .setBlocking() does.

@kzc

kzc commented Apr 27, 2016

Comment just posted in other thread: #6297 (comment)

process.stdout.flush() and process.stderr.flush()

It's not like C where the data is buffered in libc and can be flushed. The unflushed data is in some Javascript data structure in V8 and the event loop has been halted.

@Qix-

Qix- commented Apr 27, 2016

@kzc yes, that is also a problem. exit() in and of itself is the cause of your specific problem, as has been mentioned quite a few times.

@eljefedelrodeodeljefe
Contributor

@ChALkeR I dragged the contents of this issue over to #6456 and will be closing this, since I think it's highly related. Please feel free to reopen if you see the need. Thx.

@kurahaupo

The idea that this is "expected" for non blocking streams is fatally flawed: nobody expects logging or stdio to be non blocking, and if you've changed the file descriptor flags, that's an implementation detail that should be hidden from the outside world.

Moreover, buffering constitutes an implicit promise that the data will eventually be written, so at the very least all buffers should be fully written out at exit.

Likewise OOM should trigger memory recovery, including writing out buffers.

Delays are less important than broken promises.

@silverwind
Contributor

Just posting a workaround for this issue for future reference:

for (const stream of [process.stdout, process.stderr]) {
  stream?._handle?.setBlocking?.(true);
}

Before:

$ node -p 'process.stdout.write("x".repeat(5e6)); process.exit()' | wc -c
65536

After:

$ node -p 'for (const stream of [process.stdout, process.stderr]) stream?._handle?.setBlocking?.(true); process.stdout.write("x".repeat(5e6)); process.exit()' | wc -c
5000000

@kurahaupo

It seems that the phrase "non blocking" has been used in this discussion to mean something different from the POSIX definition, and that may account for some of the confusion.

In POSIX terms, buffering is what happens within a process, while non blocking and asynchronous are two different ways of interacting with the kernel.

A non blocking (O_NONBLOCK or O_NDELAY) kernel call will fail rather than block; this is certainly not what is envisioned by most participants in this discussion. (Moreover, it will only do so for streams that in principle can block for an arbitrary period, such as terminals, pipes, and network sockets; it does not affect writing to most ordinary files and block devices. )

In contrast, an asynchronous kernel call will neither succeed nor fail, but rather it returns to the running process immediately and subsequently uses a signal to notify the process of the success or failure of the requested operation. That's also not what is being discussed here, but could be useful as an implementation detail for buffering in userspace.

What is wanted here is the equivalent of the C language's setvbuf function: the ability to change the buffer size from "infinite" down to something smaller.

I would strongly urge avoiding the term "blocking" when defining a new public method to control this, as it unnecessarily muddles the meaning of related terms, cf. O_NONBLOCK.

@gireeshpunathil
Member

@kurahaupo - Irrespective of whether the usage of the term blocking is consistent with the POSIX definition (I don't know whether that consistency is relevant here), its usage in this project's documentation is consistent: a call that effectively blocks the Node.js event loop's forward progress (by virtue of the current thread yielding to the kernel to actively process I/O). And I guess our users have not had much issue with that definition.

@guest271314

If I understand this issue correctly, this is the place to post my feedback. I wrote a Node.js Native Messaging host that streams output from child.stdout.on('data', (data) => {}) to the browser with process.stdout.write(data).

I observed that RSS constantly increases during usage. Perhaps someone in the Node.js realm can explain why this occurs and how to fix this memory leak https://thankful-butter-handball.glitch.me/.

@bnoordhuis
Member

This issue has been open for nearly 7 years. In the meantime, mitigating work has been done and lots of discussion has taken place in other issues. I'm going to put this one out to pasture.

@midzelis

For people looking for workarounds, many are given here: https://unix.stackexchange.com/questions/25372/turn-off-buffering-in-pipe.

The one I chose was https://unix.stackexchange.com/questions/25372/turn-off-buffering-in-pipe/25378#25378, which is to wrap the command you want to execute with the 'script' command, which has a way to unbuffer stdout.

script -q /dev/null long_running_command | print_progress      # (FreeBSD, Mac OS X)
script -q -c "long_running_command" /dev/null | print_progress # (Linux)

All the other solutions in that linked thread require installing a new program, but the script command is available on most POSIX-ish systems, albeit with different command-line argument styles. This works for me - hopefully it can work for you.
