Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

finished(stream, cb) invokes callback too early #32032

Closed
ronag opened this issue Mar 1, 2020 · 19 comments
Closed

finished(stream, cb) invokes callback too early #32032

ronag opened this issue Mar 1, 2020 · 19 comments
Labels
stream Issues and PRs related to the stream subsystem.

Comments

@ronag
Copy link
Member

ronag commented Mar 1, 2020

The semantics of finished is to invoke the callback on 'end' and 'finish'. The problem with this is that this does not mean that the stream is actually "finished", i.e. it might still emit 'error' and has probably not yet freed all resources since 'close' is not yet emitted.

The reason we invoke it early in this way is because not all stream actually emit 'close'. I'm a little unsure how to resolve this and whether this is worth resolving.

One way could be to use setImmediate or setTimeout before invoking the callback in 'end'/'finish' in order to give a chance for 'error' or 'close' to be emitted. However, that is very much imperfect.

We might want to at least clarify this in the docs.

Thoughts? @nodejs/streams

@ronag ronag added the stream Issues and PRs related to the stream subsystem. label Mar 1, 2020
@ronag ronag changed the title finished invokes callback too early finished(stream, cb) invokes callback too early Mar 1, 2020
@mcollina
Copy link
Member

mcollina commented Mar 1, 2020

I think the current behavior is fine.

What would be the actual difference in waiting for close instead? (minus backward compatibility, which would postpone any changes for a long time :/).

@ronag
Copy link
Member Author

ronag commented Mar 1, 2020

What would be the actual difference in waiting for close instead?

I guess 2 things, an error from resource cleanup might be swallowed also making assumption in regards to whether _destroy has actually finished might fail:

consider:

// Creates internally a file called "foo.tmp"
const s = new MyStream();
s._destroy((err ,cb) => {
  fs.unlink("foo.tmp", cb);
});
finished(s, (err) => {
  // Code that assumes that _destroy has completed, e.g. "foo.tmp" has been removed.
});

I'll give that I'm grabbing a bit at straws but I still think it's worth at least explicitly mentioning in the docs.

@vweevers
Copy link
Contributor

vweevers commented Mar 1, 2020

I've been bitten by this too, but as y'all mentioned, there's only one proper way to solve it, namely having all streams emit close.

+1 for documenting it until then.

@ronag
Copy link
Member Author

ronag commented Mar 1, 2020

There is also the option to assume 'close' will be emitted for "modern" streams and only use this behavior for "legacy" streams.

e.g.

const s = s._writableState || s._readableState;
const isLegacy = !s || s.closed === undefined;

However, that would assume that "modern" streams are properly implemented in terms of autoDestroy and emitClose which I've noticed is not always the case.

@vweevers
Copy link
Contributor

vweevers commented Mar 1, 2020

I think ultimately, autoDestroy should be the default behavior, without being able to opt-out.

@ronag
Copy link
Member Author

ronag commented Mar 1, 2020

I think ultimately, autoDestroy should be the default behavior, without being able to opt-out.

It is default in master. Not being able to opt-out is going to be difficult. We have some code in core that depends on it which I find unlikely to be refactored to such a degree (though other parts are work in progress).

@ronag
Copy link
Member Author

ronag commented Mar 1, 2020

We could also go even more conservative:

const s = s._writableState || s._readableState;
const isLegacy = !s || s.closed === undefined || !s.autoDestroy || !s.emitClose;

@vweevers
Copy link
Contributor

vweevers commented Mar 1, 2020

If it (finishing on close) only works for some streams, it might not be worth it, because you can't rely on it without checking the implementation details of every stream.

@vweevers
Copy link
Contributor

vweevers commented Mar 1, 2020

What if streams had a way to say "yes, I always emit close"? Like s.willEmitClose (bikeshed name).

@ronag
Copy link
Member Author

ronag commented Mar 1, 2020

What if streams had a way to say "yes, I always emit close"? Like s.willEmitClose (bikeshed name).

streams should always emit 'close'. Such a property would only say "yes, I'm a broken stream".

@vweevers
Copy link
Contributor

vweevers commented Mar 1, 2020

Exactly, haha. There are too many streams out there that by today's standards are broken. The only sensible value for s.willEmitClose is true, but it'd be a new property that defaults to false. Module authors must explicitly set it to true. The point is to differentiate a stream from old streams, as well as userland streams that do depend on latest readable-stream but still do things the old way.

(I would strongly prefer a solution that's baked into streams and doesn't require action from module authors, but it seems that's not an option due to aforementioned node internals, so here we are).

@ronag
Copy link
Member Author

ronag commented Mar 1, 2020

I've also noticed that https://github.com/nodejs/node/blob/master/lib/internal/streams/pipeline.js#L44 depends on this behavior where finished/eos is invoked as early as possible.

@vweevers
Copy link
Contributor

vweevers commented Mar 1, 2020

Yeah, stream.pipeline() is where I was bitten. I had a stream that in essence owned a lock and had to be unlocked in its _destroy() method. On top of that, when the pipeline finished, a new one started that sometimes needed the same lock, along the lines of:

function loop() {
  stream.pipeline(getStreamWithLock(), ..., function () {
     loop()
  })
}

@mcollina
Copy link
Member

mcollina commented Mar 1, 2020

I'm definitely +1 if we can soft-detect this on a stream-by-stream basis

@ronag
Copy link
Member Author

ronag commented Mar 2, 2020

What are the preferred semantics here? I think pipeline should preferrably only call the callback once every stream has emitted 'close' with fallback behavior for legacy streams?

@mcollina: Even if we exclude "legacy" streams (we can detect pre v14), we would potentially break "modern" streams that explicitly set autoDestroy: false, emitClose: false or monkey patch destroy in a way that causes 'close' never to be emitted. They are essentially "broken", but this kind of change could make them break harder (which might or might not be a good thing depending on perspective). Do we think that is feasible? I wouldn't mind creating a PR and trying CITGM if we think this would be acceptable if CITGM passes.

@vweevers
Copy link
Contributor

vweevers commented Mar 2, 2020

They are essentially broken, but this kind of change could make them break harder

Broken according to modern stream semantics, but still usable. It would have been different if pipeline() wasn't already in the wild. Folks are using it with every kind of stream. Including streams that have their own destroy, back from when node didn't have that yet, so it might not be fair to classify that as monkey-patching. But those streams are probably compatible, seeing as they inspired the modern destroy.

If we favor backwards compatibility and can't reliably detect whether a particular stream will emit close, or invent a way for the stream to signal that it will emit close, we must assume it doesn't. On the other hand, I want streams to move forward (until we can say "it just works™") so I'm torn.

@vweevers
Copy link
Contributor

vweevers commented Mar 2, 2020

I don't think CITGM would give us enough coverage, because every module must have its own tests for the close event.

@vweevers
Copy link
Contributor

vweevers commented Mar 2, 2020

This reminds me of stream-test (see vweevers/stream-test#2) which never got anywhere because I didn't have the time.

@mcollina
Copy link
Member

mcollina commented Mar 2, 2020

From my point of view, I think we should check if autoDestroy and emitClose options are set, and in that case wait for 'close'. Otherwise keep the current backward compatible behavior.

ronag added a commit to nxtedition/node that referenced this issue Mar 25, 2020
Pipeline uses eos which will invoke the callback
on 'finish' and 'end' before all streams have been
fully destroyed.

Fixes: nodejs#32032
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stream Issues and PRs related to the stream subsystem.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants