Description
- Version: v10.6.0
- Platform: n/a
- Subsystem: stream, documentation
Node.js streams appear to be inconsistent about how the end of a stream is signalled, which makes it hard to know when to hand execution on to the next step, especially in the presence of errors. Consider the following harness:
const Transform = require('stream').Transform;
const fs = require('fs');

function testPipe(T) {
  const s = fs.createReadStream('index.js');
  s.pipe(T);
  s.on('error', function() { console.log('source.error'); });
  s.on('finish', function() { console.log('source.finish'); });
  s.on('end', function() { console.log('source.end'); });
  s.on('close', function() { console.log('source.close'); });
  T.on('error', function() { console.log('destination.error'); });
  T.on('finish', function() { console.log('destination.finish'); });
  T.on('end', function() { console.log('destination.end'); });
  T.on('close', function() { console.log('destination.close'); });
}

testPipe(new Transform({
  transform: function(chunk, encoding, callback) {
    console.log('_transform');
    return void callback();
  },
  flush: function(callback) {
    console.log('_flush');
    return void callback();
  }
}));

The documentation says:
The 'end' event is emitted when there is no more data to be consumed from the stream.
The 'end' event will not be emitted unless the data is completely consumed. This can be accomplished by switching the stream into flowing mode, or by calling stream.read() repeatedly until all data has been consumed.
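The first trace below comes from piping into a plain Writable destination rather than the Transform above; a minimal sketch of such a destination, assuming write/final handlers that just log and call back (this is a reconstruction, not a verbatim copy of the run):

const Writable = require('stream').Writable;

testPipe(new Writable({
  write: function(chunk, encoding, callback) {
    console.log('_write');
    return void callback();
  },
  final: function(callback) {
    console.log('_final');
    return void callback();
  }
}));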
However, no "end" event is produced from the destination stream:
_write
source.end
_final
destination.finish
source.close
A similar result is seen for a transform stream:
_transform
_flush
destination.finish
source.end
source.close
The data has clearly been "completely consumed", as indicated by the fact that _flush has been called and returned.
The stream does emit 'finish', which suggests that is the event to use instead. However, 'finish' only indicates that the data has been read, not that it has been processed. Furthermore, its behavior is inconsistent:
const T = new Transform({
  transform: function(chunk, encoding, callback) {
    console.log('_transform');
    return void callback(new Error('Syntax error'));
  },
  flush: function(callback) {
    console.log('_flush');
    return void callback();
  }
});
testPipe(T); // run through the same harness as above

We get:
_transform
destination.error
But if there's a problem while processing the end of the document (say, more data was expected), a different set of events is raised:

const T = new Transform({
  transform: function(chunk, encoding, callback) {
    console.log('_transform');
    return void callback();
  },
  flush: function(callback) {
    console.log('_flush');
    return void callback(new Error('Unexpected end-of-file'));
  }
});
testPipe(T); // run through the same harness as above

This produces:
_transform
_flush
destination.error
destination.finish
source.end
source.close
Note how destination.finish is emitted after destination.error.
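This ordering matters in practice: anyone treating 'finish' as "processing completed" will have their handler run even after a fatal error. A minimal sketch of the guard this forces callers to write by hand (my illustration, not something the documentation prescribes):

let failed = false;
T.on('error', function(err) {
  failed = true;
  console.log('fatal:', err.message);
});
T.on('finish', function() {
  if (failed) return; // 'finish' still fires after 'error', so guard against it
  // only now is it (presumably) safe to continue to the next step
});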
StringWriter example
The Node.js documentation provides an example of a Writable stream that consumes data. However, it doesn't specify how to actually use the example in a context with events and actual streams.
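For reference, here is roughly how I exercised it: the StringWriter class is adapted from the stream documentation's "decoding buffers in a writable stream" example and run through the same testPipe harness as above (treat the exact code as a sketch rather than a verbatim copy of the docs):

const Writable = require('stream').Writable;
const StringDecoder = require('string_decoder').StringDecoder;

class StringWriter extends Writable {
  constructor(options) {
    super(options);
    this._decoder = new StringDecoder(options && options.defaultEncoding);
    this.data = '';
  }
  _write(chunk, encoding, callback) {
    if (encoding === 'buffer') {
      chunk = this._decoder.write(chunk);
    }
    this.data += chunk;
    callback();
  }
  _final(callback) {
    this.data += this._decoder.end();
    callback();
  }
}

testPipe(new StringWriter());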
When piping a file (using fs.createReadStream) into the example, again no "end" event is emitted:
source.end
destination.finish
source.close
The documentation omits any description of how to determine when all processing is complete (either when the stream is consumed and fully processed, or once there is a fatal error indicating no more processing will occur).
I've asked several developers to describe how to consume a stream and then call a function once it's done, and nobody has been able to. This suggests to me that the behavior of streams is too complex, and/or under-documented.
I would like to suggest a specific course of action, especially specific edits to the documentation; however, it's not clear to me what the intended behavior of streams is even supposed to be.
Summary
I've documented a case where Node.js fails to guarantee that a function can be called exactly once when a stream is done, or at least makes it difficult to write applications that do this without significant risk of introducing bugs.
There are several possible courses of action:
- Describe in documentation how to determine when streams such as StringWriter have completed processing a readable stream
- Transform streams should emit 'end' to indicate processing is complete
- Always emit 'end' after an error (if 'finish' is getting called after 'error', then surely it should be safe to also call 'end', right?)
- Make streams "then"able, which implies specific guarantees about continuing execution (see the sketch below)
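To make the last suggestion concrete, here is a minimal sketch of the kind of promise-based completion I have in mind, written as a hand-rolled wrapper rather than an existing stream API (the helper name and the choice of events are my own):

const fs = require('fs');
const Transform = require('stream').Transform;

// Resolve on 'finish', reject on 'error'. A promise settles only once, so the
// 'finish' that can follow an 'error' (as shown above) is simply ignored.
function streamDone(stream) {
  return new Promise(function(resolve, reject) {
    stream.on('error', reject);
    stream.on('finish', resolve);
  });
}

const dest = fs.createReadStream('index.js').pipe(new Transform({
  transform: function(chunk, encoding, callback) { return void callback(null, chunk); }
}));
dest.resume(); // drain the readable side so the writable side can finish

streamDone(dest).then(
  function() { console.log('processing complete'); },
  function(err) { console.log('processing failed:', err.message); }
);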