Description
- Version: v10.6.0
- Platform: n/a
- Subsystem: stream, documentation
Node.js streams appear to be inconsistent about how the end of a stream is signalled, which makes it hard to know when to hand execution on to the next step, especially in the presence of errors. Consider the following harness:
const Transform = require('stream').Transform;
const fs = require('fs');

function testPipe(T) {
  const s = fs.createReadStream('index.js');
  s.pipe(T);
  s.on('error', function() { console.log('source.error'); });
  s.on('finish', function() { console.log('source.finish'); });
  s.on('end', function() { console.log('source.end'); });
  s.on('close', function() { console.log('source.close'); });
  T.on('error', function() { console.log('destination.error'); });
  T.on('finish', function() { console.log('destination.finish'); });
  T.on('end', function() { console.log('destination.end'); });
  T.on('close', function() { console.log('destination.close'); });
}

testPipe(new Transform({
  transform: function(chunk, encoding, callback) {
    console.log('_transform');
    return void callback();
  },
  flush: function(callback) {
    console.log('_flush');
    return void callback();
  }
}));

The documentation says:
The 'end' event is emitted when there is no more data to be consumed from the stream.
The 'end' event will not be emitted unless the data is completely consumed. This can be accomplished by switching the stream into flowing mode, or by calling stream.read() repeatedly until all data has been consumed.
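The first trace below comes from piping into a plain Writable destination rather than the Transform above; a minimal sketch of such a destination, assuming write/final handlers that just log and call back (this is a reconstruction, not a verbatim copy of the run):

const Writable = require('stream').Writable;

testPipe(new Writable({
  write: function(chunk, encoding, callback) {
    console.log('_write');
    return void callback();
  },
  final: function(callback) {
    console.log('_final');
    return void callback();
  }
}));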
However, no "end" event is produced from the destination stream:
_write
source.end
_final
destination.finish
source.close
A similar result is seen for a transform stream:
_transform
_flush
destination.finish
source.end
source.close
The data has clearly been "completely consumed", as indicated by the fact that _flush has been called and returned.
The stream does emit 'finish', which suggests that is the event to use instead. However, 'finish' only indicates that the data has been read, not that it has been processed. Furthermore, its behavior is inconsistent:
const T = new Transform({
  transform: function(chunk, encoding, callback) {
    console.log('_transform');
    return void callback(new Error('Syntax error'));
  },
  flush: function(callback) {
    console.log('_flush');
    return void callback();
  }
});
testPipe(T); // run through the same harness as above

We get:
_transform
destination.error
But if there's a problem while processing the end of the document (say, more data was expected), a different set of events is raised:

const T = new Transform({
  transform: function(chunk, encoding, callback) {
    console.log('_transform');
    return void callback();
  },
  flush: function(callback) {
    console.log('_flush');
    return void callback(new Error('Unexpected end-of-file'));
  }
});
testPipe(T); // run through the same harness as above

This produces:
_transform
_flush
destination.error
destination.finish
source.end
source.close
Note how destination.finish is emitted after destination.error.
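This ordering matters in practice: anyone treating 'finish' as "processing completed" will have their handler run even after a fatal error. A minimal sketch of the guard this forces callers to write by hand (my illustration, not something the documentation prescribes):

let failed = false;
T.on('error', function(err) {
  failed = true;
  console.log('fatal:', err.message);
});
T.on('finish', function() {
  if (failed) return; // 'finish' still fires after 'error', so guard against it
  // only now is it (presumably) safe to continue to the next step
});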
StringWriter example
The Node.js documentation provides an example of a Writable stream that consumes data. However, it doesn't specify how to actually use the example in a context with events and actual streams.
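For reference, here is roughly how I exercised it: the StringWriter class is adapted from the stream documentation's "decoding buffers in a writable stream" example and run through the same testPipe harness as above (treat the exact code as a sketch rather than a verbatim copy of the docs):

const Writable = require('stream').Writable;
const StringDecoder = require('string_decoder').StringDecoder;

class StringWriter extends Writable {
  constructor(options) {
    super(options);
    this._decoder = new StringDecoder(options && options.defaultEncoding);
    this.data = '';
  }
  _write(chunk, encoding, callback) {
    if (encoding === 'buffer') {
      chunk = this._decoder.write(chunk);
    }
    this.data += chunk;
    callback();
  }
  _final(callback) {
    this.data += this._decoder.end();
    callback();
  }
}

testPipe(new StringWriter());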
When piping a file (using fs.createReadStream) into the example, again no "end" event is emitted:
source.end
destination.finish
source.close
The documentation omits any description of how to determine when all processing is complete (either when the stream is consumed and fully processed, or once there is a fatal error indicating no more processing will occur).
I've asked several developers to describe how to consume a stream and then call a function once it's done, and nobody has been able to. This suggests to me that the behavior of streams is too complex, and/or under-documented.
I would like to suggest a specific course of action, especially specific edits to the documentation; however, it's not clear to me what the intended behavior of streams is even supposed to be.
Summary
I've documented a case where Node.js fails to guarantee that a function can be called exactly once when a stream is done, or at least makes it difficult to write applications that do this without significant risk of introducing bugs.
There are several possible courses of action:
- Describe in documentation how to determine when streams such as StringWriter have completed processing a readable stream
- Transform streams should emit 'end' to indicate processing is complete
- Always emit 'end' after an error (if 'finish' is getting called after 'error', then surely it should be safe to also call 'end', right?)
- Make streams "then"able, which implies specific guarantees about continuing execution (see the sketch below)
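To make the last suggestion concrete, here is a minimal sketch of the kind of promise-based completion I have in mind, written as a hand-rolled wrapper rather than an existing stream API (the helper name and the choice of events are my own):

const fs = require('fs');
const Transform = require('stream').Transform;

// Resolve on 'finish', reject on 'error'. A promise settles only once, so the
// 'finish' that can follow an 'error' (as shown above) is simply ignored.
function streamDone(stream) {
  return new Promise(function(resolve, reject) {
    stream.on('error', reject);
    stream.on('finish', resolve);
  });
}

const dest = fs.createReadStream('index.js').pipe(new Transform({
  transform: function(chunk, encoding, callback) { return void callback(null, chunk); }
}));
dest.resume(); // drain the readable side so the writable side can finish

streamDone(dest).then(
  function() { console.log('processing complete'); },
  function(err) { console.log('processing failed:', err.message); }
);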