-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fs: introduce filehandle.stream
#38440
fs: introduce filehandle.stream
#38440
Conversation
* `start` {integer} Optional, the position where stream starts from. | ||
**Default:** `0` | ||
* `end` {integer} Optional, the position where stream ends. | ||
If it isn't provided. Stream will terminate at the EOF. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is end inclusive or exclusive? Should it match with createReadStream
?
lib/internal/fs/promises.js
Outdated
} | ||
|
||
const { buffer, bytesRead } = await handle.read( | ||
allocatedBuffer, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to allocate a new buffer every time.
if (bytesRead === 0) { | ||
break; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also you need something like:
if (bytesRead !== buffer.length) {
// Slow path. Shrink to fit.
// Copy instead of slice so that we don't retain
// large backing buffer for small reads.
const dst = Buffer.allocUnsafeSlow(bytesRead)
buffer.copy(dst, 0, 0, bytesRead)
buffer = dst
}
Why is pipe()/pipeline() so slow? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few non-blocking comments to improve the benchmark setup and cleanup efficiency.
async function cleanUp(i) { | ||
await Promise.all([fsp.unlink(`${sourceName}-${i}`), fsp.unlink(`${destName}-${i}`)]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
async function cleanUp(i) { | |
await Promise.all([fsp.unlink(`${sourceName}-${i}`), fsp.unlink(`${destName}-${i}`)]); | |
function cleanUp(_, i) { | |
return [fsp.unlink(`${sourceName}-${i}`), fsp.unlink(`${destName}-${i}`)]; |
benchmark/fs/bench-stream.js
Outdated
async function cleanUp(i) { | ||
await Promise.all([fsp.unlink(`${sourceName}-${i}`), fsp.unlink(`${destName}-${i}`)]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
async function cleanUp(i) { | |
await Promise.all([fsp.unlink(`${sourceName}-${i}`), fsp.unlink(`${destName}-${i}`)]); | |
function cleanUp(_, i) { | |
return [fsp.unlink(`${sourceName}-${i}`), fsp.unlink(`${destName}-${i}`)]; |
@@ -368,6 +368,24 @@ changes: | |||
{fs.Stats} object should be `bigint`. **Default:** `false`. | |||
* Returns: {Promise} Fulfills with an {fs.Stats} for the file. | |||
|
|||
#### `filehandle.stream(dst, [options])` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, I'd be happier with an API that would work with the existing streams API and pipeline utility. Specifically, something like:
pipeline(filehandle.readable(), dst, () => {});
pipeline(src, filehandle.writable(), () => {});
Where the return value of filehandle.readable()
is essentially an fs.ReadStream
and filehandle.writable()
is an fs.WriteStream
.
If we did want an API like this, the I would more expect filehandle.pipe(dst, [options])
or filehandle.pipeTo(dst, [options])
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically, something like:
That kind of removes the point of this though. Will have the same performance as just doing fs.createReadStream
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance is not the only concern, however... a consistent and meaningful API that is not confusing is as equally important. What I'd rather see is effort into making fs.ReadStream
and fs.WriteStream
faster and for there to be a shared code path here so that overall maintainability, consistency, and performance are all improved.
lib/internal/fs/promises.js
Outdated
if (start && typeof start !== 'number') { | ||
throw new ERR_INVALID_ARG_TYPE('start', 'number', start); | ||
} | ||
if (end && typeof end !== 'number') { | ||
throw new ERR_INVALID_ARG_TYPE('end', 'number', end); | ||
} | ||
if (start < 0) { | ||
throw new ERR_OUT_OF_RANGE('start', 'greater than 0', start); | ||
} | ||
if (end < 0) { | ||
throw new ERR_OUT_OF_RANGE('end', 'greater than 0', end); | ||
} | ||
if (end && start > end) { | ||
throw new ERR_OUT_OF_RANGE('start', `less than end ${end}`, start); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use existing validators.js here
const { | ||
start = 0, | ||
end, | ||
} = options; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this, and the checkAborted()
above will fail if options
is null
or undefined
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jasnell The object options
is specified as {}
and validated via validateObject function in case of given invalid, so it cannot be null or undefined. Am I missing something?
lib/internal/fs/promises.js
Outdated
} | ||
const BUFFER_SIZE = 128 * 2 ** 10; | ||
const { | ||
start = 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a good default value, we should start reading from the current position instead of the start of the file by default.
lib/internal/fs/promises.js
Outdated
@@ -739,6 +747,68 @@ async function readFile(path, options) { | |||
return PromisePrototypeFinally(readFileHandle(fd, options), fd.close); | |||
} | |||
|
|||
async function stream(handle, dest, options) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the function does the equivalent of .pipe()
, then naming it stream
is going to be confusing.
I would address reviews some time on the evening. 😄 |
FWIW, Current benchmarking result on
The idea behind #38350 is simple. And it did give good benchmarking result (at least in my case). I think it is the value of the PR. It could be implemented in user land easily for users whose application is performance-critical. I'm also fine to close this in favor of a more preferable way which integrates with current Node.js architecture better if this didn't so. |
} | ||
|
||
async function cleanUp(_, i) { | ||
await Promise.all([fsp.unlink(`${sourceName}-${i}`), fsp.unlink(`${destName}-${i}`)]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you return the array instead of waiting for it, you avoid the creation of conf.n
promises – instead the promises will be await
ed on line 48.
await Promise.all([fsp.unlink(`${sourceName}-${i}`), fsp.unlink(`${destName}-${i}`)]); | |
return [fsp.unlink(`${sourceName}-${i}`), fsp.unlink(`${destName}-${i}`)]; |
benchmark/fs/bench-stream.js
Outdated
} | ||
|
||
async function cleanUp(_, i) { | ||
await Promise.all([fsp.unlink(`${sourceName}-${i}`), fsp.unlink(`${destName}-${i}`)]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you return the array instead of waiting for it, you avoid the creation of conf.n
unnecessary promises – instead the promises will be await
ed on line 45.
await Promise.all([fsp.unlink(`${sourceName}-${i}`), fsp.unlink(`${destName}-${i}`)]); | |
return [fsp.unlink(`${sourceName}-${i}`), fsp.unlink(`${destName}-${i}`)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it makes sense.
lib/internal/fs/promises.js
Outdated
async function stream(handle, dest, options) { | ||
if (!options) { | ||
options = {}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make this just...
async function stream(handle, dest, options = {}) {
validateObject(options, 'options');
// ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks!
lib/internal/fs/promises.js
Outdated
end, | ||
} = options; | ||
|
||
if (start) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (start) { | |
if (start !== undefined) { |
lib/internal/fs/promises.js
Outdated
if (start) { | ||
validateInteger(start, 'start', 0); | ||
} | ||
if (end) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (end) { | |
if (end !== undefined) { |
Co-authored-by: Antoine du Hamel <[email protected]>
Co-authored-by: James M Snell <[email protected]>
lib/internal/fs/promises.js
Outdated
buffer = copy; | ||
} | ||
|
||
dest.write(buffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dest.write(buffer); | |
if (!dest.write(buffer)) { | |
if (dest.destroyed) return; | |
await EE.once(dest, 'drain'); | |
} |
Better compat with old streams.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
lib/internal/fs/promises.js
Outdated
if (dest.writableNeedDrain) { | ||
await EE.once(dest, 'drain'); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (dest.writableNeedDrain) { | |
await EE.once(dest, 'drain'); | |
} | |
if (dest.writableNeedDrain) { | |
if (dest.destroyed) return; | |
await EE.once(dest, 'drain'); | |
} |
Co-authored-by: Robert Nagy <[email protected]>
This PR introduces a new API
stream
onfilehandle
, which has better performance overstream.pipe
on file operation.Benchmark
I conducted two benchmarking experiments about copying files by two different way,
stream.pipe
andfilehandle.stream
. For benchmarking code seebenchmark/fs/bench-stream-via-pipeline.js
andbenchmark/fs/bench-stream.js
. The result is showed as following.stream.pipe
(Q1)filehandle.stream
(Q2)Raw Result
It shows good performance over small-size file (~1KB), and comparable performance over large-size file (~10MB).
cc @ronag
Fixes: #38350