Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs: introduce filehandle.stream #38440

Closed

Conversation

Ayase-252
Copy link
Member

This PR introduces a new API stream on filehandle, which has better performance over stream.pipe on file operation.

Benchmark

I conducted two benchmarking experiments about copying files by two different way, stream.pipe and filehandle.stream. For benchmarking code see benchmark/fs/bench-stream-via-pipeline.js and benchmark/fs/bench-stream.js. The result is showed as following.

File Size Copy bystream.pipe (Q1) Copy by filehandle.stream (Q2) Improvement (Q2/Q1)
1 KB 2,882.30 11,102.54 3.85
1 MB 786.42 1,656.13 2.11
10 MB 68.30 73.16 1.07
Raw Result
➜  node git:(feature/add-stream-to-file-handle) out/Release/node benchmark/fs/bench-stream-via-pipeline.js
fs/bench-stream-via-pipeline.js fileSize=1024 n=100: 2,882.297241630011
fs/bench-stream-via-pipeline.js fileSize=1048576 n=100: 786.4203946112839
fs/bench-stream-via-pipeline.js fileSize=10485760 n=100: 68.301593019569
➜  node git:(feature/add-stream-to-file-handle) out/Release/node benchmark/fs/bench-stream.js
fs/bench-stream.js fileSize=1024 n=100: 11,102.539950269502
fs/bench-stream.js fileSize=1048576 n=100: 1,656.1340954553295
fs/bench-stream.js fileSize=10485760 n=100: 73.16234942556548

It shows good performance over small-size file (~1KB), and comparable performance over large-size file (~10MB).

cc @ronag

Fixes: #38350

@github-actions github-actions bot added fs Issues and PRs related to the fs subsystem / file system. needs-ci PRs that need a full CI run. labels Apr 27, 2021
* `start` {integer} Optional, the position where stream starts from.
**Default:** `0`
* `end` {integer} Optional, the position where stream ends.
If it isn't provided. Stream will terminate at the EOF.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is end inclusive or exclusive? Should it match with createReadStream?

}

const { buffer, bytesRead } = await handle.read(
allocatedBuffer,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to allocate a new buffer every time.

if (bytesRead === 0) {
break;
}

Copy link
Member

@ronag ronag Apr 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also you need something like:

if (bytesRead !== buffer.length) {
        // Slow path. Shrink to fit.
        // Copy instead of slice so that we don't retain
        // large backing buffer for small reads.
        const dst = Buffer.allocUnsafeSlow(bytesRead)
        buffer.copy(dst, 0, 0, bytesRead)
        buffer = dst
}

@wa-Nadoo
Copy link

Why is pipe()/pipeline() so slow?

Copy link
Contributor

@aduh95 aduh95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few non-blocking comments to improve the benchmark setup and cleanup efficiency.

benchmark/fs/bench-stream-via-pipeline.js Outdated Show resolved Hide resolved
benchmark/fs/bench-stream-via-pipeline.js Outdated Show resolved Hide resolved
benchmark/fs/bench-stream-via-pipeline.js Outdated Show resolved Hide resolved
benchmark/fs/bench-stream.js Outdated Show resolved Hide resolved
benchmark/fs/bench-stream.js Outdated Show resolved Hide resolved
benchmark/fs/bench-stream-via-pipeline.js Outdated Show resolved Hide resolved
Comment on lines 64 to 65
async function cleanUp(i) {
await Promise.all([fsp.unlink(`${sourceName}-${i}`), fsp.unlink(`${destName}-${i}`)]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
async function cleanUp(i) {
await Promise.all([fsp.unlink(`${sourceName}-${i}`), fsp.unlink(`${destName}-${i}`)]);
function cleanUp(_, i) {
return [fsp.unlink(`${sourceName}-${i}`), fsp.unlink(`${destName}-${i}`)];

Comment on lines 56 to 57
async function cleanUp(i) {
await Promise.all([fsp.unlink(`${sourceName}-${i}`), fsp.unlink(`${destName}-${i}`)]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
async function cleanUp(i) {
await Promise.all([fsp.unlink(`${sourceName}-${i}`), fsp.unlink(`${destName}-${i}`)]);
function cleanUp(_, i) {
return [fsp.unlink(`${sourceName}-${i}`), fsp.unlink(`${destName}-${i}`)];

lib/internal/fs/promises.js Outdated Show resolved Hide resolved
@@ -368,6 +368,24 @@ changes:
{fs.Stats} object should be `bigint`. **Default:** `false`.
* Returns: {Promise} Fulfills with an {fs.Stats} for the file.

#### `filehandle.stream(dst, [options])`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I'd be happier with an API that would work with the existing streams API and pipeline utility. Specifically, something like:

pipeline(filehandle.readable(), dst, () => {});

pipeline(src, filehandle.writable(), () => {});

Where the return value of filehandle.readable() is essentially an fs.ReadStream and filehandle.writable() is an fs.WriteStream.

If we did want an API like this, the I would more expect filehandle.pipe(dst, [options]) or filehandle.pipeTo(dst, [options]).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, something like:

That kind of removes the point of this though. Will have the same performance as just doing fs.createReadStream.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance is not the only concern, however... a consistent and meaningful API that is not confusing is as equally important. What I'd rather see is effort into making fs.ReadStream and fs.WriteStream faster and for there to be a shared code path here so that overall maintainability, consistency, and performance are all improved.

Comment on lines 761 to 775
if (start && typeof start !== 'number') {
throw new ERR_INVALID_ARG_TYPE('start', 'number', start);
}
if (end && typeof end !== 'number') {
throw new ERR_INVALID_ARG_TYPE('end', 'number', end);
}
if (start < 0) {
throw new ERR_OUT_OF_RANGE('start', 'greater than 0', start);
}
if (end < 0) {
throw new ERR_OUT_OF_RANGE('end', 'greater than 0', end);
}
if (end && start > end) {
throw new ERR_OUT_OF_RANGE('start', `less than end ${end}`, start);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use existing validators.js here

const {
start = 0,
end,
} = options;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this, and the checkAborted() above will fail if options is null or undefined

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jasnell The object options is specified as {} and validated via validateObject function in case of given invalid, so it cannot be null or undefined. Am I missing something?

}
const BUFFER_SIZE = 128 * 2 ** 10;
const {
start = 0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a good default value, we should start reading from the current position instead of the start of the file by default.

@@ -739,6 +747,68 @@ async function readFile(path, options) {
return PromisePrototypeFinally(readFileHandle(fd, options), fd.close);
}

async function stream(handle, dest, options) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the function does the equivalent of .pipe(), then naming it stream is going to be confusing.

@addaleax addaleax mentioned this pull request Apr 27, 2021
@Ayase-252
Copy link
Member Author

I would address reviews some time on the evening. 😄

@Ayase-252
Copy link
Member Author

Ayase-252 commented Apr 28, 2021

FWIW, Current benchmarking result on f7f6039

File Size Copy by stream.pipe (Q1) Copy by filehandle.stream (Q2) Improvement (Q2/Q1)
1 KB 4,495.32 17,491.61 3.89
1 MB 587.06 1,141.86 1.94
10 MB 69.03 87.62 1.27

The idea behind #38350 is simple. And it did give good benchmarking result (at least in my case). I think it is the value of the PR.

It could be implemented in user land easily for users whose application is performance-critical. I'm also fine to close this in favor of a more preferable way which integrates with current Node.js architecture better if this didn't so.

}

async function cleanUp(_, i) {
await Promise.all([fsp.unlink(`${sourceName}-${i}`), fsp.unlink(`${destName}-${i}`)]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you return the array instead of waiting for it, you avoid the creation of conf.n promises – instead the promises will be awaited on line 48.

Suggested change
await Promise.all([fsp.unlink(`${sourceName}-${i}`), fsp.unlink(`${destName}-${i}`)]);
return [fsp.unlink(`${sourceName}-${i}`), fsp.unlink(`${destName}-${i}`)];

}

async function cleanUp(_, i) {
await Promise.all([fsp.unlink(`${sourceName}-${i}`), fsp.unlink(`${destName}-${i}`)]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you return the array instead of waiting for it, you avoid the creation of conf.n unnecessary promises – instead the promises will be awaited on line 45.

Suggested change
await Promise.all([fsp.unlink(`${sourceName}-${i}`), fsp.unlink(`${destName}-${i}`)]);
return [fsp.unlink(`${sourceName}-${i}`), fsp.unlink(`${destName}-${i}`)];

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it makes sense.

async function stream(handle, dest, options) {
if (!options) {
options = {};
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make this just...

async function stream(handle, dest, options = {}) {
  validateObject(options, 'options');
  // ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks!

end,
} = options;

if (start) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (start) {
if (start !== undefined) {

if (start) {
validateInteger(start, 'start', 0);
}
if (end) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (end) {
if (end !== undefined) {

Co-authored-by: Antoine du Hamel <[email protected]>
buffer = copy;
}

dest.write(buffer);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dest.write(buffer);
if (!dest.write(buffer)) {
if (dest.destroyed) return;
await EE.once(dest, 'drain');
}

Better compat with old streams.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

Comment on lines 783 to 785
if (dest.writableNeedDrain) {
await EE.once(dest, 'drain');
}
Copy link
Member

@ronag ronag Apr 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (dest.writableNeedDrain) {
await EE.once(dest, 'drain');
}
if (dest.writableNeedDrain) {
if (dest.destroyed) return;
await EE.once(dest, 'drain');
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fs Issues and PRs related to the fs subsystem / file system. needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FileHandle.stream
7 participants