Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib: add throws option to fs.f/l/statSync #33716

Closed
wants to merge 4 commits into from

Conversation

amcasey
Copy link
Contributor

@amcasey amcasey commented Jun 4, 2020

For consumers that aren't interested in why a statSync call failed,
allocating and throwing an exception is an unnecessary expense. This PR
adds an option that will cause it to return undefined in such cases
instead.

As a motivating example, the JavaScript & TypeScript language service
shared between Visual Studio and Visual Studio Code is stuck with
synchronous file IO for architectural and backward-compatibility
reasons. It frequently needs to speculatively check for the existence
of files and directories that may not exist (and cares about file vs
directory, so existsSync is insufficient), but ignores file system
entries it can't access, regardless of the reason.

Benchmarking the language service is difficult because it's so hard to
get good coverage of both code bases and user behaviors, but, as a
representative metric, we measured batch compilation of a few hundred
popular projects (by star count) from GitHub and found that, on average,
we saved about 1-2% of total compilation time. We speculate that the
savings could be even more significant in interactive (language service
or watch mode) scenarios, where the same (non-existent) files need to be
polled over and over again. It's not a huge improvement, but it's a
very small change and it will affect a lot of users (and CI runs).

For reference, our measurements were against v12.x (3637a06 at the
time) on an Ubuntu Server desktop with an SSD.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added the fs Issues and PRs related to the fs subsystem / file system. label Jun 4, 2020
Copy link
Member

@BridgeAR BridgeAR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another way to improve performance in case you expect many errors that you are not interested in: using Error.stackTraceLimit. This may be set to zero before the operation and should be reset afterwards. It will not be the same gain as adding the option here (likely around half the performance gain) but it's possible to apply this pattern everywhere. The reason why this is a significant improvement is the overhead of computing the stack frames.

doc/api/fs.md Outdated Show resolved Hide resolved
@BridgeAR
Copy link
Member

BridgeAR commented Jun 4, 2020

@amcasey I guess you already use the withFileTypes option while using readdir()?

@amcasey
Copy link
Contributor Author

amcasey commented Jun 4, 2020

@BridgeAR Yes, discovering that was a huge win - thanks!

@amcasey
Copy link
Contributor Author

amcasey commented Jun 8, 2020

There is another way to improve performance in case you expect many errors that you are not interested in: using Error.stackTraceLimit

I tried this on the same data set I used to validate my change and saw a change that was close enough to zero that I'd want to do a lot more iterations before declaring one way or the other. (For simplicity, I just set Error.stackTraceLimit = 0 at the start of the program and left it - let me know if that somehow invalidated the experiment.)

Edit: While sanity checking that I had, in fact, run against the correct version of the code, I discovered that it subsequently sets the stackTraceLimit to 100. I'll clear that and try again.

Edit 2: The comment is wrong - the code setting stackTraceLimit to 100 didn't run - so the no-change result looks correct.

@BridgeAR
Copy link
Member

BridgeAR commented Jun 8, 2020

@amcasey it would depend if any other code resets the stack trace limit to a different value.

I just ran a quick benchmark in the REPL:

> console.time(); var limit = Error.stackTraceLimit; Error.stackTraceLimit = 0; for (let i = 0; i < 1e6; i++) try { throw new Error() } catch {}; Error.stackTraceLimit = limit; console.timeEnd()
default: 2.380s
> console.time(); for (let i = 0; i < 1e6; i++) try { throw new Error() } catch {}; console.timeEnd()
default: 7.260s

@amcasey
Copy link
Contributor Author

amcasey commented Jun 8, 2020

@BridgeAR We only have one explicit access to the property and it's not hit in the case I tested. Is it possible that some other API we're calling reset it? Also, I didn't mean to imply that it doesn't work in any scenario, just that it doesn't obviously work in ours.

@BridgeAR
Copy link
Member

BridgeAR commented Jun 9, 2020

Is it possible that some other API we're calling reset it?

That is absolutely possible. You could check what is set while exiting (e.g., process.on('exit', () => console.log(Error.stackTraceLimit))).

lib/fs.js Outdated Show resolved Hide resolved
test/parallel/test-fs-stat-bigint.js Outdated Show resolved Hide resolved
@lundibundi
Copy link
Member

ping @nodejs/fs @BridgeAR

@nodejs-github-bot
Copy link
Collaborator

@MicahZoltu
Copy link

Beyond just the operational overhead cost, debugging errors can become more difficult when code is throwing exceptions during normal operations. If you debug an application and set it to break on all exceptions, it will break every time some piece of code tries to check for file existence. For something like the TypeScript compiler, this means hundreds or thousands of expected exceptions as the compiler probes for files on disk (file not found is a very much expected result). While in some cases you can work around this by setting a breakpoint after the file probing is complete and then enabling uncaught exceptions, this strategy doesn't work when the exception you are trying to break on happens in the middle of a large amount of file probing, or when the application you are debugging just naturally does a lot of file probing.

@lundibundi
Copy link
Member

ping @BridgeAR @nodejs/fs

@bnoordhuis
Copy link
Member

Let's start with the bits I agree with. :-)

Yes, fs.statSync() is expensive when most calls throw. It's why I changed require() to use something else a long time ago.

However, rather than adding a noThrow option, why not use fs.accessSync() instead? It should be faster on Windows - and probably other platforms too - because it does less work (fewer system calls, less data conversion.)

@MicahZoltu
Copy link

Looking at the docs, that doesn't appear to address the problem of throwing being expensive, since that function still throws in all failure scenarios:

If any of the accessibility checks fail, an Error will be thrown. Otherwise, the method will return undefined.

@amcasey
Copy link
Contributor Author

amcasey commented Jul 27, 2020

accessSync also doesn't indicate file system entry type (etc), so you'd end up calling statSync immediately afterward.

@lundibundi
Copy link
Member

ping @BridgeAR @bnoordhuis @nodejs/fs

lib/fs.js Outdated Show resolved Hide resolved
@joyeecheung joyeecheung added the request-ci Add this label to start a Jenkins CI on a PR. label Sep 2, 2020
@joyeecheung
Copy link
Member

@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Sep 2, 2020
@nodejs-github-bot
Copy link
Collaborator

@amcasey
Copy link
Contributor Author

amcasey commented Sep 2, 2020

@joyeecheung Can you please point me at the failure in node-test-commit-windows-fanned? I found this but it looks non-fatal:

[ RUN      ] EnvironmentTest.EnvironmentWithNoESMLoader
(node:5404) ExperimentalWarning: VM Modules is an experimental feature. This feature could change at any time
(Use `node --trace-warnings ...` to show where the warning was created)
(node:5404) UnhandledPromiseRejectionWarning: Error: Not supported
    at vm:module(0):1:1
    at SourceTextModule.evaluate (internal/vm/module.js:224:23)
    at embedder_main_12.js:1:328
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
(node:5404) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:5404) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
[       OK ] EnvironmentTest.EnvironmentWithNoESMLoader (79 ms)

The other stack traces in the file look like Jenkins infrastructure issues (possibly as a result of something upstream I haven't identified).

@amcasey
Copy link
Contributor Author

amcasey commented Sep 2, 2020

@joyeecheung The benchmark run shows the exception that would occur if run in a version of node without this change:

13:20:21 Error: ENOENT: no such file or directory, stat '/w/bnch-comp/node/benchmark/fs/non.existent'
13:20:21     at Object.statSync (fs.js:1042:3)

Is there some way to confirm that it actually did have this change? Locally, I see that failure with "stock" node and this output with my built version:

fs/bench-statSync-failure.js statSyncType="throw" n=1000000: 83,751.81301632537
fs/bench-statSync-failure.js statSyncType="noThrow" n=1000000: 622,500.8443041202

@amcasey amcasey changed the title lib: add noThrow option to fs.f/l/statSync lib: add throws option to fs.f/l/statSync Sep 2, 2020
codebytere added a commit that referenced this pull request Nov 22, 2020
Notable changes:

dns:
  * (SEMVER-MINOR) add a cancel() method to the promise Resolver (Szymon Marczak) #33099
events:
  * (SEMVER-MINOR) add max listener warning for EventTarget (James M Snell) #36001
http:
  * (SEMVER-MINOR) add support for abortsignal to http.request (Benjamin Gruenbaum) #36048
http2:
  * (SEMVER-MINOR) allow setting the local window size of a session (Yongsheng Zhang) #35978
lib:
  * (SEMVER-MINOR) add throws option to fs.f/l/statSync (Andrew Casey) #33716
path:
  * (SEMVER-MINOR) add `path/posix` and `path/win32` alias modules (ExE Boss) #34962
readline:
  * (SEMVER-MINOR) add getPrompt to get the current prompt (Mattias Runge-Broberg) #33675
src:
  * (SEMVER-MINOR) add loop idle time in diagnostic report (Gireesh Punathil) #35940
util:
  * (SEMVER-MINOR) add `util/types` alias module (ExE Boss) #34055

PR-URL: TODO
@codebytere codebytere mentioned this pull request Nov 22, 2020
amcasey added a commit to microsoft/TypeScript that referenced this pull request Nov 23, 2020
Future versions of node will be able to return undefined, rather than
allocating and throwing an exception, when a file is not found.

See nodejs/node#33716
codebytere added a commit that referenced this pull request Nov 24, 2020
Notable changes:

dns:
  * (SEMVER-MINOR) add a cancel() method to the promise Resolver (Szymon Marczak) #33099
events:
  * (SEMVER-MINOR) add max listener warning for EventTarget (James M Snell) #36001
http:
  * (SEMVER-MINOR) add support for abortsignal to http.request (Benjamin Gruenbaum) #36048
http2:
  * (SEMVER-MINOR) allow setting the local window size of a session (Yongsheng Zhang) #35978
lib:
  * (SEMVER-MINOR) add throws option to fs.f/l/statSync (Andrew Casey) #33716
path:
  * (SEMVER-MINOR) add `path/posix` and `path/win32` alias modules (ExE Boss) #34962
readline:
  * (SEMVER-MINOR) add getPrompt to get the current prompt (Mattias Runge-Broberg) #33675
src:
  * (SEMVER-MINOR) add loop idle time in diagnostic report (Gireesh Punathil) #35940
util:
  * (SEMVER-MINOR) add `util/types` alias module (ExE Boss) #34055

PR-URL: TODO
codebytere added a commit that referenced this pull request Nov 24, 2020
Notable changes:

dns:
  * (SEMVER-MINOR) add a cancel() method to the promise Resolver (Szymon Marczak) #33099
events:
  * (SEMVER-MINOR) add max listener warning for EventTarget (James M Snell) #36001
http:
  * (SEMVER-MINOR) add support for abortsignal to http.request (Benjamin Gruenbaum) #36048
http2:
  * (SEMVER-MINOR) allow setting the local window size of a session (Yongsheng Zhang) #35978
lib:
  * (SEMVER-MINOR) add throws option to fs.f/l/statSync (Andrew Casey) #33716
path:
  * (SEMVER-MINOR) add `path/posix` and `path/win32` alias modules (ExE Boss) #34962
readline:
  * (SEMVER-MINOR) add getPrompt to get the current prompt (Mattias Runge-Broberg) #33675
src:
  * (SEMVER-MINOR) add loop idle time in diagnostic report (Gireesh Punathil) #35940
util:
  * (SEMVER-MINOR) add `util/types` alias module (ExE Boss) #34055

PR-URL: #36232
codebytere added a commit that referenced this pull request Nov 24, 2020
Notable changes:

dns:
  * (SEMVER-MINOR) add a cancel() method to the promise Resolver (Szymon Marczak) #33099
events:
  * (SEMVER-MINOR) add max listener warning for EventTarget (James M Snell) #36001
http:
  * (SEMVER-MINOR) add support for abortsignal to http.request (Benjamin Gruenbaum) #36048
http2:
  * (SEMVER-MINOR) allow setting the local window size of a session (Yongsheng Zhang) #35978
lib:
  * (SEMVER-MINOR) add throws option to fs.f/l/statSync (Andrew Casey) #33716
path:
  * (SEMVER-MINOR) add `path/posix` and `path/win32` alias modules (ExE Boss) #34962
readline:
  * (SEMVER-MINOR) add getPrompt to get the current prompt (Mattias Runge-Broberg) #33675
src:
  * (SEMVER-MINOR) add loop idle time in diagnostic report (Gireesh Punathil) #35940
util:
  * (SEMVER-MINOR) add `util/types` alias module (ExE Boss) #34055

PR-URL: #36232
Comment on lines +2571 to +2573
* `throwIfNoEntry` {boolean} Whether an exception will be thrown
if no file system entry exists, rather than returning `undefined`.
**Default:** `true`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the history be updated to reflect when this was added please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@merceyz Sorry, I'm not familiar with that process. Is there a changelog or something that needs to be updated? Can you point me at a sample or a doc?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the history a few lines above this is where it's added but I'm also not familiar with it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, I didn't see that. I guess I'm hoping it's generated, because I don't know how I'd know what version to use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't i'm afraid cc @MylesBorins

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the history items above (e.g. 7b5b8ef) make it look like the changelog was updated?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. So you'd like a new PR with Added in: v15.3.0 in each of the new sections?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please :)

@amcasey amcasey deleted the NoThrowStat branch December 11, 2020 16:44
aduh95 pushed a commit to aduh95/node that referenced this pull request Jan 14, 2021
For consumers that aren't interested in *why* a `statSync` call failed,
allocating and throwing an exception is an unnecessary expense.  This PR
adds an option that will cause it to return `undefined` in such cases
instead.

As a motivating example, the JavaScript & TypeScript language service
shared between Visual Studio and Visual Studio Code is stuck with
synchronous file IO for architectural and backward-compatibility
reasons.  It frequently needs to speculatively check for the existence
of files and directories that may not exist (and cares about file vs
directory, so `existsSync` is insufficient), but ignores file system
entries it can't access, regardless of the reason.

Benchmarking the language service is difficult because it's so hard to
get good coverage of both code bases and user behaviors, but, as a
representative metric, we measured batch compilation of a few hundred
popular projects (by star count) from GitHub and found that, on average,
we saved about 1-2% of total compilation time.  We speculate that the
savings could be even more significant in interactive (language service
or watch mode) scenarios, where the same (non-existent) files need to be
polled over and over again.  It's not a huge improvement, but it's a
very small change and it will affect a lot of users (and CI runs).

For reference, our measurements were against `v12.x` (3637a06 at the
time) on an Ubuntu Server desktop with an SSD.

PR-URL: nodejs#33716
Reviewed-By: Denys Otrishko <[email protected]>
Reviewed-By: Joyee Cheung <[email protected]>
targos pushed a commit that referenced this pull request Apr 26, 2021
For consumers that aren't interested in *why* a `statSync` call failed,
allocating and throwing an exception is an unnecessary expense.  This PR
adds an option that will cause it to return `undefined` in such cases
instead.

As a motivating example, the JavaScript & TypeScript language service
shared between Visual Studio and Visual Studio Code is stuck with
synchronous file IO for architectural and backward-compatibility
reasons.  It frequently needs to speculatively check for the existence
of files and directories that may not exist (and cares about file vs
directory, so `existsSync` is insufficient), but ignores file system
entries it can't access, regardless of the reason.

Benchmarking the language service is difficult because it's so hard to
get good coverage of both code bases and user behaviors, but, as a
representative metric, we measured batch compilation of a few hundred
popular projects (by star count) from GitHub and found that, on average,
we saved about 1-2% of total compilation time.  We speculate that the
savings could be even more significant in interactive (language service
or watch mode) scenarios, where the same (non-existent) files need to be
polled over and over again.  It's not a huge improvement, but it's a
very small change and it will affect a lot of users (and CI runs).

For reference, our measurements were against `v12.x` (3637a06 at the
time) on an Ubuntu Server desktop with an SSD.

PR-URL: #33716
Backport-PR-URL: #36921
Reviewed-By: Denys Otrishko <[email protected]>
Reviewed-By: Joyee Cheung <[email protected]>
@danielleadams danielleadams mentioned this pull request May 3, 2021
DanielRosenwasser pushed a commit to microsoft/TypeScript that referenced this pull request Jun 14, 2021
Future versions of node will be able to return undefined, rather than
allocating and throwing an exception, when a file is not found.

See nodejs/node#33716
DanielRosenwasser pushed a commit to microsoft/TypeScript that referenced this pull request Jun 14, 2021
Future versions of node will be able to return undefined, rather than
allocating and throwing an exception, when a file is not found.

See nodejs/node#33716
DanielRosenwasser pushed a commit to microsoft/TypeScript that referenced this pull request Jun 14, 2021
Future versions of node will be able to return undefined, rather than
allocating and throwing an exception, when a file is not found.

See nodejs/node#33716
DanielRosenwasser added a commit to microsoft/TypeScript that referenced this pull request Jun 15, 2021
Future versions of node will be able to return undefined, rather than
allocating and throwing an exception, when a file is not found.

See nodejs/node#33716

Co-authored-by: Andrew Casey <[email protected]>
DanielRosenwasser added a commit to microsoft/TypeScript that referenced this pull request Jun 15, 2021
Future versions of node will be able to return undefined, rather than
allocating and throwing an exception, when a file is not found.

See nodejs/node#33716

Co-authored-by: Andrew Casey <[email protected]>
DanielRosenwasser added a commit to microsoft/TypeScript that referenced this pull request Jun 15, 2021
Future versions of node will be able to return undefined, rather than
allocating and throwing an exception, when a file is not found.

See nodejs/node#33716

Co-authored-by: Andrew Casey <[email protected]>
aduh95 added a commit to aduh95/node that referenced this pull request Sep 1, 2021
nodejs-github-bot pushed a commit that referenced this pull request Sep 6, 2021
Refs: #33716
Refs: #35993
Refs: #35911

PR-URL: #39972
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Juan José Arboleda <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Darshan Sen <[email protected]>
Reviewed-By: Tobias Nießen <[email protected]>
nodejs-github-bot pushed a commit that referenced this pull request Sep 6, 2021
PR-URL: #39972
Refs: #33716
Refs: #35993
Refs: #35911
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Juan José Arboleda <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Darshan Sen <[email protected]>
Reviewed-By: Tobias Nießen <[email protected]>
BethGriggs pushed a commit that referenced this pull request Sep 21, 2021
Refs: #33716
Refs: #35993
Refs: #35911

PR-URL: #39972
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Juan José Arboleda <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Darshan Sen <[email protected]>
Reviewed-By: Tobias Nießen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fs Issues and PRs related to the fs subsystem / file system. semver-minor PRs that contain new features and should be released in the next minor version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.