
Clear the internal require statCache on an unsuccessful module load? #31803

Closed
benjamingr opened this issue Feb 14, 2020 · 24 comments
Labels
module Issues and PRs related to the module subsystem.

Comments

@benjamingr
Member

Currently, it is impossible to write code like:

const { execSync } = require('child_process');

let Bluebird;
try {
  Bluebird = require('bluebird');
} catch (e) {
  execSync('npm install bluebird');
  Bluebird = require('bluebird');
}

This is because the cjs loader caches the results of stat calls internally, so it would not look for bluebird the second time we require it.
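The failure mode can be modeled in a few lines. The sketch below is purely illustrative (a plain Map standing in for the loader's internal stat cache; none of these names are Node internals), but it shows how a cached miss survives a later install:

```javascript
// Toy model of negative stat caching (illustrative only, not Node's
// actual loader internals).
const statCache = new Map();
const disk = new Set(); // simulates what is actually on the filesystem

function stat(path) {
  if (statCache.has(path)) return statCache.get(path); // cached hit OR miss
  const result = disk.has(path) ? 0 : -1; // 0 = exists, -1 = missing
  statCache.set(path, result);
  return result;
}

function toyRequire(path) {
  if (stat(path) !== 0) {
    const err = new Error(`Cannot find module '${path}'`);
    err.code = 'MODULE_NOT_FOUND';
    throw err;
  }
  return `exports of ${path}`;
}

// First require fails and caches the miss...
try { toyRequire('/node_modules/bluebird'); } catch (e) { console.log(e.code); }
// ...'npm install bluebird' then puts the package on disk...
disk.add('/node_modules/bluebird');
// ...but the cached miss still wins, so the retry fails as well.
try { toyRequire('/node_modules/bluebird'); } catch (e) { console.log(e.code); }
```

Clearing the toy `statCache` (or just the stale entry) between the two attempts is exactly the effect the proposal above would have.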

It would be pretty easy to deal with this use case from a technical point of view by setting statCache to a new Map() on a MODULE_NOT_FOUND.

I can see several use cases like:

  • Writing repls in userland
  • Writing a "lazy loading" require hook that recovers from 'module not found' errors by running `npm install`

Would people be in favor of this change and enabling this?

cc @nodejs/modules

cc @BridgeAR because of #26928 that enables this specifically for the REPL.

@BridgeAR
Member

That approach sounds fine to me. It would hurt the loading performance of all subsequent calls, though. I'll check if it is possible to isolate the failed module's stats and only clear those.

@benjamingr added the module label (Issues and PRs related to the module subsystem) on Feb 14, 2020
@ljharb
Member

ljharb commented Feb 14, 2020

How would this be similar to ESM behavior for a similar pattern?

@mscdex
Contributor

mscdex commented Feb 14, 2020

Perhaps there should be an explicit API to do this instead of always doing it, so that you opt-in to any potential performance issues.

@weswigham

Rather than clearing the cache on failure (which means no longer caching failed lookups which makes repeated failed lookups very expensive), why not clear the cache on invocation of a process, fs, or child_process API (as those are liable to trigger an fs write), or just mark it for some kind of lazy re-validation on the next attempt? (In addition to providing a sort of manual clear opt-in.)

How would this be similar to ESM behavior for a similar pattern?

So for an esm equivalent:

import { execSync } from 'child_process';

async function main() {
  let Bluebird;
  try {
    Bluebird = await import('bluebird');
  } catch (e) {
    execSync('npm install bluebird');
    Bluebird = await import('bluebird');
  }
}
main();

since each dynamic import invocation is a unique sub-module-graph, the second invocation of import is completely unrelated to the first and uses a separate cache. This isn't really tenable for cjs, and it only works for esm because of the upfront graph traversal (which allows everything in the graph to share a cache). This does mean that if the community starts to rely heavily on dynamic import, performance might eventually be bad, since things aren't cached between imports, but... eh?

@devsnek
Member

devsnek commented Feb 15, 2020

since each dynamic import invocation is a unique sub-module-graph, the second invocation of import is completely unrelated to the first and uses a separate cache.

Just to be explicit, import() is required by spec to cache if it succeeds (on a per-module basis, and node actually has a stricter behavior because it caches across all modules). If it fails, it is not required to cache the failure, and node doesn't cache the failure.

@mscdex
Contributor

mscdex commented Feb 15, 2020

Rather than clearing the cache on failure (which means no longer caching failed lookups which makes repeated failed lookups very expensive), why not clear the cache on invocation of a process, fs, or child_process API

-1 That's way too magical

@mcollina
Member

I’m -1 on clearing the cache. A lot of modules use a “conditional loading pattern” and this change is going to slow their startup time significantly.

@bnoordhuis
Member

Apart from technical considerations, the example introduces a kind of non-determinism (the install can fail, or not) that shouldn't be encouraged. The time to install packages is at install time, not run time.

@targos
Member

targos commented Feb 15, 2020

Does require.resolve have the same restriction? If not, it can be used to check whether the module is there before trying to require it.

@benjamingr
Member Author

@targos

Does require.resolve have the same restriction?

It does, it goes through Module._findPath and uses the stat cache.

@bnoordhuis

The time for installing packages is at install time, not run time.

Package installation is just one use case - the only reason Node works this way at the moment is because we have caching for performance reasons and that is observable. Moreover we have already worked around it in the REPL.

@mcollina

Hmm, if it only invalidated the entries it created for the problematically required module rather than the whole cache on an error would it help?

@mscdex

Perhaps there should be an explicit API to do this instead of always doing it, so that you opt-in to any potential performance issues.

I tend to agree, though I believe the current behavior exists only because our performance optimization leaks a behavioral change.

@ljharb
Member

ljharb commented Feb 15, 2020

Your OP only has a use case for runtime install, or a repl; are there others you have in mind?

@DerekNonGeneric
Contributor

If you know that the specifier and package name match, as they do in the OP, you could use an alternative method (not resolve) of checking whether the package exists.

// index.js

const {accessSync} = require('fs');
const {execSync} = require('child_process');
const node_modules = './node_modules/';
const specifier = 'bluebird';

let Bluebird;
try {
  // does package exist?
  accessSync(node_modules + specifier);
  // yes, so require it
  Bluebird = require(specifier);
} catch (e) {
  // no, so install it...
  execSync('npm install ' + specifier);
  // ...and require it
  Bluebird = require(specifier);
}

console.log(Bluebird);

@mcollina
Member

Hmm, if it only invalidated the entries it created for the problematically required module rather than the whole cache on an error would it help?

It would not cause significant performance issues in that case.
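A rough sketch of that narrower invalidation, again using a toy cache and made-up candidate paths rather than the real loader internals: on failure, only the entries recorded while resolving the failing request are evicted.

```javascript
// Toy per-request invalidation: on MODULE_NOT_FOUND, evict only the
// stat entries this one lookup added, leaving everything else cached.
const statCache = new Map();

function resolveWith(cache, request, disk) {
  const touched = [];
  const candidates = [`/app/node_modules/${request}`, `/node_modules/${request}`];
  for (const path of candidates) {
    if (!cache.has(path)) {
      cache.set(path, disk.has(path) ? 0 : -1); // 0 = exists, -1 = missing
      touched.push(path);
    }
    if (cache.get(path) === 0) return path;
  }
  // Failure: drop only what this lookup cached; entries for other
  // modules (the common conditional-require pattern) stay warm.
  for (const path of touched) cache.delete(path);
  const err = new Error(`Cannot find module '${request}'`);
  err.code = 'MODULE_NOT_FOUND';
  throw err;
}

const disk = new Set();
try { resolveWith(statCache, 'bluebird', disk); } catch (e) { console.log(e.code); }
disk.add('/app/node_modules/bluebird'); // simulate 'npm install bluebird'
console.log(resolveWith(statCache, 'bluebird', disk)); // resolves this time
```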

@LMojica

LMojica commented Feb 20, 2020

This problem also affects the case where the entire node_modules directory is created from within the running Node process. In that case, there doesn't seem to be any workaround.

If you do something like what OP is doing:

let Bluebird;
try { 
  Bluebird = require('bluebird');
} catch (e) { 
  execSync('npm install bluebird');
  Bluebird = require('bluebird');
} 

but you do it when there is no node_modules directory in your cwd, running execSync('npm install bluebird'); will create the directory and install the module, but the statCache will already have cached that path as non-existent. The second call to require('bluebird'); will check the statCache, see that that path is already in the cache, and won't bother checking the node_modules directory.

@ljharb
Member

ljharb commented Feb 20, 2020

What’s the use case for “npm install hasn’t been run first”? Can you elaborate?

@devsnek
Member

devsnek commented Feb 20, 2020

I think in a really general sense, these caches shouldn't be updated for failure cases (obviously easier to say than implement, but I think that's what we should aim for)

@bnoordhuis
Member

That pessimizes a pattern that is fairly common in larger applications:

try {
  var debug = require('debug');  // just an example, of course
} catch (e) {
  var debug = () => {};
}

The bigger the program, the more occurrences of that pattern pop up, and the more it is punished if the results aren't in the stat cache.

@ljharb
Member

ljharb commented Feb 20, 2020

Is that pattern often done for the same package a great many times in the graph, and also not at module level/require time?

@bnoordhuis
Member

Yes and yes (I think - I'm not completely sure what you mean by "also not at module level/require time".)

@ljharb
Member

ljharb commented Feb 20, 2020

Like, is that done inside function bodies, as opposed to at the top level of modules (where it'd happen once and be cached after that, and thus the perf hit wouldn't be significant)

@bnoordhuis
Member

Ah, like that. I think it only matters at the top level because the stat cache is cleared when control returns to the event loop.

I'm currently looking at a midsize application that has two dozen instances of the above pattern (that I can find) on a total of 1,200 require() calls, so about 2% of the total.

An unscientific benchmark suggests that not caching the result makes such require() calls 2.8x slower. That's not as bad as I feared.

There's a caveat though: require() searches upwards, therefore deeper directory structures are penalized more heavily.

@ljharb
Member

ljharb commented Feb 20, 2020

In that case, someone needing to do a conditional require can already do it asynchronously, just not synchronously?

@benjamingr
Member Author

What if we only retry failed attempts without the statCache? That way we would only pay the penalty of skipping the cache on failed attempts.
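A toy sketch of that idea (illustrative names only, not the real loader): consult the stat cache first, and on a miss retry with the cache bypassed, so only requires that were going to fail pay the uncached cost.

```javascript
// Toy resolver: cached fast path first, uncached retry on failure.
function statUncached(path, disk) {
  return disk.has(path) ? 0 : -1; // 0 = exists, -1 = missing
}

function makeResolver() {
  const statCache = new Map();
  const statCached = (path, disk) => {
    if (!statCache.has(path)) statCache.set(path, statUncached(path, disk));
    return statCache.get(path);
  };
  return function resolve(request, disk) {
    const path = `/node_modules/${request}`;
    if (statCached(path, disk) === 0) return path; // normal cached hit
    // Retry without consulting (or repopulating) the stat cache, so a
    // package installed after the first failure is picked up.
    if (statUncached(path, disk) === 0) return path;
    const err = new Error(`Cannot find module '${request}'`);
    err.code = 'MODULE_NOT_FOUND';
    throw err;
  };
}

const resolve = makeResolver();
const disk = new Set();
try { resolve('bluebird', disk); } catch (e) { console.log(e.code); }
disk.add('/node_modules/bluebird'); // simulate 'npm install bluebird'
console.log(resolve('bluebird', disk)); // '/node_modules/bluebird'
```

Successful requires keep full caching; only the failure path goes back to disk.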

benjamingr pushed a commit to benjamingr/io.js that referenced this issue Feb 23, 2020
@benjamingr
Member Author

I am going to go ahead and close this since it has gone stale. If anyone feels strongly about supporting this, please feel free to reopen!
