@H4ad H4ad commented Sep 24, 2025

This PR basically lazy loads almost all the imports.

The image below shows how much time it spends just to require this package.

image

With this change, it went down from 52ms to 7ms:

image
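
A rough way to reproduce this kind of measurement is to time a cold `require()` call. This is only a minimal sketch, not the profiler used for the screenshots: `timeRequire` is a hypothetical helper, and `node:zlib` merely stands in for the package being measured.

```javascript
const { performance } = require('node:perf_hooks')

// Hypothetical helper: returns how long a single require() call took, in ms.
// Only the first call for a given id is meaningful; later calls hit the cache.
function timeRequire (id) {
  const start = performance.now()
  require(id)
  return performance.now() - start
}

// 'node:zlib' is a stand-in; swap in the module you actually want to measure.
const elapsedMs = timeRequire('node:zlib')
console.log(`require took ${elapsedMs.toFixed(2)}ms`)
```
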
Further optimizations

We can reduce the cost even further, to just 1ms, if we change lib/index.js to something like this:

const { get } = require('./fetcher.js')

const tarball = (spec, opts) => get(spec, opts).tarball()
tarball.stream = (spec, handler, opts) => get(spec, opts).tarballStream(handler)
tarball.file = (spec, dest, opts) => get(spec, opts).tarballFile(dest)

const myExports = {
  resolve: (spec, opts) => get(spec, opts).resolve(),
  extract: (spec, dest, opts) => get(spec, opts).extract(dest),
  manifest: (spec, opts) => get(spec, opts).manifest(),
  packument: (spec, opts) => get(spec, opts).packument(),
  tarball,
}

Object.defineProperty(myExports, 'GitFetcher', {
  get: () => require('./git.js'),
  enumerable: true,
  configurable: false,
})

Object.defineProperty(myExports, 'DirFetcher', {
  get: () => require('./dir.js'),
  enumerable: true,
  configurable: false,
})

Object.defineProperty(myExports, 'FileFetcher', {
  get: () => require('./file.js'),
  enumerable: true,
  configurable: false,
})

Object.defineProperty(myExports, 'RemoteFetcher', {
  get: () => require('./remote.js'),
  enumerable: true,
  configurable: false,
})

Object.defineProperty(myExports, 'RegistryFetcher', {
  get: () => require('./registry.js'),
  enumerable: true,
  configurable: false,
})

module.exports = myExports

But this slightly changes the behavior of index.js and its exports, so it's up to you to decide if it's worth it.
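
To make that behavior change concrete, here is a minimal sketch of the same getter pattern in isolation. It assumes nothing about pacote itself: `lazyExports`, `loadCount`, and the `Heavy` property are hypothetical names, and `node:zlib` stands in for a fetcher module.

```javascript
let loadCount = 0
const lazyExports = {}

Object.defineProperty(lazyExports, 'Heavy', {
  // The require() only runs when the property is read, and on every read
  // (the module itself is still cached by Node after the first load).
  get: () => {
    loadCount++
    return require('node:zlib')
  },
  enumerable: true,
  configurable: false,
})

const loadsBeforeAccess = loadCount

// Destructuring reads the property, so it triggers the getter.
const { Heavy } = lazyExports
const loadsAfterDestructure = loadCount

// Spreading copies enumerable properties, so it triggers the getter too.
const copy = { ...lazyExports }
const loadsAfterSpread = loadCount

// There is no setter, so assignment throws in strict mode.
let assignmentThrew = false
try {
  (function () { 'use strict'; lazyExports.Heavy = null })()
} catch (err) {
  assignmentThrew = err instanceof TypeError
}

console.log({ loadsBeforeAccess, loadsAfterDestructure, loadsAfterSpread, assignmentThrew })
```

In other words, consumers that destructure or copy the exports at startup would still pay the load cost, and code that replaces a fetcher by assignment (for example in a test) would no longer work.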

This will save about 45ms when running npx, since it imports this package but doesn't use it.

Draft: In draft while I investigate why this change broke npm i -g npm.

@H4ad H4ad requested a review from a team as a code owner September 24, 2025 01:52
@H4ad H4ad marked this pull request as draft September 24, 2025 01:57
@H4ad
Author

H4ad commented Sep 24, 2025

I'm getting the following error when running npm i -g npm:

npm error code MODULE_NOT_FOUND
npm error Cannot find module 'cacache'
npm error Require stack:
npm error - /home/h4ad/.nvm/versions/node/v24.8.0/lib/node_modules/npm/node_modules/pacote/lib/lazy.js
npm error - /home/h4ad/.nvm/versions/node/v24.8.0/lib/node_modules/npm/node_modules/pacote/lib/fetcher.js
npm error - /home/h4ad/.nvm/versions/node/v24.8.0/lib/node_modules/npm/node_modules/pacote/lib/index.js
npm error - /home/h4ad/.nvm/versions/node/v24.8.0/lib/node_modules/npm/lib/commands/install.js
npm error - /home/h4ad/.nvm/versions/node/v24.8.0/lib/node_modules/npm/lib/npm.js
npm error - /home/h4ad/.nvm/versions/node/v24.8.0/lib/node_modules/npm/lib/cli/entry.js
npm error - /home/h4ad/.nvm/versions/node/v24.8.0/lib/node_modules/npm/lib/cli.js
npm error - /home/h4ad/.nvm/versions/node/v24.8.0/lib/node_modules/npm/bin/npm-cli.js

Hey @wraithgar, you once told me that this could happen but I have no clue why, would you mind giving some hints on what I should do in those cases?

@wraithgar
Member

wraithgar commented Sep 24, 2025

@H4ad it means we can't lazy load those modules. npm is a special unicorn in that it needs to have all modules loaded beforehand that it will need to install itself. They have to be in the require cache because we are about to remove the old npm so we can install the new npm.

One of the smoke tests we have in npm itself is a check for this specifically.
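
The failure mode can be reproduced outside npm. The sketch below is an illustration under simple assumptions, not npm's smoke test: it writes two throwaway modules (hypothetical names `eager.js` and `lazy.js`) to a temp directory, loads only one of them, deletes the directory, and then requires both.

```javascript
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')

// Build a throwaway directory with two modules.
const dir = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), 'require-cache-')))
const eagerPath = path.join(dir, 'eager.js')
const lazyPath = path.join(dir, 'lazy.js')
fs.writeFileSync(eagerPath, "module.exports = 'eager'\n")
fs.writeFileSync(lazyPath, "module.exports = 'lazy'\n")

// Load one module up front, then delete everything on disk,
// similar to removing the old npm before unpacking the new one.
require(eagerPath)
fs.rmSync(dir, { recursive: true, force: true })

// Already in the require cache, so this still works.
const eagerAgain = require(eagerPath)

// Never loaded before the delete, and its file is gone now.
let lazyErrorCode = null
try {
  require(lazyPath)
} catch (err) {
  lazyErrorCode = err.code
}

console.log(eagerAgain, lazyErrorCode)
```
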

@H4ad
Author

H4ad commented Sep 24, 2025

Would it be crazy to have a list of all deps that we need to install itself to warm up the cache before doing the global install? That way, we could lazy load everything we want without worrying that it would not be in the cache.

It will probably slow down the self-install by a few ms but will save some time during normal operations.

@wraithgar
Member

Would it be crazy to have a list of all deps that we need to install itself to warm up the cache

Yeah that's a tall order, and it would be even harder to make it automatic. Sure we could manually find out but if it changes in one of the many subdependencies then all of a sudden we're broken again.

@H4ad
Author

H4ad commented Sep 24, 2025

it would be even harder to make it automatic

Can't this be solved using the same principle as npm ls --all --json --only=prod? Then we can add a CI to run whenever we update package.json to update the list of dependencies that need to be warmed up. We don't need to worry about which one is being used, as we could put all the dependencies in that list since this command is not used often.
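
As a rough sketch of that idea: the JSON printed by `npm ls --all --json` nests each package under a `dependencies` object keyed by name, so a warm-up list could be derived by walking it. `collectDependencyNames` and the `sampleTree` data (including its placeholder versions) are hypothetical, and nothing here is part of npm.

```javascript
// Hypothetical helper: walk an `npm ls --json` style tree and collect
// every dependency name that appears anywhere in it.
function collectDependencyNames (node, names = new Set()) {
  for (const [name, child] of Object.entries(node.dependencies || {})) {
    names.add(name)
    collectDependencyNames(child, names)
  }
  return names
}

// Made-up sample in the shape of `npm ls --all --json` output.
const sampleTree = {
  name: 'example',
  dependencies: {
    pacote: {
      version: '0.0.0',
      dependencies: { cacache: { version: '0.0.0' } },
    },
    semver: { version: '0.0.0' },
  },
}

const warmupList = [...collectDependencyNames(sampleTree)].sort()
console.log(warmupList)
```

A warm-up step would then call `require()` on each name before the global install begins.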

@wraithgar
Member

Then we can add a CI to run whenever we update package.json

In theory yes this would work w/ the cli. We already have a dependencies lifecycle script that updates some other things.

@wraithgar
Member

Just realized this will NOT work from a centralized standpoint because we can not force subdependencies to load any unhoisted deps.

So if the dependency tree looked like:

graph LR;
  npm-->pacote;
  npm-->other-1["[email protected]"];
  pacote-->other-2["[email protected]"];

We could tell npm to require('other') but that would not preload the code that pacote would need when it runs require('other').

We'd somehow have to have a mechanism for loading all subdependencies.
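
A small sketch of that resolution problem, using a throwaway tree rather than npm's real one. The package name `consumer` is hypothetical and plays the role of pacote; `other` exists twice, once hoisted and once nested under `consumer`.

```javascript
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')
const { createRequire } = require('node:module')

// Hypothetical layout: the root has a hoisted `other`,
// while `consumer` carries its own nested copy of `other`.
const root = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), 'hoist-demo-')))
const hoisted = path.join(root, 'node_modules', 'other')
const consumer = path.join(root, 'node_modules', 'consumer')
const nested = path.join(consumer, 'node_modules', 'other')

for (const [dir, source] of [
  [hoisted, "module.exports = 'other v1'"],
  [nested, "module.exports = 'other v2'"],
  [consumer, "module.exports = require('other')"],
]) {
  fs.mkdirSync(dir, { recursive: true })
  fs.writeFileSync(path.join(dir, 'index.js'), source + '\n')
}

// A require() that resolves as if it were called from <root>/index.js.
const rootRequire = createRequire(path.join(root, 'index.js'))
const nestedFile = path.join(nested, 'index.js')

// "Warming up" from the root only loads the hoisted copy.
const fromRoot = rootRequire('other')
const nestedCachedAfterWarmup = nestedFile in require.cache

// The nested copy is only loaded once `consumer` itself asks for it.
const fromConsumer = rootRequire('consumer')
const nestedCachedAfterUse = nestedFile in require.cache

fs.rmSync(root, { recursive: true, force: true })
console.log({ fromRoot, nestedCachedAfterWarmup, fromConsumer, nestedCachedAfterUse })
```
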

Consider a tree w/ semver@6 and normalize-package-data@8 in its root. semver is in two places:

~/D/s/a $ npm ls semver
[email protected] /Users/wraithgar/Development/scratch/a
├─┬ [email protected]
│ └── [email protected]
└── [email protected]

If I require('semver') and require('normalize-package-data') the cache ends up with:

> Object.keys(require.cache).filter(k => k.includes('semver'))
[
  '/Users/wraithgar/Development/scratch/a/node_modules/semver/semver.js',
  '/Users/wraithgar/Development/scratch/a/node_modules/normalize-package-data/node_modules/semver/functions/valid.js',
  '/Users/wraithgar/Development/scratch/a/node_modules/normalize-package-data/node_modules/semver/functions/parse.js',
  '/Users/wraithgar/Development/scratch/a/node_modules/normalize-package-data/node_modules/semver/classes/semver.js',
  '/Users/wraithgar/Development/scratch/a/node_modules/normalize-package-data/node_modules/semver/internal/debug.js',
  '/Users/wraithgar/Development/scratch/a/node_modules/normalize-package-data/node_modules/semver/internal/constants.js',
  '/Users/wraithgar/Development/scratch/a/node_modules/normalize-package-data/node_modules/semver/internal/re.js',
  '/Users/wraithgar/Development/scratch/a/node_modules/normalize-package-data/node_modules/semver/internal/parse-options.js',
  '/Users/wraithgar/Development/scratch/a/node_modules/normalize-package-data/node_modules/semver/internal/identifiers.js',
  '/Users/wraithgar/Development/scratch/a/node_modules/normalize-package-data/node_modules/semver/functions/clean.js'
]

Notice how we have more than just the root exports to worry about. We have an even deeper problem here: normalize-package-data requires files directly. We'd have to start iterating through all its non-hoisted deps and requiring them directly.

require('./node_modules/normalize-package-data/node_modules/semver')
> Object.keys(require.cache).filter(k => k.includes('semver'))
[
  '/Users/wraithgar/Development/scratch/a/node_modules/normalize-package-data/node_modules/semver/index.js',
  '/Users/wraithgar/Development/scratch/a/node_modules/normalize-package-data/node_modules/semver/internal/re.js',
  '/Users/wraithgar/Development/scratch/a/node_modules/normalize-package-data/node_modules/semver/internal/constants.js',
  '/Users/wraithgar/Development/scratch/a/node_modules/normalize-package-data/node_modules/semver/internal/debug.js',
  '/Users/wraithgar/Development/scratch/a/node_modules/normalize-package-data/node_modules/semver/classes/semver.js',
  '/Users/wraithgar/Development/scratch/a/node_modules/normalize-package-data/node_modules/semver/internal/parse-options.js',
...
]

This would mean having to know which of those deps were hoisted or not. Pretty complex tree walking just to solve this.
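
For a sense of what that walk involves, here is a minimal sketch that lists every package directory under `node_modules` and marks whether it sits at the top level. `listInstalledPackages` is a hypothetical helper, it skips symlinks, and it says nothing about which files inside each package would still need to be required.

```javascript
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')

// Real (non-symlink, non-dotfile) subdirectories of `parent`, as full paths.
const subdirs = (parent) => fs.readdirSync(parent, { withFileTypes: true })
  .filter(entry => entry.isDirectory() && !entry.name.startsWith('.'))
  .map(entry => path.join(parent, entry.name))

// Hypothetical helper: recursively list package directories under
// dir/node_modules, descending into nested node_modules and @scope folders.
function listInstalledPackages (dir, depth = 0, found = []) {
  const modulesDir = path.join(dir, 'node_modules')
  if (!fs.existsSync(modulesDir)) {
    return found
  }
  for (const entryPath of subdirs(modulesDir)) {
    // Scoped packages live one level deeper: node_modules/@scope/name
    const isScope = path.basename(entryPath).startsWith('@')
    for (const pkgPath of isScope ? subdirs(entryPath) : [entryPath]) {
      found.push({ path: pkgPath, hoisted: depth === 0 })
      listInstalledPackages(pkgPath, depth + 1, found)
    }
  }
  return found
}

// Throwaway tree mirroring the semver example: one hoisted copy, one nested.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'tree-walk-'))
const nestedSemver = path.join('node_modules', 'normalize-package-data', 'node_modules', 'semver')
fs.mkdirSync(path.join(root, 'node_modules', 'semver'), { recursive: true })
fs.mkdirSync(path.join(root, nestedSemver), { recursive: true })

const nonHoisted = listInstalledPackages(root)
  .filter(pkg => !pkg.hoisted)
  .map(pkg => path.relative(root, pkg.path))

fs.rmSync(root, { recursive: true, force: true })
console.log(nonHoisted)
```
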

I realize this feels discouraging, but unfortunately it's just something npm has to deal with. Every other package out there gets to use npm to install itself, and thus does not have to worry about this.

@H4ad
Author

H4ad commented Sep 26, 2025

One possible solution would be deduplication, but that would create friction when updating dependencies.

Well, I think there might be a solution, but it might not be worth the complexity of handling something so delicate.

One of my next strategies would be to try adding multiple exports for the same package, which could also make the lazy-loading task even more complex.

Thanks for the detailed explanation. I will keep trying to find other optimizations that do not involve lazy loading for npm install.

@H4ad H4ad closed this Sep 26, 2025
@wraithgar
Member

One possible solution would be deduplication

Oh boy. Deduplication is always something we pay attention to during deps updates, and we also try to make sure the production dep is the one that gets hoisted since that's the one that gets bundled. npm/cli#8576 for example took a LOT of that. Most folks don't notice it haha.

Unfortunately sometimes we do have to ship with duplicated dependencies, even if only temporarily.
