Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poor performance while loading date-fns #75

Closed
nwalters512 opened this issue Jul 11, 2023 · 2 comments · Fixed by #76
Closed

Poor performance while loading date-fns #75

nwalters512 opened this issue Jul 11, 2023 · 2 comments · Fixed by #76

Comments

@nwalters512
Copy link
Contributor

I recently discovered that this package (via @opentelemetry/*) is causing significant slowdowns when loading packages that internally require a large number of files. I used date-fns as an example, as it's easy to reproduce the issue with it, but I believe it occurs for any package that internally uses a large number of files, or that loads the same files over and over again from other files.

Reproduction

In the root of this repository, install date-fns:

npm install date-fns

Create the following slow.js file in the root of this repository:

const start = Date.now();
require('date-fns');
console.log(`loaded date-fns in ${Date.now() - start}ms`);

Run this script with node slow.js and observe how long it takes to load; on my machine, it's ~100ms.

Now, add a no-op require hook that doesn't filter modules:

const { Hook } = require('./index');

new Hook(null, function (exports, name, basedir) {
  return exports;
});

const start = Date.now();
require('date-fns');
console.log(`loaded date-fns in ${Date.now() - start}ms`);

Run the script again, and observe that the runtime has more than tripled (it's ~330ms on my machine).

Now, add more hooks to simulate a worst-case scenario like that described in open-telemetry/opentelemetry-js-contrib#1566, where you inadvertently end up with many hooks installed:

const { Hook } = require('./index');

new Hook(null, function (exports, name, basedir) {
  return exports;
});
new Hook(null, function (exports, name, basedir) {
  return exports;
});
new Hook(null, function (exports, name, basedir) {
  return exports;
});
new Hook(null, function (exports, name, basedir) {
  return exports;
});
new Hook(null, function (exports, name, basedir) {
  return exports;
});
new Hook(null, function (exports, name, basedir) {
  return exports;
});
new Hook(null, function (exports, name, basedir) {
  return exports;
});

const start = Date.now();
require('date-fns');
console.log(`loaded date-fns in ${Date.now() - start}ms`);

Observe that the runtime has increased yet again, this time to ~1.7 seconds, just to load a single module!

Potential fix

I noticed that there's an cache for resolved modules. However, it's unused in two cases that date-fns seems to hit frequently:

I added self._cache.set(filename, exports, core) before each of those return exports statements, and with 7 hooks installed, that brought the time to load date-fns down from ~1.7 seconds to just 338ms! This still isn't nearly as fast as loading it without the require-in-the-middle hooks, but it's a significant improvement.

I don't know how safe this is; maybe there's a good reason the exports cache isn't used in these code paths! However, if y'all think this would be safe, I'd be happy to open a PR with this improvement.

@trentm
Copy link
Contributor

trentm commented Jul 11, 2023

@nwalters512 Thanks for the excellent issue!
tl;dr: Yes, please on your PR. I think your suggestions look good. I can't think of an issue that would caused by caching those exports.

Here is some playing of mine. First my "slow.js":

const { Hook } = require('./'); // require-in-the-middle

const N = 1
for (var i = 0; i < N; i++) {
  // How @opentelemetry/instrumentation uses RITM:
  const moduleNames = null
  const opts = { internals: true }

  // // How elastic-apm-node uses RITM:
  // const moduleNames = ['http', 'https', 'ioreds' /* ... */]
  // const opts = {}

  // // Nathan
  // const moduleNames = null
  // const opts = {}

  new Hook(moduleNames, opts, function (exports, name, basedir) {
    return exports;
  });
}

const start = Date.now();
require('date-fns');
console.error(`loaded date-fns in ${Date.now() - start}ms`);

baseline

As a baseline, here are a few runs without a Hook (i.e. N = 0):

% for i in 1 2 3 4 5; do; node slow.js; done
loaded date-fns in 208ms
loaded date-fns in 189ms
loaded date-fns in 201ms
loaded date-fns in 192ms
loaded date-fns in 200ms

internals=true

A difference from your runs is that I'm setting opts = { internals: true }, which is what @opentelemetry/instrumentation uses. Using that option makes the Hook a little faster -- compared to opts = {}.

With N=1 hook, no opts, as you were running:

% for i in 1 2 3 4 5; do; node slow.js; done
loaded date-fns in 1036ms
loaded date-fns in 984ms
loaded date-fns in 991ms
loaded date-fns in 1012ms
loaded date-fns in 909ms

With N=1 hook, opts = { internals: true } as OTel uses:

% for i in 1 2 3 4 5; do; node slow.js; done
loaded date-fns in 809ms
loaded date-fns in 771ms
loaded date-fns in 749ms
loaded date-fns in 662ms
loaded date-fns in 727ms

Your patch is still very effective for both cases.

With N=1 hook, no opts, and your patch (see below):

% for i in 1 2 3 4 5; do; node slow.js; done
loaded date-fns in 312ms
loaded date-fns in 342ms
loaded date-fns in 336ms
loaded date-fns in 395ms
loaded date-fns in 331ms

With N=1 hook, opts = { internals: true }, and your patch:

% for i in 1 2 3 4 5; do; node slow.js; done
loaded date-fns in 372ms
loaded date-fns in 367ms
loaded date-fns in 376ms
loaded date-fns in 384ms
loaded date-fns in 373ms

Here I'm assuming your patch is something like:

diff --git a/index.js b/index.js
index bc1fddf..b0ad25b 100644
--- a/index.js
+++ b/index.js
@@ -236,6 +237,7 @@ function Hook (modules, options, onrequire) {
           res = resolve.sync(moduleName, { basedir })
         } catch (e) {
           debug('could not resolve module: %s', moduleName)
+          self._cache.set(filename, exports, core) // XXX
           return exports // abort if module could not be resolved (e.g. no main in package.json and no index.js file)
         }

@@ -247,6 +249,7 @@ function Hook (modules, options, onrequire) {
             debug('preparing to process require of internal file: %s', moduleName)
           } else {
             debug('ignoring require of non-main module file: %s', res)
+            self._cache.set(filename, exports, core) // XXX
             return exports // abort if not main module file
           }
         }

@nwalters512
Copy link
Contributor Author

nwalters512 commented Jul 12, 2023

@trentm thanks for taking such a quick look! Yep, that's exactly what my patch looks like. I'll open a PR shortly. Good note about internals: true, though it's nice to see that this patch will still have an improvement in that case!

EDIT: PR opened! #76

@trentm trentm linked a pull request Jul 12, 2023 that will close this issue
trentm pushed a commit that referenced this issue Jul 12, 2023
This can be a big perf win when using the Hook *without* the modules argument,
e.g.: `new Hook(null, onrequire)`.

Fixes: #75
@trentm trentm mentioned this issue Jul 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants