
speed up shimmer by about 50x #4633

Open · wants to merge 1 commit into master from bengl/faster-shimmer
Conversation


@bengl bengl commented Aug 27, 2024

The `copyProperties` function was doing an `Object.setPrototypeOf` call, which is rather costly. We don't actually need it: it was only acting as a stopgap in case we didn't get all the properties from the original function. Since we're using `Reflect.ownKeys()` to get those properties, that can't happen, so there's no need to set the prototype.

On a benchmark I whipped up separately for another project, I had noticed that shimmer plus our instrumentations add a significant amount of overhead. On a run with 100 iterations, I found that overhead to be about 10000%. When I ran the same benchmark after this change, the overhead dropped to about 200%. To be fair, this benchmark is not realistic application loading: it explicitly clears the cache and re-loads everything repeatedly, so I'm not sure how impactful this change will be in practice. That said, we'll see what the benchmarks say.

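For concreteness, here's a minimal sketch of the property-copying approach described above. The function name and exact shape are assumptions for illustration, not the actual dd-trace implementation:

```javascript
'use strict'

// Hedged sketch of a copyProperties-style helper: copy every own key from
// the original function onto the wrapper, with no prototype swap.
function copyProperties (original, wrapped) {
  // Reflect.ownKeys() returns every own key, including non-enumerable
  // properties and symbols, so nothing is missed.
  for (const key of Reflect.ownKeys(original)) {
    const descriptor = Object.getOwnPropertyDescriptor(original, key)
    try {
      Object.defineProperty(wrapped, key, descriptor)
    } catch (e) {
      // Defensive: skip any key that refuses redefinition on the wrapper.
    }
  }
}

function original () { return 42 }
original.extra = 'hello'

const wrapped = function (...args) { return original(...args) }
copyProperties(original, wrapped)

console.log(wrapped.extra) // 'hello'
console.log(wrapped())     // 42
```

Because every own property is copied directly, the wrapper behaves like the original without the costly `Object.setPrototypeOf` call.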
@bengl bengl requested a review from a team as a code owner August 27, 2024 18:19

Overall package size

Self size: 6.98 MB
Deduped: 58.21 MB
No deduping: 58.49 MB

Dependency sizes

| name | version | self size | total size |
|------|---------|-----------|------------|
| @datadog/native-appsec | 8.0.1 | 15.59 MB | 15.6 MB |
| @datadog/native-iast-taint-tracking | 3.1.0 | 12.27 MB | 12.28 MB |
| @datadog/pprof | 5.3.0 | 9.85 MB | 10.22 MB |
| protobufjs | 7.2.5 | 2.77 MB | 5.16 MB |
| @datadog/native-iast-rewriter | 2.4.1 | 2.14 MB | 2.23 MB |
| @opentelemetry/core | 1.14.0 | 872.87 kB | 1.47 MB |
| @datadog/native-metrics | 2.0.0 | 898.77 kB | 1.3 MB |
| @opentelemetry/api | 1.8.0 | 1.21 MB | 1.21 MB |
| import-in-the-middle | 1.8.1 | 71.67 kB | 785.15 kB |
| msgpack-lite | 0.1.26 | 201.16 kB | 281.59 kB |
| opentracing | 0.14.7 | 194.81 kB | 194.81 kB |
| pprof-format | 2.1.0 | 111.69 kB | 111.69 kB |
| @datadog/sketches-js | 2.1.0 | 109.9 kB | 109.9 kB |
| semver | 7.6.3 | 95.82 kB | 95.82 kB |
| lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB |
| lru-cache | 7.14.0 | 74.95 kB | 74.95 kB |
| ignore | 5.3.1 | 51.46 kB | 51.46 kB |
| int64-buffer | 0.1.10 | 49.18 kB | 49.18 kB |
| shell-quote | 1.8.1 | 44.96 kB | 44.96 kB |
| istanbul-lib-coverage | 3.2.0 | 29.34 kB | 29.34 kB |
| tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB |
| limiter | 1.1.5 | 23.17 kB | 23.17 kB |
| dc-polyfill | 0.1.4 | 23.1 kB | 23.1 kB |
| retry | 0.13.1 | 18.85 kB | 18.85 kB |
| jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB |
| crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB |
| path-to-regexp | 0.1.7 | 6.78 kB | 6.78 kB |
| koalas | 1.0.2 | 6.47 kB | 6.47 kB |
| module-details-from-path | 1.0.3 | 4.47 kB | 4.47 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe


pr-commenter bot commented Aug 27, 2024

Benchmarks

Benchmark execution time: 2024-08-27 18:25:44

Comparing candidate commit d0a1c31 in PR branch bengl/faster-shimmer with baseline commit 1f2914f in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 260 metrics, 6 unstable metrics.


codecov bot commented Aug 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.13%. Comparing base (1f2914f) to head (d0a1c31).
Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4633      +/-   ##
==========================================
+ Coverage   86.40%   92.13%   +5.72%     
==========================================
  Files          15      102      +87     
  Lines         581     3292    +2711     
  Branches       33       33              
==========================================
+ Hits          502     3033    +2531     
- Misses         79      259     +180     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


rochdev commented Aug 27, 2024

I think this was mostly added as a safety net, but I don't know of scenarios where the prototype would not be accessible via `getOwnPropertyDescriptors`. However, I cannot reproduce this performance gain at all; running a quick benchmark locally results in the exact same timing. My assumption is that there is something specific in your setup, where maybe the wrapping stays in place and gets reapplied on top of the previous wrapping or something similar, which would explain the exponential impact of running the benchmark longer. I think it would be worth finding out exactly why this is happening, in case there is some other bug that either this change is fixing or that is present in the benchmark. In any case, I don't think the call is needed, so it should be safe to remove.


bengl commented Aug 28, 2024

FWIW my benchmark was effectively:

```js
const Module = require('module')
const origRequire = Module.prototype.require

for (let i = 0; i < SOME_BIG_ENOUGH_NUMBER; i++) {
  require('dd-trace/packages/datadog-instrumentations')
  startTimer()
  require('undici')
  stopTimer()
  clearCache()
  Module.prototype.require = origRequire
}
```

When I ran it with 0x, I saw that nearly all the time was being spent in `setPrototypeOf`.

> I don't know the scenarios where the prototype would not be accessible by getOwnPropertyDescriptors.

We don't even need the prototype, since we're copying everything from `Reflect.ownKeys()` and these are all functions. On the off chance that we're wrapping a function whose prototype is not `Function.prototype`, we'd have a problem. If you want, I can add a check for that and special-case it. It should be pretty quick.
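That check could look something like the following sketch. The helper name is invented for illustration; the idea is to pay the `Object.setPrototypeOf` cost only in the rare case where the prototype differs from `Function.prototype`:

```javascript
'use strict'

// Hedged sketch: only set the wrapper's prototype when the original
// function's prototype is unusual. Helper name is hypothetical.
function copyPrototypeIfUnusual (original, wrapped) {
  const proto = Object.getPrototypeOf(original)
  if (proto !== Function.prototype) {
    // Rare case, e.g. a subclass constructor, whose [[Prototype]] is the
    // superclass rather than Function.prototype.
    Object.setPrototypeOf(wrapped, proto)
  }
}

class Base {}
class Sub extends Base {} // Sub's [[Prototype]] is Base, not Function.prototype

const wrappedSub = function () {}
copyPrototypeIfUnusual(Sub, wrappedSub)
console.log(Object.getPrototypeOf(wrappedSub) === Base) // true

const wrappedPlain = function () {}
copyPrototypeIfUnusual(function () {}, wrappedPlain)
console.log(Object.getPrototypeOf(wrappedPlain) === Function.prototype) // true
```

For ordinary functions the check is a cheap identity comparison, so the common path avoids the slow call entirely.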


rochdev commented Aug 28, 2024

> If you want, I can add a check for that and special-case it.

If you don't think it's a valid real-world case, I don't really mind. I was more interested in knowing why rerunning shimmer many times causes an exponential delay in the benchmark, whereas running the function directly on a new object every time shows no improvement.
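To illustrate the re-wrapping hypothesis, here's a hedged sketch (all names invented) showing how reapplying a wrapper on top of the previous wrapper stacks layers, so per-call cost grows with each iteration of a benchmark loop:

```javascript
'use strict'

// Hypothetical demonstration: if instrumentation wraps the already-wrapped
// function instead of the original, layers accumulate across iterations.
let fn = function target () { return 1 }

function wrap (f) {
  return function wrapped (...args) { return f.apply(this, args) }
}

let layers = 0
for (let i = 0; i < 5; i++) {
  fn = wrap(fn) // re-wrapping stacks another layer on each iteration
  layers++
}
console.log(layers) // 5
console.log(fn())   // 1, but every call now traverses 5 wrappers
```

Each call traversing an ever-growing wrapper chain would make later iterations progressively slower, which is consistent with the timing blow-up described above.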

4 participants