Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make derivative 5-8 times faster #3322

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from
Open

Conversation

paulftw
Copy link

@paulftw paulftw commented Nov 19, 2024

derivative seems to spend most of its time inside _toString, because constNodes[foo] converts keys to strings.
I'm not sure Set is really needed as a separate type, but because _derivative is a typed function I had to introduce it.

Benchmark before the change (Core i5 2017 macbook):

ddf x 23.77 ops/sec ±0.54% (43 runs sampled)
df x 79.58 ops/sec ±0.57% (69 runs sampled)
ddf x 23.62 ops/sec ±0.54% (43 runs sampled)
df x 78.86 ops/sec ±0.50% (74 runs sampled)
ddf x 23.73 ops/sec ±0.47% (43 runs sampled)
df x 79.73 ops/sec ±0.57% (69 runs sampled)
ddf x 23.85 ops/sec ±0.64% (44 runs sampled)
df x 80.17 ops/sec ±0.43% (69 runs sampled)

Benchmark with this PR (same old macbook):

ddf x 1,781 ops/sec ±14.63% (80 runs sampled)
df x 5,161 ops/sec ±30.53% (62 runs sampled)
ddf x 1,769 ops/sec ±17.68% (79 runs sampled)
df x 5,286 ops/sec ±30.00% (72 runs sampled)
ddf x 1,910 ops/sec ±4.58% (85 runs sampled)
df x 4,275 ops/sec ±41.16% (58 runs sampled)
ddf x 1,959 ops/sec ±4.91% (88 runs sampled)
df x 4,474 ops/sec ±37.39% (62 runs sampled)

@josdejong
Copy link
Owner

Wow, this is a huge performance improvement! That is a smart idea, thanks Paul.

Your PR looks good to go. I have only one thought: the old solution with Object would deduplicate nodes that have the same string representation, and the new solution with Set doesn't deduplicate since it keeps nodes by their reference. Can that somehow lead to issues or a differing behavior? (I can't come up with anything myself, just checking)

@paulftw
Copy link
Author

paulftw commented Nov 21, 2024

@josdejong Good question! When _derivative checks a node against constNodes it relies on the fact that someone (usually plainDerivative) has checked that node and if necessary marked it const. With Object adding a node to constNodes also "adds" all future and current instances with same string representation. With Set only the original node is added.

Things could behave differently if _derivative

  1. constructs a constant node and
  2. recursively passes it back to itself without calling constTag first.

Old code would depend on whether constructed node appears anywhere in the input as a sub-expression. New code will always take derivative of it.
I'm not sure if it's bad or even possible. Assumed that tests would pick it up if it really matters, but maybe the test suite isn't that robust. WDYT?

@paulftw
Copy link
Author

paulftw commented Nov 21, 2024

Bulletproof way would be to add memoization to constTag and never read constNodes directly i.e. make it an internal cache.

@paulftw
Copy link
Author

paulftw commented Nov 21, 2024

@josdejong PTAL
Idea of a potential bug didn't sit too well with me so I've refactored constTag to be a pure (and cached) function. This allowed me to undo the Set & isSet boilerplate.
Surprisingly enough it changed the test output for nthRoot((6x), (2x)). Same expression, but an extra zero is now discarded. Seems like a good thing but it'd be nice for someone knowledgeable to double check.

Will post a new benchmark shortly, can't promise it will be as good as the previous one though.

@paulftw
Copy link
Author

paulftw commented Nov 21, 2024

I've updated the benchmark and it made measurement error smaller (or whatever that ±30.53% meant). I think it's because GC fires less often / finishes faster.

Adding isConstCached made everything somewhat slower (on one specific randomly chosen expression). Still better than the original. Code is sort of more readable and reliable (to me personally, in large part because I wrote it).

Let me know if you have any suggestions how to improve this PR.

Numbers for the new benchmark:

Set()

ddf x 2,083 ops/sec ±0.84% (189 runs sampled)
df  x 6,720 ops/sec ±0.52% (191 runs sampled)
ddf x 2,042 ops/sec ±1.73% (188 runs sampled)
df  x 6,684 ops/sec ±0.48% (191 runs sampled)


isConstCached

ddf x 1,709 ops/sec ±1.19% (188 runs sampled)
df  x 5,589 ops/sec ±0.48% (189 runs sampled)
ddf x 1,668 ops/sec ±0.68% (186 runs sampled)
df  x 5,405 ops/sec ±0.53% (189 runs sampled)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants