
feat: Use fixed queue #555

Merged (49 commits, May 19, 2024)
Conversation

@JaoodxD (Contributor) commented May 4, 2024

Some time ago @ronag mentioned in #452 that Piscina could adopt Node.js's fixed_queue as its TaskQueue.

This draft will fix #452.

A synthetic benchmark shows a drastic speed difference when comparing fixed_queue to a queue based on a plain JS array.
But the actual simple-benchmark.js shows the opposite result.
I passed FixedQueue as a custom queue here.
The initial simple-benchmark.js shows around ~82k ops on my machine, but simple-benchmark-2.js with FixedQueue shows half of the original performance at ~44k ops.

Am I missing something?

@JaoodxD (Author) commented May 4, 2024

@ronag @metcoder95 could you take a look?

@metcoder95 (Member) left a comment

It could potentially be due to the optimizations done for Array operations. I haven't checked, but they are often already inlined within V8 (sometimes even in assembly).

Can you try to use a tool like tinybench to benchmark them?
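A minimal sketch of what such a tinybench comparison could look like (this is not the PR's benchmark; the import paths are assumptions, and plain numbers stand in for real task objects):

```js
import { Bench } from 'tinybench';
// Hypothetical import paths: adjust to wherever the compiled queues actually live.
import { FixedQueue } from '../dist/src/fixed-queue.js';
import { ArrayTaskQueue } from '../dist/src/common.js';

const ITEMS = 1000;
const bench = new Bench({ time: 500 });

// Each iteration fills a fresh queue and drains it, so both queues do identical work.
function fillAndDrain (queue) {
  for (let i = 0; i < ITEMS; i++) queue.push(i);
  while (queue.size > 0) queue.shift();
}

bench.add('ArrayTaskQueue full push + full shift', () => fillAndDrain(new ArrayTaskQueue()));
bench.add('FixedQueue full push + full shift', () => fillAndDrain(new FixedQueue()));

await bench.run();
console.table(bench.table());
```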

@metcoder95 (Member) left a comment

Let's export the new FixedQueue as a built-in task queue that can be used instead of the current one, so users have the chance to use it.

Let's add some documentation for it as well, and it should be good to go

@ronag (Collaborator) commented May 6, 2024

We could also make a "smart" mode, where if more than x (128?) items have been queued we switch automatically to the fixed queue.
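A rough sketch of that idea (not part of this PR; it assumes the FixedQueue export discussed above and leaves out the remove() method a complete Piscina TaskQueue would also need):

```js
import { FixedQueue } from 'piscina'; // assumes the export proposed in this PR

// Hypothetical "smart" queue: a plain array while the backlog is small, switching
// to a FixedQueue once more than `threshold` items have been queued.
class SmartQueue {
  constructor (threshold = 128) {
    this.threshold = threshold;
    this.queue = [];
    this.switched = false;
  }

  get size () {
    return this.switched ? this.queue.size : this.queue.length;
  }

  push (task) {
    if (!this.switched && this.queue.length >= this.threshold) {
      // Move the buffered items into a FixedQueue and keep using it from here on.
      const fixed = new FixedQueue();
      for (const queued of this.queue) fixed.push(queued);
      this.queue = fixed;
      this.switched = true;
    }
    this.queue.push(task);
  }

  shift () {
    return this.queue.shift();
  }
}
```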

@metcoder95 changed the title from "Use fixed queue" to "feat: Use fixed queue" May 6, 2024
@metcoder95 (Member)

Nice, that would be great! We can add a boolean option called smart: false (the default) keeps using the same task queue, and true switches to FixedQueue after a threshold (128 seems like a good default to start with).

Though if the default task queue is already a FixedQueue, there's nothing to do there.

@JaoodxD (Author) commented May 6, 2024

Although the bottleneck with the size getter is already fixed, I still can't find the main one.
Tinybench benchmarks show strange behavior.
When I compare just the two queues, FixedQueue is faster, orders of magnitude faster, and the gap grows with the number of elements in it.
But when I run the same benchmarks against Piscina instances, they show identical performance, as if the queue doesn't matter at all.

@ronag (Collaborator) commented May 6, 2024

TBH. I think we should always use FixedQueue by default, the "smart" mode is just a compromise suggestion

@JaoodxD (Author) commented May 6, 2024

Here's a benchmark of 1000 items pushed to the queue and then shifted:

┌─────────┬─────────────────────────────────────────┬───────────┬────────────────────┬──────────┬─────────┐
│ (index) │ Task Name                               │ ops/sec   │ Average Time (ns)  │ Margin   │ Samples │
├─────────┼─────────────────────────────────────────┼───────────┼────────────────────┼──────────┼─────────┤
│ 0       │ 'ArrayTaskQueue full push + full shift' │ '9 692'   │ 103175.15463917515 │ '±0.80%' │ 970     │
│ 1       │ 'FixedQueue  full push + full shift'    │ '131 879' │ 7582.696390658352  │ '±1.81%' │ 13188   │
└─────────┴─────────────────────────────────────────┴───────────┴────────────────────┴──────────┴─────────┘

We can see that FixedQueue is ~13 times faster.

But when we run the same benchmark against Piscina with two different taskQueues, it shows the following:

┌─────────┬─────────────────────────────────────────┬─────────┬────────────────────┬──────────┬─────────┐
│ (index) │ Task Name                               │ ops/sec │ Average Time (ns)  │ Margin   │ Samples │
├─────────┼─────────────────────────────────────────┼─────────┼────────────────────┼──────────┼─────────┤
│ 0       │ 'ArrayTaskQueue full push + full shift' │ '7'     │ 132713349.99999994 │ '±2.11%' │ 10      │
│ 1       │ 'FixedQueue  full push + full shift'    │ '7'     │ 129970319.99999999 │ '±1.93%' │ 10      │
└─────────┴─────────────────────────────────────────┴─────────┴────────────────────┴──────────┴─────────┘

Which is very strange.

@JaoodxD (Author) commented May 6, 2024

Should I push these benchmarks to the branch?

@ronag (Collaborator) commented May 6, 2024

I think the difference is within the margin of error.

@JaoodxD (Author) commented May 6, 2024

I think the difference is within the margin of error.

I completely don't understand why; I thought that even at this scale the difference should be measurable.

I just raised the number of tasks queued to 100_000, and the Piscina benchmark started to show the difference.

┌─────────┬─────────────────────────────────────────┬─────────┬────────────────────┬──────────┬─────────┐
│ (index) │ Task Name                               │ ops/sec │ Average Time (ns)  │ Margin   │ Samples │
├─────────┼─────────────────────────────────────────┼─────────┼────────────────────┼──────────┼─────────┤
│ 0       │ 'ArrayTaskQueue full push + full shift' │ '0'     │ 8022717600.000002  │ '±4.37%' │ 10      │
│ 1       │ 'FixedQueue  full push + full shift'    │ '0'     │ 1304727259.9999983 │ '±4.89%' │ 10      │
└─────────┴─────────────────────────────────────────┴─────────┴────────────────────┴──────────┴─────────┘

FixedQueue is ~6 times faster now.

@JaoodxD (Author) commented May 6, 2024

Seems like I have found another caveat: the TypeScript target ES2019 compiles private fields into something that kills performance.

TL;DR:
ES2022 private fields are ~6x faster than ES2019 (the current project target).
Old-style private fields with _ are ~7x faster on the same ES2019 target, and ~20% faster than private fields on the ES2022 target.

Here is a benchmark comparison of the current ArrayTaskQueue vs FixedQueue with target: ES2019:

┌─────────┬─────────────────────────────────────────┬─────────┬────────────────────┬──────────┬─────────┐
│ (index) │ Task Name                               │ ops/sec │ Average Time (ns)  │ Margin   │ Samples │
├─────────┼─────────────────────────────────────────┼─────────┼────────────────────┼──────────┼─────────┤
│ 0       │ 'ArrayTaskQueue full push + full shift' │ '0'     │ 1143968079.9999998 │ '±0.48%' │ 10      │
│ 1       │ 'FixedQueue full push + full shift'     │ '146'   │ 6835453.33333362   │ '±3.78%' │ 15      │
└─────────┴─────────────────────────────────────────┴─────────┴────────────────────┴──────────┴─────────┘

And here's the same benchmark with target ES2022:

┌─────────┬─────────────────────────────────────────┬─────────┬────────────────────┬──────────┬─────────┐
│ (index) │ Task Name                               │ ops/sec │ Average Time (ns)  │ Margin   │ Samples │
├─────────┼─────────────────────────────────────────┼─────────┼────────────────────┼──────────┼─────────┤
│ 0       │ 'ArrayTaskQueue full push + full shift' │ '0'     │ 1139727079.9999993 │ '±0.42%' │ 10      │
│ 1       │ 'FixedQueue full push + full shift'     │ '849'   │ 1176970.930232507  │ '±2.56%' │ 86      │
└─────────┴─────────────────────────────────────────┴─────────┴────────────────────┴──────────┴─────────┘

And here's what we get with target ES2019 but with the modern private field syntax rewritten into the old underscore style (like _field):

┌─────────┬─────────────────────────────────────────┬─────────┬────────────────────┬──────────┬─────────┐
│ (index) │ Task Name                               │ ops/sec │ Average Time (ns)  │ Margin   │ Samples │
├─────────┼─────────────────────────────────────────┼─────────┼────────────────────┼──────────┼─────────┤
│ 0       │ 'ArrayTaskQueue full push + full shift' │ '0'     │ 1162376920.0000002 │ '±1.77%' │ 10      │
│ 1       │ 'FixedQueue full push + full shift'     │ '1 026' │ 974328.1553396407  │ '±2.51%' │ 103     │
└─────────┴─────────────────────────────────────────┴─────────┴────────────────────┴──────────┴─────────┘

As we can see, the same FixedQueue implementation shows very different results:

  • 146 ops/sec with "target": "ES2019" and the modern private field syntax.
  • 849 ops/sec with "target": "ES2022" and the modern private field syntax.
  • 1026 ops/sec with "target": "ES2019" and the old-style convention of prefixing "private" properties with _.

The question is: should we update the target or rewrite all private fields on the hot path?
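For context (my summary, not something stated in the thread): with "target": "ES2019" TypeScript cannot emit native #private fields, so it downlevels every access to WeakMap lookups through helper functions, which is what penalizes the hot path. A simplified illustration (field names are made up for the example):

```js
// Modern private field syntax: fast when emitted natively (ES2022 target),
// but downleveled to WeakMap-backed storage under an ES2019 target.
class QueueModern {
  #bottom = 0;
  #top = 0;
  isEmpty () { return this.#top === this.#bottom; }
}

// Old-style underscore convention: plain properties on every target,
// which is the kind of rewrite applied to the hot path here.
class QueueUnderscore {
  _bottom = 0;
  _top = 0;
  isEmpty () { return this._top === this._bottom; }
}
```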

@metcoder95 (Member) commented May 7, 2024

TBH. I think we should always use FixedQueue by default, the "smart" mode is just a compromise suggestion

I'm afraid of breaking implementations if there is custom usage of the current ArrayQueue. I'm OK with keeping it like this within the current version and making it the default in the next major.

I'm planning to expedite a major release mid-June or early July.

FixedQueue is ~6 times faster now.

It's kind of expected.
FixedQueue behaves better under heavy load compared to the ArrayQueue; that's why the suggested mode is based on a threshold.

The question is: should we update the target or rewrite all private fields on the hot path?

Let's rewrite it for now, and we can aim for the ES2022 private field syntax in the next major.

@JaoodxD (Author) commented May 7, 2024

So I propose the following:

  1. Expose FixedQueue for import/require from the module.
  2. Mention it in the docs so users are able to pass it as a custom taskQueue, as demonstrated below:

import { resolve } from 'node:path'
import { Piscina, FixedQueue } from 'piscina'

const pool = new Piscina({
  filename: resolve(__dirname, 'fixtures/add.js'),
  taskQueue: new FixedQueue()
})

and make it the default on major.

Until it eventually becomes the default in the following major release.

@JaoodxD (Author) commented May 7, 2024

Also, I have a few small benchmarks (the results of which I posted above) and tinybench as a new dev dependency.
Should I commit them, or is that not necessary?

@JaoodxD (Author) commented May 7, 2024

Nice, that would be great! We can add a boolean option called smart: false (the default) keeps using the same task queue, and true switches to FixedQueue after a threshold (128 seems like a good default to start with).

Though if the default task queue is already a FixedQueue, there's nothing to do there.

I also think that FixedQueue would be better in all cases.
The only benefit the current ArrayQueue has is memory footprint, since FixedQueue preallocates a 2048-element array, while ArrayQueue allocates only 4 slots on the first insertion.

@metcoder95 (Member)

So I propose the following:

SGTM 👍

Should I commit them, or is that not necessary?

Let's commit them; they will serve as a baseline for further optimizations.

I also think that FixedQueue would be better in all cases.
The only benefit the current ArrayQueue has is memory footprint, since FixedQueue preallocates a 2048-element array, while ArrayQueue allocates only 4 slots on the first insertion.

Agree, let's mention this while documenting the new FixedQueue 👍

@metcoder95 (Member)

Regarding the FixedQueue issue, I'd advise you to open an issue in Node for that; it's possible it is by design, otherwise it's an issue to fix 👍

@JaoodxD (Author) commented May 14, 2024

Regarding the FixedQueue issue, I'd advise you to open an issue in Node for that; it's possible it is by design, otherwise it's an issue to fix 👍

I sat and thought for a while and came to the same conclusion. Maybe it's done this way to avoid checking the real number of elements.
But it would be interesting to know for sure.

@JaoodxD marked this pull request as ready for review May 14, 2024 20:24
Member left a comment

Can you add some tests using the task queue with Piscina?

@JaoodxD (Author) commented May 14, 2024

Added a few. Should I test any particular cases?

Member left a comment

Let's go with the basics, plus whatever cases you wanted to cover when you implemented the new task queue.
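A minimal sketch of such a test (node:test is used here only for illustration; the project's actual test harness, fixture, and worker signature may differ):

```js
import { test } from 'node:test';
import assert from 'node:assert';
import { fileURLToPath } from 'node:url';
import { Piscina, FixedQueue } from 'piscina';

test('Piscina drains a FixedQueue task queue', async () => {
  const pool = new Piscina({
    // 'fixtures/add.js' as in the usage example earlier in this thread;
    // it is assumed to accept an object like { a, b }.
    filename: fileURLToPath(new URL('./fixtures/add.js', import.meta.url)),
    taskQueue: new FixedQueue()
  });

  // Queue more tasks than there are worker threads so the task queue is exercised.
  const results = await Promise.all(
    Array.from({ length: 100 }, (_, i) => pool.run({ a: i, b: i }))
  );

  assert.strictEqual(results.length, 100);
  assert.strictEqual(pool.queueSize, 0); // the queue should be fully drained
  await pool.destroy();
});
```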

@JaoodxD (Author) commented May 14, 2024

Regarding the FixedQueue issue, I'd advise you to open an issue in Node for that; it's possible it is by design, otherwise it's an issue to fix 👍

Seems like it wasn't a bug, and I should've read the comments carefully; they clearly state the following:

// When `top === bottom` the current queue is empty and when
// `top + 1 === bottom` it's full. This wastes a single space of storage
// but allows much quicker checks.

https://github.com/nodejs/node/blob/9807ede6fb17afe36a2447df65eb6b63df8d1d37/lib/internal/fixed_queue.js#L55


Will double check how it affects performance and then revert my "fixes" if the difference is observable.
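An illustrative sketch (not Node's actual code) of the convention that comment describes: by deliberately wasting one slot, the emptiness and fullness checks become single comparisons and no separate element counter has to be maintained.

```js
// Minimal circular buffer using the same top/bottom convention as Node's
// fixed_queue. `capacity` must be a power of two so `& mask` wraps the indices.
class CircularBuffer {
  constructor (capacity = 2048) {
    this.bottom = 0;               // index of the next item to shift
    this.top = 0;                  // index where the next push lands
    this.mask = capacity - 1;
    this.list = new Array(capacity);
  }

  isEmpty () { return this.top === this.bottom; }

  // "Full" when one more push would make top catch up with bottom:
  // one slot stays unused, but no length field has to be tracked.
  isFull () { return ((this.top + 1) & this.mask) === this.bottom; }

  push (data) {
    this.list[this.top] = data;
    this.top = (this.top + 1) & this.mask;
  }

  shift () {
    if (this.isEmpty()) return null;
    const next = this.list[this.bottom];
    this.list[this.bottom] = undefined;
    this.bottom = (this.bottom + 1) & this.mask;
    return next;
  }
}
```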

@metcoder95 (Member)

Will double check how it affects performance and then revert my "fixes" if the difference is observable.

Give it a try with max size + 1 and (max * 2) + 1 as baselines, just to see the impact when the queue expands.
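Something along these lines could cover those sizes (a sketch; the ring size of 2048 is an assumption matching Node's implementation, and the tinybench setup mirrors the earlier benchmarks):

```js
import { Bench } from 'tinybench';
import { FixedQueue } from 'piscina';

const bench = new Bench({ time: 500 });
const RING_SIZE = 2048; // assumed size of FixedQueue's internal circular buffer
// Just past one and two ring lengths, per the suggestion above.
const sizes = [RING_SIZE + 1, RING_SIZE * 2 + 1];

for (const n of sizes) {
  bench.add(`FixedQueue push + shift, ${n} items`, () => {
    const queue = new FixedQueue();
    for (let i = 0; i < n; i++) queue.push(i);
    while (queue.size > 0) queue.shift();
  });
}

await bench.run();
console.table(bench.table());
```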

@metcoder95 merged commit 8afa70f into piscinajs:current May 19, 2024
10 checks passed
@JaoodxD (Author) commented May 19, 2024

@metcoder95 sorry, I've been busy with work, so I haven't added any additional benchmarks.
I'll make a separate PR with them a bit later.

@metcoder95 (Member)

np! The PR was good to land already; since it's not the default, the benchmarks are fine in a separate PR 👍

@JaoodxD (Author) commented May 22, 2024

@metcoder95 Not quite sure where and in what shape it should live in the piscina repo (if it should at all).
Here's a result of the benchmark from the following repo:
https://github.com/JaoodxD/fixed-queue-perf-test


It shows results for queue sizes of:

  • max_size + 1 (the current Piscina FixedQueue is 15% faster)
  • max_size * 2 + 1 (Node is ~20% faster)
  • max_size * 64 + 1 (Node is ~13% faster)
  • max_size * 1024 + 1 (Node is ~18% faster)

As we can see, the first suite shows that the Piscina fix actually has a positive effect, but it's an edge case; in general, Node's implementation is roughly 13-20% faster.


Successfully merging this pull request may close these issues.

Default TaskQueue implementation is slow for a large number of queued tasks