
[Experiment] [feature] Add batch writes mode for Sender #1797

Closed

Conversation

@puzpuzpuz commented Sep 10, 2020

Adds a batchWrites option which makes Sender wrap writes in socket.cork()/.uncork() calls. With this option enabled, we get better throughput for relatively small messages (roughly 1 KiB and below).

Note. This is an experiment and it's very unlikely to get into the library, so I didn't bother with proper documentation and tests. My intent is to demonstrate one of the possible approaches (probably the simplest one) to batching writes in the library.

More context

On *nix OSes, Node.js tries to call libuv's uv_try_write function on each socket.write(). For a ready-to-write TCP socket that function performs the write immediately, so for small messages this leads to noticeable overhead due to the large number of syscalls and other factors.

On the other hand, socket.cork()/.uncork() calls (and the underlying _writev() implementation) have their own overhead, which may sometimes hurt the latency of individual messages and brings almost no benefit for larger messages. Enabling this option also makes little sense when there is a medium/large number of open WS connections with infrequent communication over each socket.
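For illustration, here is a minimal sketch of the cork()/uncork() batching pattern on a plain net.Socket. It is not the PR's actual Sender code; it only shows the Node.js mechanism the batchWrites option relies on:

```js
// Minimal sketch of write batching with cork()/uncork() on a plain net.Socket.
// Not the PR's Sender implementation; it only illustrates the mechanism.
const net = require('net');

const socket = net.connect(8080, 'localhost');

function batchedWrite(chunk) {
  if (socket.writableCorked === 0) {
    // Buffer subsequent writes in memory instead of issuing a syscall each...
    socket.cork();
    // ...and flush them all with a single writev-style write on the next tick.
    process.nextTick(() => socket.uncork());
  }
  socket.write(chunk);
}

// All three writes issued in the same tick are flushed together on uncork().
batchedWrite(Buffer.from('frame 1'));
batchedWrite(Buffer.from('frame 2'));
batchedWrite(Buffer.from('frame 3'));
```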

Benchmark results

Existing benchmarks measure the latency of individual round-trips, so I've added a new one that measures throughput at different levels of concurrency (a simplified sketch of the idea follows the results below).

Here is the result (10 runs, Node.js v14.10.0):

                                                                        confidence improvement accuracy (*)    (**)   (***)
 throughput.js concurrency=1 roundtrips=10000 size=128 KiB data=binary                -12.79 %      ±14.82% ±20.73% ±29.19%
 throughput.js concurrency=1 roundtrips=10000 size=128 KiB data=text                    3.54 %       ±4.62%  ±6.45%  ±9.08%
 throughput.js concurrency=1 roundtrips=100000 size=1 KiB data=binary          ***     -5.60 %       ±1.73%  ±2.37%  ±3.25%
 throughput.js concurrency=1 roundtrips=100000 size=1 KiB data=text            ***     -6.18 %       ±1.82%  ±2.54%  ±3.57%
 throughput.js concurrency=1 roundtrips=100000 size=64 B data=binary           ***     -6.15 %       ±1.21%  ±1.67%  ±2.31%
 throughput.js concurrency=1 roundtrips=100000 size=64 B data=text             ***     -6.28 %       ±1.79%  ±2.49%  ±3.45%
 throughput.js concurrency=32 roundtrips=10000 size=128 KiB data=binary               -10.20 %      ±11.71% ±16.06% ±21.92%
 throughput.js concurrency=32 roundtrips=10000 size=128 KiB data=text                   0.21 %       ±2.99%  ±4.11%  ±5.64%
 throughput.js concurrency=32 roundtrips=100000 size=1 KiB data=binary         ***     33.70 %       ±4.64%  ±6.42%  ±8.89%
 throughput.js concurrency=32 roundtrips=100000 size=1 KiB data=text           ***     14.04 %       ±5.03%  ±7.01%  ±9.81%
 throughput.js concurrency=32 roundtrips=100000 size=64 B data=binary          ***     36.32 %       ±3.95%  ±5.47%  ±7.56%
 throughput.js concurrency=32 roundtrips=100000 size=64 B data=text            ***     20.05 %       ±3.88%  ±5.36%  ±7.43%
 throughput.js concurrency=64 roundtrips=10000 size=128 KiB data=binary                -3.83 %       ±8.38% ±11.57% ±15.95%
 throughput.js concurrency=64 roundtrips=10000 size=128 KiB data=text                   0.53 %       ±3.13%  ±4.28%  ±5.84%
 throughput.js concurrency=64 roundtrips=100000 size=1 KiB data=binary         ***     34.95 %       ±5.03%  ±7.00%  ±9.79%
 throughput.js concurrency=64 roundtrips=100000 size=1 KiB data=text           ***     13.02 %       ±3.86%  ±5.30%  ±7.25%
 throughput.js concurrency=64 roundtrips=100000 size=64 B data=binary          ***     47.02 %       ±4.76%  ±6.57%  ±9.05%
 throughput.js concurrency=64 roundtrips=100000 size=64 B data=text            ***     20.34 %       ±3.85%  ±5.29%  ±7.24%

Be aware that when doing many comparisons the risk of a false-positive
result increases. In this case there are 18 comparisons, you can thus
expect the following amount of false-positive results:
  0.90 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.18 false positives, when considering a   1% risk acceptance (**, ***),
  0.02 false positives, when considering a 0.1% risk acceptance (***)

Plot for 64 B messages: compare-ws-64-b-plot (image)

Plot for 1 KiB messages: compare-ws-1-kb-plot (image)
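(The new benchmark itself lives in the PR; the following is only a hypothetical sketch of the throughput measurement, assuming a local echo server at ws://localhost:8080. Parameter values are illustrative and this is not the actual throughput.js.)

```js
// Hypothetical sketch of a concurrency-based throughput measurement with ws.
// Assumes a local server that echoes every message back.
const WebSocket = require('ws');

const concurrency = 64;
const roundtrips = 100000;
const payload = Buffer.alloc(64); // 64 B binary message

const ws = new WebSocket('ws://localhost:8080');

ws.on('open', () => {
  const start = process.hrtime.bigint();
  let completed = 0;

  ws.on('message', () => {
    if (++completed === roundtrips) {
      const seconds = Number(process.hrtime.bigint() - start) / 1e9;
      console.log(`${(roundtrips / seconds).toFixed(0)} roundtrips/s`);
      ws.close();
    } else if (completed + concurrency <= roundtrips) {
      ws.send(payload); // keep `concurrency` messages in flight
    }
  });

  // Prime the pipeline with `concurrency` in-flight messages.
  for (let i = 0; i < concurrency; i++) ws.send(payload);
});
```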

@lpinca (Member) commented Sep 10, 2020

I don't understand the benchmark. Instead of calling ws.send() n times synchronously, can't the user create a single message from the n chunks and send it?

I mean, if the goal is to optimize writes, why call ws.send() multiple times in the same tick?
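For context, the application-level batching being suggested here would be something along these lines (illustrative only):

```js
// Illustrative sketch of application-level batching: instead of calling
// ws.send() once per chunk, combine the chunks into a single message.
const WebSocket = require('ws');
const ws = new WebSocket('ws://localhost:8080');

ws.on('open', () => {
  const chunks = [Buffer.from('a'), Buffer.from('b'), Buffer.from('c')];

  // Instead of: chunks.forEach((chunk) => ws.send(chunk));
  ws.send(Buffer.concat(chunks)); // one message, one write
});
```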

@puzpuzpuz (Author) commented Sep 11, 2020

> I don't understand the benchmark. Instead of calling ws.send() n times synchronously, can't the user create a single message from the n chunks and send it?
>
> I mean, if the goal is to optimize writes, why call ws.send() multiple times in the same tick?

I can imagine writing multiple messages into a single socket from different handlers in application code. In that case, application-level batching is inconvenient.

There might be other similar cases when multiple messages are written into the same socket in the same tick, such as a message listener that delegates message processing to per-message-type handlers, each of which may write something back into the socket. Again, in this case it's inconvenient for the application developer to batch writes at their level, and I doubt anyone actually does that.
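A purely illustrative sketch of that kind of dispatch (the handler names and the message format are made up):

```js
// Illustrative only: per-message-type handlers that each write back to the
// same socket while one incoming message is processed, i.e. in the same tick.
// Assumes each incoming message is a JSON array of { type, payload } objects.
const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });

const handlers = {
  subscribe: (ws, payload) => ws.send(JSON.stringify({ type: 'subscribed', payload })),
  ping: (ws) => ws.send(JSON.stringify({ type: 'pong' }))
};

wss.on('connection', (ws) => {
  ws.on('message', (data) => {
    // One incoming batch fans out to several handlers, so multiple
    // ws.send() calls may land on the same socket in the same tick.
    for (const msg of JSON.parse(data)) {
      const handler = handlers[msg.type];
      if (handler) handler(ws, msg.payload);
    }
  });
});
```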

That said, I doubt it's a very common scenario, as communication over each socket has to be intensive to gain any benefit from the batchWrites option. So, if you think that no one really needs it, I'm perfectly fine with closing this PR.

@lpinca (Member) commented Sep 11, 2020

> I can imagine writing multiple messages into a single socket from different handlers in application code. In that case, application-level batching is inconvenient.
>
> There might be other similar cases when multiple messages are written into the same socket in the same tick, such as a message listener that delegates message processing to per-message-type handlers, each of which may write something back into the socket. Again, in this case it's inconvenient for the application developer to batch writes at their level, and I doubt anyone actually does that.

Yes, but I think it's very unlikely that those handlers will be called in the same tick.

This option might make sense if a single chunk in a 'data' event of a {net,tls}.Socket contains multiple uncompressed WebSocket frames, and most/all of them trigger a write on the {net,tls}.Socket. I guess this depends on the network and how data is actually sent but I don't think it's common in the real world so I would prefer to not introduce additional complexity.

Please correct me if I'm wrong.

@puzpuzpuz (Author) commented

> I guess this depends on the network and how data is actually sent but I don't think it's common in the real world so I would prefer to not introduce additional complexity.

Yes, that's correct: it depends on the sender side and the network. Some applications where WebSocket is used for communication between services may fall into this category, but they are not that common.

In any case, let me close the PR then.
