
[Experiment] [feature] Add batch writes mode for Sender #1797

Closed

Conversation

@puzpuzpuz commented Sep 10, 2020

Adds a batchWrites option which makes Sender wrap writes in socket.cork()/.uncork() calls. With this option enabled, we get better throughput for relatively small messages (roughly 1 KiB and below).

Note. This is an experiment and it's very unlikely to get into the library, so I didn't bother with proper documentation and tests. My intent is to demonstrate one of the possible approaches (probably the simplest one) to batching writes in the library.

More context

On *nix OSes, Node.js tries to call libuv's uv_try_write function on each socket.write(). For a ready-to-write TCP socket that function performs the write immediately, so for small messages this leads to noticeable overhead due to the large number of syscalls and other factors.

On the other hand, socket.cork()/.uncork() calls (and the underlying _writev() implementation) have their own overhead, which may sometimes hurt the latency of individual messages and brings almost no benefit for larger messages. Enabling this option also makes little sense when there is a medium/large number of open WS connections with infrequent communication over each socket.
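For illustration, here is a minimal sketch of the cork()/uncork() batching pattern on a plain net.Socket. It is not the PR's actual Sender code; it only shows the Node.js mechanism the batchWrites option relies on:

```js
// Minimal sketch of write batching with cork()/uncork() on a plain net.Socket.
// Not the PR's Sender implementation; it only illustrates the mechanism.
const net = require('net');

const socket = net.connect(8080, 'localhost');

function batchedWrite(chunk) {
  if (socket.writableCorked === 0) {
    // Buffer subsequent writes in memory instead of issuing a syscall each...
    socket.cork();
    // ...and flush them all with a single writev-style write on the next tick.
    process.nextTick(() => socket.uncork());
  }
  socket.write(chunk);
}

// All three writes issued in the same tick are flushed together on uncork().
batchedWrite(Buffer.from('frame 1'));
batchedWrite(Buffer.from('frame 2'));
batchedWrite(Buffer.from('frame 3'));
```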

Benchmark results

Existing benchmarks measure the latency of individual round-trips, so I've added a new one that measures throughput at different levels of concurrency (a simplified sketch of the idea follows the results below).

Here is the result (10 runs, Node.js v14.10.0):

                                                                        confidence improvement accuracy (*)    (**)   (***)
 throughput.js concurrency=1 roundtrips=10000 size=128 KiB data=binary                -12.79 %      ±14.82% ±20.73% ±29.19%
 throughput.js concurrency=1 roundtrips=10000 size=128 KiB data=text                    3.54 %       ±4.62%  ±6.45%  ±9.08%
 throughput.js concurrency=1 roundtrips=100000 size=1 KiB data=binary          ***     -5.60 %       ±1.73%  ±2.37%  ±3.25%
 throughput.js concurrency=1 roundtrips=100000 size=1 KiB data=text            ***     -6.18 %       ±1.82%  ±2.54%  ±3.57%
 throughput.js concurrency=1 roundtrips=100000 size=64 B data=binary           ***     -6.15 %       ±1.21%  ±1.67%  ±2.31%
 throughput.js concurrency=1 roundtrips=100000 size=64 B data=text             ***     -6.28 %       ±1.79%  ±2.49%  ±3.45%
 throughput.js concurrency=32 roundtrips=10000 size=128 KiB data=binary               -10.20 %      ±11.71% ±16.06% ±21.92%
 throughput.js concurrency=32 roundtrips=10000 size=128 KiB data=text                   0.21 %       ±2.99%  ±4.11%  ±5.64%
 throughput.js concurrency=32 roundtrips=100000 size=1 KiB data=binary         ***     33.70 %       ±4.64%  ±6.42%  ±8.89%
 throughput.js concurrency=32 roundtrips=100000 size=1 KiB data=text           ***     14.04 %       ±5.03%  ±7.01%  ±9.81%
 throughput.js concurrency=32 roundtrips=100000 size=64 B data=binary          ***     36.32 %       ±3.95%  ±5.47%  ±7.56%
 throughput.js concurrency=32 roundtrips=100000 size=64 B data=text            ***     20.05 %       ±3.88%  ±5.36%  ±7.43%
 throughput.js concurrency=64 roundtrips=10000 size=128 KiB data=binary                -3.83 %       ±8.38% ±11.57% ±15.95%
 throughput.js concurrency=64 roundtrips=10000 size=128 KiB data=text                   0.53 %       ±3.13%  ±4.28%  ±5.84%
 throughput.js concurrency=64 roundtrips=100000 size=1 KiB data=binary         ***     34.95 %       ±5.03%  ±7.00%  ±9.79%
 throughput.js concurrency=64 roundtrips=100000 size=1 KiB data=text           ***     13.02 %       ±3.86%  ±5.30%  ±7.25%
 throughput.js concurrency=64 roundtrips=100000 size=64 B data=binary          ***     47.02 %       ±4.76%  ±6.57%  ±9.05%
 throughput.js concurrency=64 roundtrips=100000 size=64 B data=text            ***     20.34 %       ±3.85%  ±5.29%  ±7.24%

Be aware that when doing many comparisons the risk of a false-positive
result increases. In this case there are 18 comparisons, you can thus
expect the following amount of false-positive results:
  0.90 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.18 false positives, when considering a   1% risk acceptance (**, ***),
  0.02 false positives, when considering a 0.1% risk acceptance (***)

Plot for 64 B messages: compare-ws-64-b-plot (image)

Plot for 1 KiB messages: compare-ws-1-kb-plot (image)
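(The new benchmark itself lives in the PR; the following is only a hypothetical sketch of the throughput measurement, assuming a local echo server at ws://localhost:8080. Parameter values are illustrative and this is not the actual throughput.js.)

```js
// Hypothetical sketch of a concurrency-based throughput measurement with ws.
// Assumes a local server that echoes every message back.
const WebSocket = require('ws');

const concurrency = 64;
const roundtrips = 100000;
const payload = Buffer.alloc(64); // 64 B binary message

const ws = new WebSocket('ws://localhost:8080');

ws.on('open', () => {
  const start = process.hrtime.bigint();
  let completed = 0;

  ws.on('message', () => {
    if (++completed === roundtrips) {
      const seconds = Number(process.hrtime.bigint() - start) / 1e9;
      console.log(`${(roundtrips / seconds).toFixed(0)} roundtrips/s`);
      ws.close();
    } else if (completed + concurrency <= roundtrips) {
      ws.send(payload); // keep `concurrency` messages in flight
    }
  });

  // Prime the pipeline with `concurrency` in-flight messages.
  for (let i = 0; i < concurrency; i++) ws.send(payload);
});
```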

@lpinca (Member) commented Sep 10, 2020

I don't understand the benchmark. Instead of calling ws.send() n times synchronously, can't the user create a single message from the n chunks and send it?

I mean, if the goal is to optimize writes, why call ws.send() multiple times in the same tick?
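For context, the application-level batching being suggested here would be something along these lines (illustrative only):

```js
// Illustrative sketch of application-level batching: instead of calling
// ws.send() once per chunk, combine the chunks into a single message.
const WebSocket = require('ws');
const ws = new WebSocket('ws://localhost:8080');

ws.on('open', () => {
  const chunks = [Buffer.from('a'), Buffer.from('b'), Buffer.from('c')];

  // Instead of: chunks.forEach((chunk) => ws.send(chunk));
  ws.send(Buffer.concat(chunks)); // one message, one write
});
```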

@puzpuzpuz (Author) commented Sep 11, 2020

> I don't understand the benchmark. Instead of calling ws.send() n times synchronously, can't the user create a single message from the n chunks and send it?
>
> I mean, if the goal is to optimize writes, why call ws.send() multiple times in the same tick?

I can imagine writing multiple messages into a single socket from different handlers in application code. In that case, application-level batching is inconvenient.

There might be other similar cases when multiple messages are written into the same socket in the same tick, such as a message listener that delegates message processing to per-message-type handlers, each of which may write something back into the socket. Again, in this case it's inconvenient for the application developer to batch writes at their level, and I doubt anyone actually does that.
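A purely illustrative sketch of that kind of dispatch (the handler names and the message format are made up):

```js
// Illustrative only: per-message-type handlers that each write back to the
// same socket while one incoming message is processed, i.e. in the same tick.
// Assumes each incoming message is a JSON array of { type, payload } objects.
const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });

const handlers = {
  subscribe: (ws, payload) => ws.send(JSON.stringify({ type: 'subscribed', payload })),
  ping: (ws) => ws.send(JSON.stringify({ type: 'pong' }))
};

wss.on('connection', (ws) => {
  ws.on('message', (data) => {
    // One incoming batch fans out to several handlers, so multiple
    // ws.send() calls may land on the same socket in the same tick.
    for (const msg of JSON.parse(data)) {
      const handler = handlers[msg.type];
      if (handler) handler(ws, msg.payload);
    }
  });
});
```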

That said, I doubt it's a very common scenario, as communication over each socket has to be intensive to gain any benefit from the batchWrites option. So, if you think that no one really needs it, I'm perfectly fine with closing this PR.

@lpinca (Member) commented Sep 11, 2020

> I can imagine writing multiple messages into a single socket from different handlers in application code. In that case, application-level batching is inconvenient.
>
> There might be other similar cases when multiple messages are written into the same socket in the same tick, such as a message listener that delegates message processing to per-message-type handlers, each of which may write something back into the socket. Again, in this case it's inconvenient for the application developer to batch writes at their level, and I doubt anyone actually does that.

Yes, but I think it's very unlikely that those handlers will be called in the same tick.

This option might make sense if a single chunk in a 'data' event of a {net,tls}.Socket contains multiple uncompressed WebSocket frames, and most/all of them trigger a write on the {net,tls}.Socket. I guess this depends on the network and how data is actually sent but I don't think it's common in the real world so I would prefer to not introduce additional complexity.

Please correct me if I'm wrong.

@puzpuzpuz (Author) commented

> I guess this depends on the network and how data is actually sent but I don't think it's common in the real world so I would prefer to not introduce additional complexity.

Yes, that's correct: it depends on the sender side and the network. Some applications where WebSocket is used for communication between services may fall into this category, but they are not that common.

In any case, let me close the PR then.
