Postgres: avoid tiny writes to improve performance in read-heavy scenarios#29812
Conversation
@smallinsky FYI.
@smallinsky e352893 implements the message parsing loop in a separate goroutine, enabling analysis. No performance impact observed.
To count the messages, consume the copied message stream in a separate goroutine.
@Tener
Yep, working on it right now. I just made a repo with my scripts: https://github.com/Tener/teleport-bench-dbs. I'll reference it in my findings.
rosstimothy
left a comment
Can we write a benchmark test that exercises this before and after the change so we have some tangible numbers on just how much better this solution is?
@rosstimothy Sure, I just added the results from the benchmark I've been using throughout this work.
Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
That's not quite what I was talking about. The numbers look good, but it'd be nice if we had a
That would be nice indeed, and we do have a relevant RFD open. It isn't implemented, however, which is why I went with custom scripts.
I don't think that RFD applies. That one is about implementing tests to exercise db access via
I can write a limited benchmark, for example exercising only [...]. I believe that the tests proposed in RFD#141 will ultimately be much more useful, being end-to-end and representing actual real-world workloads.
@rosstimothy I've added the benchmark to the PR. The results are in the PR description.
rosstimothy
left a comment
Thanks for the benchmark test! LGTM other than a few nits.
Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
This change is analogous to the Postgres change made in #29812.
Currently, for Postgres, we read individual server messages one by one before forwarding them to the client, also one by one. This limits performance when the messages are small, as sending them in this manner is costly: each takes at least one syscall. In particular, this can be observed in read-heavy scenarios, such as `pg_dump` or `SELECT * FROM large_table`.

This PR restructures the handling of the server byte stream. The main driver is an `io.Copy` call, which performs well thanks to the large buffers involved. For analysis purposes, we copy the byte stream for consumption in a separate goroutine, which counts the parsed messages. The byte stream is copied synchronously courtesy of `io.Pipe`, which ensures the stream does not consume significant memory.

I have tested the performance of this part of the code using home-grown scripts available here.
I ran the tests on my local MacBook, with native Postgres (`postgres (PostgreSQL) 15.3 (Homebrew)`):

- `make read-benchmark` on `master` @ 9960697d35
- `make read-benchmark` on `tener/postgres-io-copy` @ bf8ca89573

`master` is CPU-bound, limited by single-core performance on the Teleport side, and is ~13.7 times slower than native performance (without Teleport). This PR closes the gap by a large margin, leaving only 15% overhead.

Results from the synthetic benchmark, also added in the PR:
Contrast with `master`: for this benchmark, the new code achieves a speedup of at least 1.38x, or as much as 192x, depending on the message size. Running the benchmark with `-benchmem` (results not shown for brevity) indicates the updated code allocates less memory in fewer individual allocations, ultimately leading to decreased GC pressure.

Changelog: Postgres: improve performance in read-heavy scenarios.
Fixes #26868.