Prepare and write txn rows in parallel #802
Merged
The PR adds a channel-backed implementation of `pgx.CopyFromSource` in the `writer` package:

```go
package writer

import (
	"github.com/jackc/pgx/v4"
)

// copyFromChannelStruct implements pgx.CopyFromSource by reading rows
// from a channel, so rows can be prepared concurrently with CopyFrom.
type copyFromChannelStruct struct {
	ch   chan []interface{}
	next []interface{}
	err  error
}

// Next blocks until a row is received or the channel is closed.
func (c *copyFromChannelStruct) Next() bool {
	var ok bool
	c.next, ok = <-c.ch
	return ok
}

// Values returns the row received by the most recent call to Next.
func (c *copyFromChannelStruct) Values() ([]interface{}, error) {
	return c.next, nil
}

// Err reports a terminal error; rows arrive only via the channel, so
// there is nothing to report here.
func (c *copyFromChannelStruct) Err() error {
	return nil
}

// copyFromChannel wraps a channel of rows as a pgx.CopyFromSource.
func copyFromChannel(ch chan []interface{}) pgx.CopyFromSource {
	return &copyFromChannelStruct{ch: ch}
}
```
---
By definition, we're doing this because serialization is slower than the network. Therefore we shouldn't need a buffered channel.
If we do keep the buffered channel, you'll need to make sure we remove any unread items from the channel.
---
No, serialization is faster than postgres.
Unread items in case of an error? The channel will be garbage collected.
---
Since serialization is faster, doesn't that mean a buffer is not required? As long as we're able to have 1 pending object to load there would never be any downtime.
Re: garbage collection. I didn't realize the unread items would be garbage collected, but it makes sense that this would be the case.
---
I'm not sure if there is a more authoritative source for this, but here is a reference to something I've heard before:
https://github.com/uber-go/guide/blob/master/style.md#channel-size-is-one-or-none
---
If one is much faster than the other, this PR wouldn't improve anything. However, here postgres and serialization are somewhat comparable in speed.
Our postgres driver creates buffers of 65KB and writes them to the network. Depending on whether the "network" has a large enough buffer, this call may block and we would waste CPU time that we could use for serialization. So actually, I'm thinking we should increase the buffer to 1024 to be safe. What do you think?
I've seen this uber style guide. I think they mainly worry about blocking that can occur when writing to a channel, but when the buffer size is large, blocking only rarely occurs in production and not in testing.
---
I found the 65KB number you mentioned, but it doesn't seem to work the way you described. It doesn't accumulate 65KB of data; it reads up to 65KB of data and then sends it immediately. Since it's TCP it needs to be received correctly (the wasted CPU time you mentioned), but there doesn't seem to be a way to speed that up, so I don't think we need a large buffer of transactions.
https://github.com/jackc/pgconn/blob/master/pgconn.go#L1199
It seems like having a large buffered channel wouldn't help anything. Did you see that performance was affected in your testing?
---
If you agree that we would waste CPU time with an unbuffered channel, then as soon as `w.Write()` returns, where `w` is the write side of the pipe, we will need to spend time serializing transactions until we can call `w.Write()` again. I tried tracing down the code, and I think it boils down to how fast the write syscall returns. That seems to be platform dependent, so I would rather introduce a big enough buffer in our code. In any case, it's small enough not to worry about it.

---
"Waste" is the wrong term since the goal isn't to use as much CPU as possible. From what I see we need one pending round to continuously load the DB. Thanks to the pipe we already have 2 queued up with an unbuffered channel.
How did you get the 8% metric? Could you try with an unbuffered channel to see if it has the same performance?
---
The problem with that pipe is that the writer writes slightly more than 2^16 bytes and will unblock only when all of it is read. However, the reader reads slightly less than 2^16 bytes and writes it to the network before reading the other couple of bytes.
I'm running tests.
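For illustration, here is a standalone sketch using the standard library's `io.Pipe` (not the actual pgconn internals) of the blocking behaviour described above: a write of slightly more than 2^16 bytes only returns once the reader has consumed every byte, including the trailing few.

```go
package main

import (
	"fmt"
	"io"
	"time"
)

// Illustrative only: io.Pipe's Write blocks until readers have consumed
// all of the written data, mirroring the "writer unblocks only when all
// of it is read" behaviour discussed in the comment above.
func main() {
	r, w := io.Pipe()

	go func() {
		buf := make([]byte, 1<<16)
		n, _ := r.Read(buf) // first read drains 64 KiB
		fmt.Println("first read:", n, "bytes")
		time.Sleep(500 * time.Millisecond) // pretend we're flushing to the network
		n, _ = r.Read(buf) // the remaining few bytes finally unblock the writer
		fmt.Println("second read:", n, "bytes")
	}()

	start := time.Now()
	payload := make([]byte, 1<<16+8) // slightly more than 2^16 bytes
	if _, err := w.Write(payload); err != nil {
		panic(err)
	}
	// Write only returns after the second read, roughly 500ms later.
	fmt.Println("Write returned after", time.Since(start).Round(time.Millisecond))
}
```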
---
| chan size | TPS |
| --- | --- |
| 0 | 6798.43 |
| 64 | 6891.80 |
| 1024 | 6946.75 |