
Prepare and write txn rows in parallel #802

Merged: 3 commits from tolik/add-txns-pipeline into develop on Dec 3, 2021

Conversation

tolikzinovyev (Contributor)

Summary

At the moment, we first prepare all database rows for the txn table, and only then write them to the database. Preparing database rows is expensive, mainly because of JSON encoding. This PR makes preparation and writing to the database concurrent, thus improving performance. Sample tests showed an 8% improvement in TPS.

Closes #794.
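For context, the pattern described above can be sketched as a small producer/consumer pipeline. This is only an illustration under assumed names (addTxnRows, the string slice of transactions, and the Println stand-in for the COPY writer are all hypothetical), not the actual writer.go implementation:

```go
// Minimal sketch of the prepare/write pipeline, assuming hypothetical names.
// The real code lives in idb/postgres/internal/writer and writes rows via
// pgx's COPY protocol instead of printing them.
package main

import (
	"context"
	"fmt"
)

func addTxnRows(ctx context.Context, txns []string) error {
	ch := make(chan []interface{}, 64) // buffered so row preparation rarely blocks

	// Producer: prepare (serialize) rows concurrently with the writes below.
	go func() {
		defer close(ch)
		for i, txn := range txns {
			row := []interface{}{i, txn} // stand-in for the expensive JSON encoding
			select {
			case ch <- row:
			case <-ctx.Done():
				return
			}
		}
	}()

	// Consumer: drain the channel and write each row to the database
	// (stand-in for the COPY writer).
	for row := range ch {
		fmt.Println("write row:", row)
	}
	return ctx.Err()
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	if err := addTxnRows(ctx, []string{"txn0", "txn1", "txn2"}); err != nil {
		fmt.Println("error:", err)
	}
}
```

If the consumer hits a write error and returns early, the deferred cancel stops the producer via ctx.Done(), and the channel, together with any unread rows, becomes unreachable and is garbage collected; this is the error-path behaviour discussed in the review thread below.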

codecov-commenter commented Dec 1, 2021

Codecov Report

Merging #802 (2486257) into develop (f372206) will increase coverage by 0.15%.
The diff coverage is 77.27%.


@@             Coverage Diff             @@
##           develop     #802      +/-   ##
===========================================
+ Coverage    58.83%   58.98%   +0.15%     
===========================================
  Files           29       30       +1     
  Lines         4059     4084      +25     
===========================================
+ Hits          2388     2409      +21     
- Misses        1374     1378       +4     
  Partials       297      297              
| Impacted Files | Coverage Δ |
| --- | --- |
| idb/postgres/internal/writer/writer.go | 78.37% <73.68%> (+0.04%) ⬆️ |
| idb/postgres/internal/writer/util.go | 100.00% <100.00%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

ctx, cancelFunc := context.WithCancel(context.Background())
defer cancelFunc()

ch := make(chan []interface{}, 64)
Contributor

By definition, we're doing this because serialization is slower than the network. Therefore we shouldn't need a buffered channel.

If we do keep the buffered channel, you'll need to make sure we remove any unread items from the channel.

Contributor Author

No, serialization is faster than postgres.

Unread items in case of an error? The channel will be garbage collected.

Contributor

Since serialization is faster, doesn't that mean a buffer is not required? As long as we're able to have 1 pending object to load there would never be any downtime.

Re: garbage collection. I didn't realize the unread items would be garbage collected, but it makes sense that this would be the case.

Contributor

I'm not sure if there is a more authoritative source for this, but here is a reference to something I've heard before:
https://github.com/uber-go/guide/blob/master/style.md#channel-size-is-one-or-none

tolikzinovyev (Contributor Author) commented Dec 2, 2021

If one is much faster than the other, this PR wouldn't improve anything. However, here postgres and serialization are somewhat comparable in speed.

Our postgres driver creates buffers of 65KB and writes them to the network. Depending on whether the "network" has a large enough buffer, this call may block and we would waste CPU time that we could use for serialization. So actually, I'm thinking we should increase the buffer to 1024 to be safe. What do you think?

I've seen this Uber style guide. I think they mainly worry about blocking that can occur when writing to a channel, but when the buffer size is large, blocking occurs only rarely in production and not in testing.

Contributor

I found the 65KB number you mentioned, but it doesn't seem to be working the way you described. It doesn't accumulate 65KB of data, it reads up to 65KB of data then sends it immediately. Since it's TCP it needs to be received correctly (the wasted CPU time you mentioned), but there doesn't seem to be a way to speed that up, so I don't think we need a large buffer of transactions.

https://github.com/jackc/pgconn/blob/master/pgconn.go#L1199

	go func() {
		buf := make([]byte, 0, 65536)
		buf = append(buf, 'd')
		sp := len(buf)

		for {

// here, r is a named pipe. When you call CopyTo data is written into the pipe.
// 65KB isn't a target, it's a temporary buffer. There is no guarantee that it should be filled.
// To be more precise, from the documentation:
//    Reads and Writes on the pipe are matched one to one except when multiple Reads are
//    needed to consume a single Write. That is, each Write to the PipeWriter blocks until it 
//    has satisfied one or more Reads from the PipeReader that fully consume the written data.

			n, readErr := r.Read(buf[5:cap(buf)])
			if n > 0 {
				buf = buf[0 : n+5]
				pgio.SetInt32(buf[sp:], int32(n+4))

// If any data was read... 1 byte, all 65KB bytes, it's written to the network.
// This means that the writer is blocked here regardless of how large the buffered
// channel is. After the read, our code will queue the next row in the pipe, and
// queue the one after in the channel.

				_, writeErr := pgConn.conn.Write(buf)
				if writeErr != nil {
					// Write errors are always fatal, but we can't use asyncClose because we are in a different goroutine.
					pgConn.conn.Close()

					copyErrChan <- writeErr
					return
				}
			}
			if readErr != nil {
				copyErrChan <- readErr
				return
			}

			select {
			case <-abortCopyChan:
				return
			default:
			}
		}
	}()

It seems like having a large buffered channel wouldn't help anything. Did you see that performance was affected in your testing?

Contributor Author

If you agree that we would waste CPU time with an unbuffered channel, then as soon as w.Write() returns, where w is the write side of the pipe, we will need to spend time serializing transactions until we can call w.Write() again. I tried tracing through the code, and I think it boils down to how fast the write syscall returns. That seems to be platform dependent, so I would rather introduce a big enough buffer in our code. In any case, it's small enough not to worry about it.
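For reference, the io.Pipe behaviour being debated can be seen in isolation with a small standalone sketch (unrelated to the indexer or pgconn code): a Write on the pipe's write side returns only once readers have fully consumed the data, so a slow reader keeps the writing goroutine blocked for the whole interval.

```go
package main

import (
	"fmt"
	"io"
	"time"
)

func main() {
	r, w := io.Pipe()

	go func() {
		start := time.Now()
		// Blocks until the reads below have consumed all 10 bytes.
		if _, err := w.Write([]byte("0123456789")); err != nil {
			fmt.Println("write error:", err)
		}
		fmt.Println("Write returned after", time.Since(start))
		w.Close()
	}()

	buf := make([]byte, 4)
	for {
		time.Sleep(50 * time.Millisecond) // simulate a slow consumer (the network write)
		n, err := r.Read(buf)
		if n > 0 {
			fmt.Printf("read %d bytes\n", n)
		}
		if err != nil { // io.EOF once the writer closes
			break
		}
	}
}
```

In the pgconn snippet above, the CopyFrom caller's writes into the pipe likewise return only once the copy goroutine has read them, which is the interval under discussion here.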

Contributor

"Waste" is the wrong term since the goal isn't to use as much CPU as possible. From what I see we need one pending round to continuously load the DB. Thanks to the pipe we already have 2 queued up with an unbuffered channel.

How did you get the 8% metric? Could you try with an unbuffered channel to see if it has the same performance?

Contributor Author

The problem with that pipe is that the writer writes slightly more than 2^16 bytes and will unblock only when all of it is read. However, the reader reads slightly less than 2^16 bytes and writes it to the network before reading the other couple of bytes.

I'm running tests.

Contributor Author

| chan size | TPS |
| --- | --- |
| 0 | 6798.43 |
| 64 | 6891.80 |
| 1024 | 6946.75 |

@winder (Contributor) left a comment

Looks great. Based on your comment I think we just need to decide whether the channel should be unbuffered, buffered with a size of one, or convince ourselves that a buffer > 1 is helpful.

@winder (Contributor) left a comment

Thanks for running the tests; there's clearly a marginal improvement after all!

@tolikzinovyev tolikzinovyev merged commit f4ff381 into develop Dec 3, 2021
@tolikzinovyev tolikzinovyev deleted the tolik/add-txns-pipeline branch December 3, 2021 23:09