Replace conditional variable with semaphore #24
Conversation
The semaphore strategy is much cleaner. I wish that semaphore package had existed when I wrote the original pool in pgx that was extracted to puddle. Here are the things I noticed:

- This requires Go 1.19. pgx (the primary consumer of puddle) supports at least the previous 2 Go releases. So this needs to be compatible with Go 1.18. That means we don't get the nicer atomic types. We'd need to use the older atomics and ensure that they work on 32-bit platforms. (It surprised me that people are still using 32-bit, but I've gotten several issues filed due to atomics on 32-bit.)
- The test coverage is now only 98.4%. I'd like to get back to 100% test coverage. I'd guess both of those are relatively easy to fix.
- The change to a circular queue from a stack changes behavior that will affect users of the pool that implement inactivity checks. For example, consider the case of a

But my primary concern is how big a change this is. This is a rewrite of almost the entire core and there's a lot of tricky logic involved. Aside from the concerns noted above it looks good to me, but I'd like to have input from more reviewers. I'm going to create an issue on pgx to request additional input.
Replies to code review notes
Just food for thought: Because
No problem. I used the uber atomic package. One nice thing about those atomic structs is that they address the alignment issue you had with 64-bit types on 32-bit platforms: https://pkg.go.dev/sync/atomic#pkg-note-BUG
In other words, a struct's first element is always 64-bit aligned.
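For illustration (a hedged sketch, not code from this PR): with the Go 1.19 atomic types the alignment requirement is handled by the type itself, whereas with the older `sync/atomic` functions a 64-bit field has to be kept 64-bit aligned manually on 32-bit platforms.

```go
package counters

import "sync/atomic"

// Pre-Go 1.19 style: the int64 field must be 64-bit aligned, which on 32-bit
// platforms effectively means keeping it as the first field of the struct.
type oldStyleCounters struct {
	acquireCount int64 // keep first: relies on the allocation alignment guarantee
	name         string
}

func (c *oldStyleCounters) inc() int64 {
	return atomic.AddInt64(&c.acquireCount, 1)
}

// Go 1.19 style: atomic.Int64 carries its own alignment guarantee, so the
// field can live anywhere in the struct.
type newStyleCounters struct {
	name         string
	acquireCount atomic.Int64
}

func (c *newStyleCounters) inc() int64 {
	return c.acquireCount.Add(1)
}
```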
When you take a look at which lines are not covered, you'll see they are two unreachable lines with panics. I added comments to make it clear that this code is unreachable. Those two panics check that the state of the semaphore is in sync with the state of the pool. To be honest, those checks simplified debugging of many bugs. For this reason I decided to keep them in my PR. On the other hand, if you don't like them, I can drop them from the code.
OK, good point. I have reworked the setup not to use a circular queue but a generational stack (see the modified implementation notes below). A single-stack implementation was not race-free. It would be useful to document this feature in the doc-comments.
I agree; to be honest, I have been quite worried about how you would accept such a big change. On the other hand, I was not able to find any way to split this change into smaller pull requests. If you need someone for CR, I can ask somebody from our company to take a look. Yet I have to admit that I'd prefer a review by someone who has deep knowledge of the code base.

Implementation notes (changes)

Those changes were also propagated to the main PR description so as not to confuse future readers along the way:

- AcquireAllIdle guarantees changed (replaces original content) - see the updated section in the PR description below.
- List of idleResources is now generational stack (replaces "List of idleResources is now circular queue") - see the updated section in the PR description below.
It looks good to me. That generational stack idea is pretty clever.
It's worth considering. puddle's primary reason for being is for pgx. But a secondary reason was to encapsulate all the tricky logic involved in a resource pool. As time and experience have proved, it is very difficult to get right. I like the idea of there being a generic resource pool that solves all the low-level problems. At the very least, its overarching requirement of serving the needs of pgx should be documented.
I agree that should be documented.
That's fair. 100% test coverage is more of an emotional / peace-of-mind issue. I suppose we could instead say that we have 100% coverage of reachable code. With regards to the Go 1.19 requirement vs. an external dependency, and with regards to wanting more testing, there might be a solution that solves both of these issues. If this is merged and tagged as v2.1, people could start using and battle-testing it before pgx itself depends on it.
I fully understand that you want to battle-test this solution and why you want to be careful about updating the dependency in pgx. Another point worth mentioning is whether you believe that someone would use the new puddle version directly, without pgx moving to it.
Just a few small drive-by comments, since I happened to see a request for reviews over in pgx. I fear I am not ready to dedicate the time to a deeper review...and this is definitely complicated, subtle code that warrants a serious review.
I noticed a stress test and a bit of fuzzing. Those are heartening. The more automated tests that let you have a computer explore the state space while you sleep, the better. (Assuming you actually do that, of course!)
// createNewResource creates a new resource and inserts it into list of pool
// resources.
//
// WARNING: Caller of this method must hold the pool mutex!
If you want, you can enforce this with sync.Mutex.TryLock: https://pkg.go.dev/sync#Mutex.TryLock. You call TryLock; if it succeeds, the mutex was not held, so you panic. If it fails, all's well. It should be a single atomic load, so not too expensive...but worth double-checking.
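A sketch of what such an assertion could look like (illustrative only, not code from this PR):

```go
package pool

import "sync"

// assertMutexHeld enforces the "caller must hold the pool mutex" contract as
// suggested above. If TryLock succeeds, nobody held the mutex, so the
// contract was violated. It can miss violations when another goroutine
// happens to hold the mutex (a false negative), but it never fires
// spuriously.
func assertMutexHeld(mux *sync.Mutex) {
	if mux.TryLock() {
		mux.Unlock()
		panic("pool mutex must be held by the caller")
	}
}
```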
Not really. The mutex could be held by a concurrently running goroutine, so the TryLock would fail even though you are not the holder of the mutex.
It could have false negatives, but it would have no false positives, which is the important thing for such a check.
True.
To be honest, I don't see that much value in this check. I agree that it's technically possible and that it could find some bug. But first, we are all trivially able to verify that both calls of this function are bug-free, so no check that would slow down real production code is necessary. Keep in mind that an atomic operation is still more costly than the standard check behind the panic referred to above. The second reason not to do this check is that in some cases you have recursion or multiple sub-calls in parallel code, and the cumulative cost of all those checks would simply be too high. That argument doesn't apply in this situation, though; it's just a practice I try to follow.
"time" | ||
) | ||
|
||
// valueCancelCtx combines two contexts into one. One context is used for values and the other is used for cancellation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? Contexts compose, so you shouldn't have to create your own type. (But maybe I'm missing something?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just code relocation.
But to answer your question: there is no way to combine the values of one context with the cancellation of another context, at least not without third-party libraries.
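For context, here is a minimal sketch of how such a combined context can be written with only the standard library (an illustration of the idea, not necessarily identical to puddle's `valueCancelCtx`):

```go
package ctxutil

import (
	"context"
	"time"
)

// valueCancelCtx combines two contexts: Value lookups are served by one,
// while cancellation, deadline and errors come from the other.
type valueCancelCtx struct {
	valueCtx  context.Context // source of Value
	cancelCtx context.Context // source of Done, Err and Deadline
}

func (c valueCancelCtx) Deadline() (time.Time, bool) { return c.cancelCtx.Deadline() }
func (c valueCancelCtx) Done() <-chan struct{}       { return c.cancelCtx.Done() }
func (c valueCancelCtx) Err() error                  { return c.cancelCtx.Err() }
func (c valueCancelCtx) Value(key any) any           { return c.valueCtx.Value(key) }

// newValueCancelCtx returns a context that carries the values of valueCtx but
// is cancelled together with cancelCtx.
func newValueCancelCtx(valueCtx, cancelCtx context.Context) context.Context {
	return valueCancelCtx{valueCtx: valueCtx, cancelCtx: cancelCtx}
}
```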
@jan-dubsky first of all thanks for putting this together, because as it turns out we're running into a similar problem :) My team and I are currently testing the latest version of your changes in a production-like environment and will give you feedback as soon as we have it.
Right, but I guess my thought was that anyone who needs / wants it could get it with a single
According to https://pkg.go.dev/github.com/jackc/puddle/v2?tab=importedby there are only a couple of other projects directly using puddle v2. But pgx as a whole doesn't need to move to v2.1 for individuals to do so. Even a handful of people running it in production for a while would give me a lot more assurance. At the moment, based on the comments in this PR, at most 5 people have looked at the code -- to say nothing of production use or similar feedback. At the very least, tagging a puddle release some amount of time before the dependency is updated in a pgx release would make it easier for people to test.
You are right. Let's go with the v2.1 tagging. Thanks for the explanation :) How about the atomic types? Do you want to make Go 1.19 required for v2.1, or will we use the uber atomic package as an intermediate solution and drop it in February?
Yeah, let's use the Go 1.19 atomics and avoid the dependency.
I rebased this onto master and resolved the merge conflicts locally, then merged and pushed it back up. GitHub doesn't seem to recognize the merge, but it is on master now. I tagged the v2.1 release. This seems to also resolve jackc/pgx#1354. If that proves to be accurate and we get some good user experience reports, then it may be worth going back to the uber atomic package for Go 1.18 compatibility and updating pgx to use puddle v2.1. Thanks @jan-dubsky!
Motivation
We use `pgx` and `pgxpool` in our production app. We were recently doing some load-tests with scenarios where the app has been overloaded with massive CPU computations running in other goroutines (which is a realistic scenario for the app). Unfortunately, what we have seen is that calls to `pgxpool.Pool.Acquire()` took about 5 seconds during our load-test. The 5 second limit was artificial, as that was the timeout of a single request to the load-tested app.

Problem analysis
Our investigation showed that the problem is the following piece of code in the `puddle.Pool.Acquire()` function. Namely, the problem is the two background goroutines that are supposed to act as adapters between `sync.Cond` and a channel.
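For readers without the original code at hand, the pattern in question looks roughly like this (a hedged reconstruction of the cond-to-channel adapter idea, not the verbatim puddle v1 code; the function and variable names are illustrative):

```go
package pool

import (
	"context"
	"sync"
)

// acquireWithCondSketch is a schematic reconstruction of the pattern referred
// to above, not the verbatim puddle v1 code. The pool mutex (cond.L) is held
// by the caller when this runs. Because sync.Cond cannot participate in a
// select, adapter #1 turns the condition signal into a channel close, and
// adapter #2 relays the signal onward when the caller gives up because its
// context was cancelled.
func acquireWithCondSketch(ctx context.Context, cond *sync.Cond) error {
	conditionMet := make(chan struct{})

	// Adapter #1: wait for the condition signal. cond.Wait releases cond.L
	// while waiting and re-acquires it before returning, so this goroutine
	// exits with cond.L still locked.
	go func() {
		cond.Wait()
		close(conditionMet)
	}()

	select {
	case <-conditionMet:
		// A resource may have been released; the caller continues with
		// cond.L held and eventually unlocks it after taking a resource.
		return nil

	case <-ctx.Done():
		// Adapter #2: adapter #1 is still blocked in cond.Wait. Once it is
		// eventually signalled, pass the signal on to the next waiter and
		// unlock the mutex that cond.Wait re-acquired; otherwise the wakeup
		// (and the pool lock) would be lost with this cancelled Acquire.
		go func() {
			<-conditionMet
			cond.Signal()
			cond.L.Unlock()
		}()
		return ctx.Err()
	}
}
```

The two scenarios below walk through why these adapters keep the pool lock held longer than necessary and how cancelled calls chain them together.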
Scenario without cancellation
Let's start with the scenario in which the context is not cancelled (i.e. the `case <-ctx.Done()` branch is never taken). A goroutine calls the `Acquire()` function, reaches this block of code, starts the adapter#1 goroutine and blocks in the `select` statement. Later, a resource (a `pgxpool` connection in our case) is released and adapter#1 is signalled.

This scenario is problematic mostly because the lock of the pool remains locked for a significant amount of time. Any goroutine attempting to access the pool between the moment adapter#1 is signalled and the moment the lock is finally released is blocked. This state also prevents the Go scheduler from prioritizing the goroutine that holds the lock, because the one which locked it is already dead. Consequently, the Go scheduler would have to perform quite complex analysis to find out which goroutine should get scheduling priority in order to unlock the lock.
Scenario with cancellation
Unfortunately, the scenario described above is not the worst possible one. The problematic scenario which caused our `Acquire()` calls to last up to 5 seconds is the one where the context is cancelled (i.e. when the `case <-ctx.Done():` branch of the `select` is taken). The beginning is the same as in the previous case: a goroutine calls the `Acquire()` function, reaches this block of code, starts adapter#1 and blocks in the `select` statement. This time, however, the `case <-ctx.Done():` branch is taken and the adapter#2 goroutine is started. The signal then only travels through adapters belonging to cancelled `Acquire` calls - no actual work (read: connection acquisition) is done.

The problem with this scenario is that, due to cancellations, there can be an arbitrarily long chain of adapter#1 and adapter#2 goroutines, all belonging to Acquires that were already cancelled. Consequently, one signal might jump K times via adapter#1 and adapter#2 without doing any useful work (read: acquiring a resource).
What is even worse is that this has a positive feedback loop - the more requests time out, the more adapters there are. The more adapters there are, the more goroutines have to be scheduled to complete a single successful `Acquire` call. Consequently, `Acquire` calls take longer, which results in a higher chance of an `Acquire` timeout.

Analysis summary
The problem is the goroutines that are spawned to convert `sync.Cond` to a channel (to allow a double `select` with the context). Naturally, the other part of the problem is the adapters that convert the channel back to a `sync.Cond` signal.

Proposed solution
I propose to modify the code not to use `sync.Cond` at all. In this PR, I replace the conditional variable with a semaphore. The logic change can be described as follows: instead of waiting on the condition variable through adapter goroutines, `Acquire` waits on a semaphore that accepts the caller's context directly, so no adapter goroutines are needed.
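A rough sketch of the idea, assuming the semaphore is `golang.org/x/sync/semaphore` (the package referred to earlier in this thread); this is an illustration of the approach, not the exact code in this PR:

```go
package pool

import (
	"context"
	"sync"

	"golang.org/x/sync/semaphore"
)

// resource and pool are reduced stand-ins for puddle's real types, kept only
// to make the sketch self-contained.
type resource struct{}

type pool struct {
	sem           *semaphore.Weighted // sized to the maximum number of resources
	mux           sync.Mutex
	idleResources []*resource
}

// acquireSketch illustrates the proposed flow: waiting for capacity and
// honoring context cancellation happen in a single semaphore call, with no
// adapter goroutines.
func (p *pool) acquireSketch(ctx context.Context) (*resource, error) {
	// Blocks until a "reservation" is available or ctx is cancelled.
	if err := p.sem.Acquire(ctx, 1); err != nil {
		return nil, err // context cancelled; nothing to clean up
	}

	p.mux.Lock()
	defer p.mux.Unlock()

	if n := len(p.idleResources); n > 0 {
		// Reuse an idle resource.
		res := p.idleResources[n-1]
		p.idleResources = p.idleResources[:n-1]
		return res, nil
	}

	// The reservation entitles the caller to construct a new resource
	// (construction and error handling are omitted from this sketch).
	return &resource{}, nil
}
```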
Benchmarking the solution
This PR adds multiple benchmarks, but there are 2 benchmarks relevant for observing the difference: `BenchmarkAcquire_MultipleCancelled` and `BenchmarkAcquire_MultipleCancelledWithCPULoad`. Both simulate a situation when some requests are cancelled.

Results before the change:

Results after the change:

NOTE: Don't be surprised that the second test is faster per operation in the case of the improved code - it does only 3 cancelled requests per loop compared to 64 cancellations in the case of the first test. The `*WithCPULoad` benchmark would last too long (before the patch) if there were 64 cancellations.

The other benchmarks added by this PR were written during development, so I have seen no reason not to preserve them.
Implementation notes
This section contains all other changes this diff introduces and provides comments on them.
Another bugfix: Avoid goroutine leak
When asynchronous resource creation in the `Acquire` function failed, the error was unconditionally written to an unbuffered channel. But if no one was listening on the unbuffered channel (because the `Acquire` call was cancelled by the context), this write blocked forever. This caused a goroutine leak which could potentially result in the application being OOM-killed.

This bugfix was discovered during the rewrite. This is the reason why it was not posted as a separate PR. Another reason is that this part of the code is subject to a heavy refactor in this PR, and posting it in a separate PR would cause a merge conflict.
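One common way to avoid this kind of leak is to make the result channel buffered so the constructor goroutine can always complete its send. The sketch below illustrates that general pattern and is not necessarily the exact change made in this PR:

```go
package pool

import "context"

// constructResourceAsync sketches the general pattern: the constructor
// goroutine must never block on reporting its result, otherwise a cancelled
// Acquire leaks that goroutine forever.
func constructResourceAsync(ctx context.Context, construct func() (any, error)) (any, error) {
	type result struct {
		value any
		err   error
	}

	// Capacity 1 guarantees the send below succeeds even if no one ever
	// reads, so the goroutine can always terminate.
	resCh := make(chan result, 1)

	go func() {
		value, err := construct()
		resCh <- result{value: value, err: err}
	}()

	select {
	case res := <-resCh:
		return res.value, res.err
	case <-ctx.Done():
		// The constructor goroutine will still finish and place its result
		// into the buffered channel; a real pool would additionally arrange
		// for that late resource to be returned to the pool or destroyed.
		return nil, ctx.Err()
	}
}
```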
AcquireAllIdle implementation changed
(This original text is superseded by the "AcquireAllIdle guarantees changed" section below.)

In the original implementation, `AcquireAllIdle` guaranteed to atomically acquire all idle connections. In the new design, the implementation doesn't acquire all connections atomically. In this section, I will argue that this is not an issue and that the implementation still has the same guarantees as the previous implementation.

The design with a semaphore requires `AcquireAllIdle` to acquire tokens from the semaphore, and only then can it lock the pool mutex. Analogously, the implementation has to acquire the pool mutex to find out how many idle connections there are. Those two steps cannot be swapped because of a possible deadlock with the `Acquire` function. The solution is based on an opportunistic approach: acquire as many tokens as possible first, then check how many connections are idle and release all tokens that exceed the number of idle resources.

This approach is far from atomic, but its guarantees are strong enough to satisfy the requirements on the `AcquireAllIdle` behaviour. `AcquireAllIdle` now guarantees that it returns all resources that (1) were already idle before the start of the `AcquireAllIdle` call AND (2) whose `Acquire` was not started while `AcquireAllIdle` was executing. Neither of those is an issue, because in a parallel environment the order of concurrent actions is not defined. Let's discuss those 2 conditions independently:

1. Resources that are concurrently being released do not have to be acquired by the `AcquireAllIdle` call. This situation is equivalent to the situation when the resource is released just after the `AcquireAllIdle` call.
2. While `AcquireAllIdle` is being executed, there can be another concurrent `Acquire` call that will get a connection. This corresponds to the situation when `Acquire` is called before the `AcquireAllIdle` call.

As you can see, neither of those 2 conditions is problematic in a parallel environment, and the behaviour of `AcquireAllIdle` still appears to be atomic.
AcquireAllIdle guarantees changed (replaces original content)

In the original implementation, the `AcquireAllIdle` function was able to atomically acquire all idle resources. This was true because `AcquireAllIdle` was synchronized with all `Acquire`/`TryAcquire` calls (via the mutex). The new implementation comprises two synchronization primitives: a mutex and a semaphore. For this reason, the way `Acquire`/`TryAcquire` and `AcquireAllIdle` are synchronized is significantly more complex.

Both `Acquire` and `TryAcquire` are required to acquire a semaphore token before entering the critical section (locking the mutex). The semaphore acquire has approximately the following meaning: "once the semaphore is acquired, the goroutine has a reservation to acquire a resource". This implies that `AcquireAllIdle` is not allowed to take idle resources that are in the pool if the semaphore "reservation" has already been made by some `Acquire`/`TryAcquire` call running in parallel.

The new implementation of `AcquireAllIdle` guarantees that it returns all resources that are idle and haven't been "reserved" by some `Acquire`/`TryAcquire` call. This is not an issue, because once an idle resource is reserved, it's guaranteed that it will be acquired (see the generational stack below). This at least holds for idle resources in the pool.

This redefinition of the `AcquireAllIdle` guarantees is not problematic in a parallel environment, because we can say that the `AcquireAllIdle` call behaves as if all `Acquire`/`TryAcquire` calls that managed to "reserve" a resource executed before `AcquireAllIdle`. In the end, the only thing we need to guarantee is that all resources that are idle at the time of the `AcquireAllIdle` call will get acquired eventually.
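A sketch of the opportunistic token handling described above, using the same reduced pool shape as the earlier `Acquire` sketch (repeated here so the snippet stands alone); this is illustrative only, and the real puddle code differs in details such as the generational stack:

```go
package pool

import (
	"sync"

	"golang.org/x/sync/semaphore"
)

// Reduced stand-ins, as in the Acquire sketch above. This sketch assumes the
// semaphore is sized to the maximum number of resources and that every owned
// resource corresponds to one held token.
type resource struct{}

type pool struct {
	sem           *semaphore.Weighted
	mux           sync.Mutex
	idleResources []*resource
}

// acquireAllIdleSketch illustrates the opportunistic approach: grab tokens
// first, then lock the pool, give back the tokens that do not correspond to
// idle resources, and take the rest.
func (p *pool) acquireAllIdleSketch() []*resource {
	// Step 1: opportunistically grab as many semaphore tokens as possible.
	tokens := 0
	for p.sem.TryAcquire(1) {
		tokens++
	}

	// Step 2: only now lock the pool and see how many resources are idle.
	p.mux.Lock()
	defer p.mux.Unlock()

	idle := len(p.idleResources)
	if tokens > idle {
		// Step 3: release the tokens that exceed the number of idle
		// resources; they correspond to unused capacity, not idle resources.
		p.sem.Release(int64(tokens - idle))
		tokens = idle
	}

	// Step 4: take as many idle resources as we hold tokens for. Idle
	// resources already "reserved" by a concurrent Acquire/TryAcquire keep
	// their tokens and stay in the pool for that caller.
	acquired := make([]*resource, tokens)
	copy(acquired, p.idleResources[idle-tokens:])
	p.idleResources = p.idleResources[:idle-tokens]
	return acquired
}
```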
List of idleResources is now circular queue

(This original text is superseded by the "List of idleResources is now generational stack" section below.)

This change was necessary because of the change in `AcquireAllIdle`. If we kept the stack representation (implemented by array append and pop-back) of `idleResources`, we wouldn't be able to guarantee the properties of `AcquireAllIdle`. The undesired case would happen in the case of condition (1) in the previous section:

1. Call `AcquireAllIdle`.
2. Lock all `K` semaphore tokens (in `AcquireAllIdle`).
3. A concurrent goroutine releases a resource.
4. `AcquireAllIdle` locks the mutex.
5. `AcquireAllIdle` acquires the newly released connection (it's the first one on top of the stack) and `K-1` idle resources.

The problem with this scenario is that it's no longer equivalent to a situation when `Release` is called after the `AcquireAllIdle` execution. The released resource is acquired instead of resource `idleResources[0]`. This issue has been addressed by using a circular queue for `idleResources`.
List of idleResources is now generational stack (replaces "List of idleResources is now circular queue")

To support the current implementation of `AcquireAllIdle`, the list of idle resources had to be changed to a generational stack. A generational stack behaves as a standard stack (allows `push` and `pop`) within a single generation. In addition to that, a generational stack can start a new generation. The behaviour is the following: all elements pushed in a previous generation are popped before any element pushed in a later generation.
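A minimal sketch of such a generational stack (illustrative; puddle's internal implementation may differ):

```go
package genstack

// genStack is a minimal sketch of the generational stack described above.
// Within a generation it is a plain LIFO stack; Pop always drains older
// generations before touching elements pushed after the most recent NextGen
// call.
type genStack[T any] struct {
	gens [][]T // gens[0] is the oldest generation, gens[len(gens)-1] the current one
}

// Push adds v to the current (newest) generation.
func (s *genStack[T]) Push(v T) {
	if len(s.gens) == 0 {
		s.gens = append(s.gens, nil)
	}
	last := len(s.gens) - 1
	s.gens[last] = append(s.gens[last], v)
}

// Pop removes an element from the oldest non-empty generation (LIFO within
// that generation). It reports false if the stack is empty.
func (s *genStack[T]) Pop() (T, bool) {
	var zero T
	for len(s.gens) > 0 {
		oldest := s.gens[0]
		if n := len(oldest); n > 0 {
			s.gens[0] = oldest[:n-1]
			return oldest[n-1], true
		}
		if len(s.gens) == 1 {
			return zero, false
		}
		// The oldest generation is exhausted; move on to the next one.
		s.gens = s.gens[1:]
	}
	return zero, false
}

// NextGen starts a new generation: everything pushed from now on is popped
// only after all previously pushed elements are gone.
func (s *genStack[T]) NextGen() {
	if len(s.gens) > 0 && len(s.gens[len(s.gens)-1]) == 0 {
		return // the current generation is still empty; no new one is needed
	}
	s.gens = append(s.gens, nil)
}
```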
Acquire
/TryAcquire
andAcquireAllIdle
:Acquire
(without loss of generality) acquires a semaphore token (reserves a resource) and is preempted before mutex lock.AcquireAllIdle
locks the mutex and consumes all remaining tokens from the semaphore.AcquireAllIdle
cannot take all idle connections because one of them is already "reserved" by concurrentAcquire
. For this reason it takes all but one idle connections and returns them expecting that the last idle connection will be taken by theAcquire
.Release
on a resource it has. ThisRelease
locks the mutex beforeAcquire
and pushes a new idle resource to the top of theidleResources
stack.Acquire
finally locks the mutex and pops pre resource from the top of the stack (the one released in (4)).As a result of this race, there is one "old" idle resource remaining at the bottom of the stack. Which would be an issue if
AcquireAllIdle
was used for keep-alive as doc-commented.To address this issue, we use generational stack instead of a normal stack and
AcquireAllIdle
always starts a new generation of the stack. Because the new generation is started in (3) (at the end ofAcquireAllIdle
), theRelease
call in (4) will push the released resource to the new stack generation. On the other hand theidleResources
pop in (5) will consume the resource from the old generation first so no old resource will remain in the pool.