Fix race conditions in the connection pool code. #18447
arthurschreiber wants to merge 1 commit into vitessio:main
Conversation
Signed-off-by: Arthur Schreiber <arthurschreiber@github.com>
Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

- General
- Tests
- Documentation
- New flags
- If a workflow is added or modified:
- Backward compatibility
Unfortunately, this does have an impact on pool performance (though whether that's noticeable in real-world scenarios is unclear to me).
Codecov Report ❌ Patch coverage is

Additional details and impacted files

@@ Coverage Diff @@
##             main   #18447      +/-   ##
==========================================
+ Coverage   67.51%   67.56%   +0.05%
==========================================
  Files        1607     1607
  Lines      262687   264199    +1512
==========================================
+ Hits       177340   178507    +1167
- Misses      85347    85692     +345

☔ View full report in Codecov by Sentry.
I have a test over at https://gist.github.com/arthurschreiber/fffe5c2ecf550e75fdee6e97cc08f9df that can reproduce the problem. Unfortunately, the test is flaky, so I've not added it to this PR.
} else {
	stack := connSetting.bucket & stackMask
	pool.settings[stack].Push(conn)
	pool.freshSettingsStack.Store(int64(stack))
I wonder, would an alternative approach work to avoid having to lock? What if we still optimistically put it into the stack, but then check after we've completed whether the pool has since been closed, and if so remove it again?
That way you'd mostly only pay the price during closing, and avoid any locking or other synchronization on the common path.
I'm definitely open to alternative approaches. For your suggestion, don't we have to take a lock on capacityMu to correctly decide whether the pool is closed or not?
We could consider an optimistic concurrency approach with epoch validation.
ConnPool could have an epoch atomic.Uint64 that we increment whenever there is a change like Close() or a capacity update.
In tryReturnConn we read the epoch value and then read the capacity. Before returning the connection to the stack we check the epoch again: if it has changed we bail out, otherwise we push the connection onto the stack.
After returning we check the epoch once more, and if it has changed we try to take the connection back out; if we get it back, we bail out as well.
We do not expect capacity to change that frequently.
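A minimal, runnable sketch of this epoch-validation idea. All names here (Conn, ConnPool, tryReturnConn, push/pop) are illustrative stand-ins, not the actual vitess types, and the mutex-guarded slice replaces the real lock-free settings stacks just to keep the sketch short:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type Conn struct{ id int }

// ConnPool is a stand-in for the real pool: the epoch counter is the new
// part; the mutex-guarded slice replaces the lock-free settings stacks.
type ConnPool struct {
	epoch atomic.Uint64 // incremented on Close() or capacity changes

	mu     sync.Mutex
	closed bool
	stack  []*Conn
}

func (p *ConnPool) isClosed() bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.closed
}

// Close invalidates any in-flight optimistic return by bumping the epoch.
func (p *ConnPool) Close() {
	p.mu.Lock()
	p.closed = true
	p.stack = nil
	p.mu.Unlock()
	p.epoch.Add(1)
}

func (p *ConnPool) push(c *Conn) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.stack = append(p.stack, c)
}

func (p *ConnPool) pop() *Conn {
	p.mu.Lock()
	defer p.mu.Unlock()
	if n := len(p.stack); n > 0 {
		c := p.stack[n-1]
		p.stack = p.stack[:n-1]
		return c
	}
	return nil
}

// tryReturnConn reads the epoch, checks the pool state, pushes the
// connection, then re-validates the epoch. If the epoch moved at any
// point, it tries to take a connection back out and reports failure so
// the caller closes the connection instead of leaking it into a closed pool.
func (p *ConnPool) tryReturnConn(c *Conn) bool {
	before := p.epoch.Load()
	if p.isClosed() || p.epoch.Load() != before {
		return false
	}
	p.push(c)
	if p.epoch.Load() != before {
		// Pool changed while we were returning: undo if we still can.
		if p.pop() != nil {
			return false
		}
	}
	return true
}

func main() {
	p := &ConnPool{}
	fmt.Println(p.tryReturnConn(&Conn{id: 1})) // open pool: accepted
	p.Close()
	fmt.Println(p.tryReturnConn(&Conn{id: 2})) // closed pool: rejected
}
```

The post-push re-check is what makes the optimistic return safe: either the return completes entirely within one epoch, or the returner notices the epoch change and undoes its push.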
I am working on an additional change that uses a pool generation number to allow Close() to return instantly, basically breaking the link between checked-out connections and the pool.
I think that could be used for the optimistic concurrency check. I'll try this out and see if I can make this work.
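A rough sketch of how such a generation number could work: each connection is stamped with the pool's generation at checkout, and a return is only accepted if the stamp still matches. All names (Conn, ConnPool, Get/Put) are hypothetical, and the sketch ignores the narrow race between the generation check and the push, which a real implementation would still need to handle (e.g. with an epoch-style re-check):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type Conn struct {
	id  int
	gen uint64 // pool generation at checkout time
}

type ConnPool struct {
	generation atomic.Uint64

	mu   sync.Mutex
	idle []*Conn
}

// Get hands out a connection stamped with the current generation.
func (p *ConnPool) Get() *Conn {
	p.mu.Lock()
	defer p.mu.Unlock()
	if n := len(p.idle); n > 0 {
		c := p.idle[n-1]
		p.idle = p.idle[:n-1]
		return c
	}
	return &Conn{gen: p.generation.Load()}
}

// Put accepts a connection back only if its generation still matches;
// a stale connection (checked out before a Close) is rejected, and the
// caller is expected to close it directly.
func (p *ConnPool) Put(c *Conn) bool {
	if c.gen != p.generation.Load() {
		return false
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	p.idle = append(p.idle, c)
	return true
}

// Close bumps the generation and drains the idle stack. It does not need
// to wait for checked-out connections: their generation stamp no longer
// matches, so they can never re-enter the pool.
func (p *ConnPool) Close() {
	p.generation.Add(1)
	p.mu.Lock()
	p.idle = nil
	p.mu.Unlock()
}

func main() {
	p := &ConnPool{}
	c := p.Get()
	p.Close()             // returns instantly; c is still checked out
	fmt.Println(p.Put(c)) // false: stale generation, caller closes c itself
}
```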
This PR is being marked as stale because it has been open for 30 days with no activity. To rectify, you may do any of the following:

If no action is taken within 7 days, this PR will be closed.
if isClosed() {
	// if the pool is closed, we can't wait for a connection, so return an error
	wl.nodes.Put(elem)
Can the pool be re-opened between these two calls?
This has been superseded by: #18713
Description
This fixes the two race conditions described in #18202 (comment)
Related Issue(s)
#18202
Checklist
Deployment Notes