-
Thanks for the writeup, it's great to see this work happening. Here are a couple of thoughts. First, in the Open flow I'm seeing "async acquire semaphore" before anything else happens, including even getting an idle connection. This seems like it would be a point of contention, and I'm wondering why this step is necessary, especially before idle connection acquisition; could it at least be moved to cover only the physical connection open? Second (and probably related): in offline discussions I think I understood that a SqlClient goal is to throttle connections to the server (SQL Server or Azure SQL), as a way of avoiding overloading the server with too many concurrent physical connection attempts. If this is right, I have some comments on it, though I'm guessing it's a hard requirement from the server side that you have little control over.
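To illustrate that first point, here is a minimal sketch of checking the idle pool before ever touching the semaphore, so that only physical opens contend on it. Every type and member name here (PoolSketch, PhysicalConnection, the fields) is an illustrative stand-in, not an actual SqlClient type:

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Illustrative stand-ins only; none of these names come from SqlClient.
internal sealed class PhysicalConnection
{
    public static Task<PhysicalConnection> OpenAsync(CancellationToken ct) =>
        Task.FromResult(new PhysicalConnection()); // placeholder for the real physical open
}

internal sealed class PoolSketch
{
    private readonly ConcurrentQueue<PhysicalConnection> _idleConnections = new();
    private readonly SemaphoreSlim _createSemaphore = new(initialCount: 10, maxCount: 10);

    public async ValueTask<PhysicalConnection> GetConnectionAsync(CancellationToken ct)
    {
        // Fast path: handing out an idle connection involves no semaphore at all.
        if (_idleConnections.TryDequeue(out var idle))
        {
            return idle;
        }

        // Slow path: only new physical opens contend on the throttling semaphore.
        await _createSemaphore.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            return await PhysicalConnection.OpenAsync(ct).ConfigureAwait(false);
        }
        finally
        {
            _createSemaphore.Release();
        }
    }
}
```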
One last thought... If client-side throttling is a must-have, then I'd at least consider implementing it via a pluggable policy mechanism such as Polly. Polly is a major resilience/fault-tolerance framework that also provides circuit breaker and throttling functionality. Allowing the connection pool to be configured with a Polly policy would let users specify rich behaviors for both retrying and throttling at the same time, and would externalize the problem (so you wouldn't have to implement it). Also, while the default could still be a throttling policy, someone could disable that by passing in a non-throttling policy. The downside, of course, would be the added dependency on Polly, though that could be split out into a separate package for the advanced users who need it. Anyway, happy to discuss all this further if needed!
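As a rough sketch of what that could look like: the snippet below composes a Polly v7 retry policy with a bulkhead (throttling) policy. Only the Polly calls themselves are real; the idea that the pool would accept such a policy, and the OpenPhysicalConnectionAsync hook in the comment, are hypothetical:

```csharp
using System;
using Polly;
using Polly.Bulkhead;
using Polly.Retry;
using Polly.Wrap;

// Retry transient failures with a small backoff; in practice this would filter
// to transient SqlException numbers rather than all exceptions.
AsyncRetryPolicy retry = Policy
    .Handle<Exception>()
    .WaitAndRetryAsync(3, attempt => TimeSpan.FromMilliseconds(100 * attempt));

// Bulkhead acts as the client-side throttle: at most 20 concurrent opens, 200 queued.
AsyncBulkheadPolicy throttle = Policy.BulkheadAsync(20, 200);

// Retry wraps throttling; passing a non-throttling policy here instead would
// effectively disable client-side throttling.
AsyncPolicyWrap openPolicy = Policy.WrapAsync(retry, throttle);

// Hypothetical usage inside the pool's open path:
// await openPolicy.ExecuteAsync(ct => OpenPhysicalConnectionAsync(ct), cancellationToken);
```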
-
Thanks for this @mdaigle. I've not got anything to add to the high-level design - just extra context and a few comments around the semaphore and cleanup logic.
-
Lots of great discussion in here. I want to summarize what we have so far and split up topics into designated issues to keep things digestible. I will capture the outcomes of our discussions so far on these issues so that we can come to decisions on them. They will also be the best place going forward to discuss their respective topics. New topics not covered by these issues can of course be added to this discussion.
Please let me know if I missed anything!
-
Has #343 been discussed here?
-
Problem Statement
Modern .NET applications interact with Azure SQL instances that offer robust load balancing, scaling, and reliability. However, the current connection pool design does not take advantage of these properties and artificially limits connection creation throughput. Additionally, the async open path suffers from extremely poor performance. Customers want higher connection creation throughput so that their applications can scale and warm up faster. It's advantageous from both a cost-saving and a user-experience perspective to let customers utilize the highest throughput their system can support.
Connection opening and pooling is the front door to the SQL experience and forms a customer's first impression of developing with SQL Server. We need to redesign the .NET SqlClient connection pool to support modern Azure SQL use cases and C# language features.
Low connection creation throughput impacts customers in key scenarios, for example:
Root Causes
There are two main issues that cause the low connection creation throughput. These issues are inherent to the design of the connection pool and necessitate a rewrite.
1. Async-unfriendly locks
All locks in the current connection pool derive from WaitHandle and block the calling thread while waiting. Naturally, when acquisition blocks, parallel throughput is limited by the size of the thread pool. More modern locking mechanisms support asynchronous waits. Meanwhile, the severe performance issues in the async open flow keep customers stuck on the sync APIs despite their drawbacks.
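Roughly, the contrast between the two locking styles looks like this (an illustrative sketch, not taken from the pool code):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Illustrative contrast only; names and counts are arbitrary.
internal static class LockStyles
{
    private static readonly Semaphore _legacyGate = new(initialCount: 1, maximumCount: 1);
    private static readonly SemaphoreSlim _asyncGate = new(initialCount: 1, maxCount: 1);

    public static void BlockingAcquire()
    {
        // WaitHandle-derived wait: the calling thread is parked until a slot frees,
        // so concurrency is capped by the number of available thread pool threads.
        _legacyGate.WaitOne();
        try { /* rent or open a connection */ }
        finally { _legacyGate.Release(); }
    }

    public static async Task AsyncAcquireAsync(CancellationToken ct)
    {
        // SemaphoreSlim supports an async wait: the thread returns to the pool while
        // waiting, so many opens can be in flight without pinning a thread each.
        await _asyncGate.WaitAsync(ct).ConfigureAwait(false);
        try { /* rent or open a connection */ }
        finally { _asyncGate.Release(); }
    }
}
```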
2. Async open requests are handled serially
The connection pool delegates async open requests to a background thread, where they are queued up, handled synchronously, and passed back to the thread that initiated the open via a task completion source. This wastes thread resources and forces async opens through a funnel that serializes them. Developers expect async APIs to be performant and thread-efficient; counterintuitively, the connection pool's async APIs are significantly slower and less efficient than their sync counterparts. At a deeper level, all create operations are also funneled through a single semaphore that guards creation.
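In simplified form, the pattern described above looks roughly like the sketch below (illustrative, not the actual implementation): every async open is queued to a single worker thread and completed through a task completion source, so throughput is bounded by that one worker.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Simplified sketch of the serialized async-open pattern; not SqlClient code.
internal sealed class SerializedOpener
{
    private readonly BlockingCollection<TaskCompletionSource<object>> _pending = new();

    public SerializedOpener()
    {
        // A single background thread drains the queue and opens connections synchronously.
        new Thread(() =>
        {
            foreach (var request in _pending.GetConsumingEnumerable())
            {
                object connection = OpenSynchronously(); // blocking physical open
                request.SetResult(connection);
            }
        })
        { IsBackground = true }.Start();
    }

    public Task<object> OpenAsync()
    {
        // The caller awaits a task that the single worker will complete later;
        // independent opens end up serialized behind one another.
        var tcs = new TaskCompletionSource<object>(TaskCreationOptions.RunContinuationsAsynchronously);
        _pending.Add(tcs);
        return tcs.Task;
    }

    private static object OpenSynchronously() => new object(); // placeholder for the real open
}
```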
Design Waypoints
Flow Diagrams
Figure 1: Proposed async open flow
Figure 2: Proposed return flow
Figure 3: Proposed cleanup/warmup flow
Appendix
Figure 4: Current internal data structures
Figure 5: Current create synchronization flow
Figure 6: Current async open flow