Skip to content

Conversation

@DylanDmitri
Copy link
Contributor

Added ResettableBooleanCompletionSource, used it in the LIFOQueuePolicy to avoid allocating unnecessarily.

Renamed various components for clarity and consistency.

Big Concern:

  • There's a unit test I need. If the resettable task errors, it gets swallowed by a couple layers of async state machines. Stephen helped me work through a bug where the middleware called .Result on the resettable Task twice, causing a hidden error. I tried a couple things, but can't find a way to write a good regression test.

Two Questions:

  • The ResettableBooleanCompletionSource holds a reference to a LifoQueuePolicy (to return itself for pooling on .GetResult()). This is moderate coupling. Should I remove the coupling (likely passing in an OnGetResult callback), move the awaitable to be part of the LIFOQueuePolicy.cs file, or is it good as is?
  • LIFOQueuePolicy vs LifoQueuePolicy vs SomeOtherNamePotentially. What's a good mix of clarity vs avoiding weird looking capital letters?


// enqueue request with a tcs
var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
var tcs = Interlocked.Exchange(ref _cachedResettableTCS, null) ?? new ResettableBooleanCompletionSource(this);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't need Interlocked as its in a lock?

Suggested change
var tcs = Interlocked.Exchange(ref _cachedResettableTCS, null) ?? new ResettableBooleanCompletionSource(this);
var tcs = _cachedResettableTCS ??= new ResettableBooleanCompletionSource(this);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pooled TCSs are being placed back in that field by a Volatile.Write that's not in any lock, and so this change introduces a race condition and breaks unit tests.

I will go run benchmarks on several combinations of locks, and see what's best. Thinking

  • Current strategy
  • Interlocked for the write, this change for the read
  • Locking on the same lock for the write (probably slowest)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah didn't see was different class returning it. Probably good as now.

Does mean only one gets reused rather than a pool of them?

Copy link
Member

@halter73 halter73 Jul 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does mean only one gets reused rather than a pool of them?

Correct. Generally the call to GetResult() happens inline and returns the ResettableBooleanCompletionSource to the field before the next attempt to retrieve/alloc another.

It's important that we await something like ThreadPoolAwaitable before calling _next(context) in ConcurrencyLimiterMiddleware so we do not:

A) Run app code (i.e. _next(context)) with the LIFOQueuePolicy _bufferLock acquired.
B) Stack dive by calling repeatedly _next(context) inline inside LIFOQueuePolicy.OnExit() which could ultimately lead to a stack overflow.

It would be safer to dispatch at the LIFOQueuePolicy layer instead of the ConcurrencyLimiterMiddleware, since the inline-scheduled ValueTask is exposed to end users if they resolve the LIFOQueuePolicy as an IQueuePolicy from DI.

The problem is without inline scheduling, you lose the nice behavior where GetResult() happens inline and returns the ResettableBooleanCompletionSource to the field before the next attempt to retrieve/alloc another.

Copy link
Contributor Author

@DylanDmitri DylanDmitri Jul 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way the pooling works, is that the queue is full of these awaitables. When one gets awaited on, and has .GetResult called, it resets itself and puts itself back into the _cache field. So there's a pool, containing: the awaitables in active use + one additional cached awaitable.

If the queue goes back down, from say 1000 items to 0, then 999 of those thousand will get GCd and 1 will be kept. What this optimizes for is the case of heavy load / ddos, where the queue is stuck at full and requests keep coming in; it avoids allocating extra completion sources in this state.


So I found why the unit tests broke, and it was because the queue policy needs to reset the cache to null to prevent to awaitable from getting regrabbed before it's ready.

Ended up testing five different scenarios, by far the best performance was the simple approach of:

// no locking or checking within the CompletionSource reset
_mrvts.Reset();
_queue._cachedResettableTCS = this;
// no additional locking within the queue policy either, setting to null after
var tcs = _cachedResettableTCS ??= new ResettableBooleanCompletionSource(this);
_cachedResettableTCS = null;

This feels sloppy but safe. The policy can never double grab from the cache, because it's stuck in a lock, and if the reset fails to go through (which happens <1% on my benchmarks) worst case is allocating a couple hundred bytes.


Net perf gains are (in terms of strict overhead from my middleware):

  • empty queue 14% faster (466 -> 399ns)
  • full queue being emptied 24% faster (554 -> 422 ns)
  • full queue actively rejecting 8% faster (732 -> 670ns)

Good catch!

@halter73
Copy link
Member

Should I remove the coupling (likely passing in an OnGetResult callback)

I don't think so. I generally don't like decoupling until there are at least 3 unique consumers. The extra abstraction can make it harder to reason about what's really happening, and without multiple consumers you cannot really be sure if it's a good abstraction to begin with.

LIFOQueuePolicy vs LifoQueuePolicy vs SomeOtherNamePotentially. What's a good mix of clarity vs avoiding weird looking capital letters?

@davidfowl would warn you agains trusting me to name things, but I think QueuePolicy and StackPolicy is likely fine.

@DylanDmitri
Copy link
Contributor Author

@DAllanCarr The "using" statement lets the event sources track the duration spent in queue. It starts a new stopwatch, and reads the value when disposing.

If it wasn't a using statement, that pattern would look like:

var timer = ConcurrencyLimiterEventSource.Log.StartQueueTimer();
result = await waitInQueueTask;
ConcurrencyLimiterEventSource.Log.RecordTimeElapsed(timer);

vs

using (ConcurrencyLimiterEventSource.Log.QueueTimer())
{
    result = await waitInQueueTask;
}

}

if (waitInQueueTask.Result)
if (result)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would any unit tests now fail if you were to check waitInQueueTask.Result here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit confused here. I thought the point of this change was to avoid calling GetResult twice. Wouldn't checking waitInQueueTask.Result here run into the same issue?

private static async Task RunsContinuationsAsynchronouslyInternally()
{
var tcs = new ResettableBooleanCompletionSource(_testQueue);
var mre = new ManualResetEventSlim();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can this test work with a TaskCompletionSource instead of MRE.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use a TCS, but you'd have to call .Wait() or something on it. It probably is a good idea to wrap the test in a try/finally, and call .Set() on the MRE in the finally so this test doesn't end up blocking a thread indefinitely if it fails.


namespace Microsoft.AspNetCore.ConcurrencyLimiter
{
internal class ResettableBooleanCompletionSource : IValueTaskSource<bool>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Add a comment with a general description for this class.

Copy link
Contributor Author

@DylanDmitri DylanDmitri Jul 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// Custom awaiter to allow the StackPolicy to reduce allocations.
/// When this completion source has its result checked, it resets itself and returns itself to the cache of its parent StackPolicy.
/// Then when the StackPolicy needs a new completion source, it tries to get one from its cache, otherwise it allocates.

This seem good?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good to me

{
public static Microsoft.Extensions.DependencyInjection.IServiceCollection AddFIFOQueue(this Microsoft.Extensions.DependencyInjection.IServiceCollection services, System.Action<Microsoft.AspNetCore.ConcurrencyLimiter.QueuePolicyOptions> configure) { throw null; }
public static Microsoft.Extensions.DependencyInjection.IServiceCollection AddLIFOQueue(this Microsoft.Extensions.DependencyInjection.IServiceCollection services, System.Action<Microsoft.AspNetCore.ConcurrencyLimiter.QueuePolicyOptions> configure) { throw null; }
public static Microsoft.Extensions.DependencyInjection.IServiceCollection AddQueuePolicy(this Microsoft.Extensions.DependencyInjection.IServiceCollection services, System.Action<Microsoft.AspNetCore.ConcurrencyLimiter.QueuePolicyOptions> configure) { throw null; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be next to other stuff unrelated to to the ConcurrencyLimiter like collection.AddMvc(). I think this will make it confusing to most developers what this is adding a QueuePolicy to. Maybe this should be AddConcurrencyLimiterQueuePolicy() or AddConcurrencyLimiterStackPolicy().

I don't want to hold up the PR on this though. We can have a bigger discussion over naming and do a follow-up PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow up PR sounds good

@DylanDmitri DylanDmitri merged commit 1aebfa6 into master Jul 25, 2019
@ghost ghost deleted the dylan/resettableTCS branch July 25, 2019 23:38
@amcasey amcasey added the area-middleware Includes: URL rewrite, redirect, response cache/compression, session, and other general middlewares label Jun 2, 2023
@github-actions github-actions bot locked and limited conversation to collaborators Dec 8, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-middleware Includes: URL rewrite, redirect, response cache/compression, session, and other general middlewares

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants