Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable execution strategy to control buffering independently of retries #30112

Closed
frankbuckley opened this issue Jan 22, 2023 · 5 comments
Closed

Comments

@frankbuckley
Copy link

frankbuckley commented Jan 22, 2023

Motivation

We have a number of API apps hosted in Azure Kubernetes Service clusters using EF core to access Azure SQL Databases. Our general policy is to call EnableRetryOnFailure() when configuring our DbContexts. Another general policy is to return IAsyncEnumerables back to our endpoints when returning multiple objects to the API client to support streaming.

The past couple of months has been interesting from a memory perspective with .NET 7.0 bringing GC regions and AKS moving nodes from Ubuntu 18.04 to 22.04 with the Kubernetes 1.25 upgrade, bringing with it cgroup v2. Subsequently, we have seen increases in containers being OOMKilled and an increase in OutOfMemoryExceptions in our apps. Much of this has been mitigated by tuning GC parameters and tweaking resource limits and some improvements to our code.

However, we have one stubborn source of OOM exceptions that is seen in requests to an API that returns a sizeable resultset and streams it to the response body. Because we have retries enabled, the results are being buffered, so instead of ~30Kb per object/row, with page size of 50, ~1.5Mb is allocated per request.

In our systems, we observe more faults during connection than during command execution. We would therefore like to implement an IExecutionStrategy that does not buffer and retries exceptions arising from connections but not exceptions arising from command execution. When (less commonly) a fault occurs while streaming we are happy to fail and let the API client (or, in-cluster, Dapr) initiate retries.

Our typical use case for a connection-retrying, non-buffering execution strategy is data retrieval where entity/object count > 1 and
not updates, inserts or deletes - for these cases, we would likely leave buffering on.

Background

Retries

Distributed applications should handle transient errors where possible. EF provides a mechanism for retrying database commands in the event of failures within defined parameters through IExecutionStrategy with a base implementation provided in ExecutionStrategy. A concrete implementation is provided for SQL Server (SqlServerRetryingExecutionStrategy).

Memory constraints

It is common for cloud native applications to operate with defined memory constraints (e.g Kubernetes Limit Ranges) where operators are seeking to maximise container density.

Streaming

For an API app accessing a database backend, the most memory efficient design is to write results to the client and they are read from the backend – iterating over the resultset and writing to the client response a row at a time. This will often keep memory use limited to the contents of one row at a time. Where an app materialises a resultset (e.g. ToArray()) , memory consumption can rise by many multiples of a single row.

CPU usage / throughput

In addition to conserving memory, streaming may also be computationally more efficient and improve throughput as it will reduce pressure on the GC.

Buffering

Currently, EF buffers results (materialises them into memory) when an execution strategy is in place that might retry in the event of a failure:

IsBuffering = ExecutionStrategy.Current?.RetriesOnFailure ?? dependencies.IsRetryingExecutionStrategy;

This explicitly ties the decision to buffer to the decision to retry.

To disable buffering in situations where a default (buffering) execution strategy is configured, the advice is to set the maximum retry count to 0 or override RetriesOnFailure to return false.
However, this might have side effects:

if (RetriesOnFailure
&& (Dependencies.CurrentContext.Context.Database.CurrentTransaction is not null
|| Dependencies.CurrentContext.Context.Database.GetEnlistedTransaction() is not null
|| (((IDatabaseFacadeDependenciesAccessor)Dependencies.CurrentContext.Context.Database).Dependencies
.TransactionManager as
ITransactionEnlistmentManager)?.CurrentAmbientTransaction is not null))
{
throw new InvalidOperationException(

It is not clear how to disable buffering and (in some circumstances) permit retries. Doing so today might look something like:

public class SqlServerNonBufferingRetryingExecutionStrategy : ExecutionStrategy
{
    public SqlServerNonBufferingRetryingExecutionStrategy(DbContext context)
        : this(context, DefaultMaxRetryCount, DefaultMaxDelay)
    {
    }

    public SqlServerNonBufferingRetryingExecutionStrategy(DbContext context, int maxRetryCount, TimeSpan maxRetryDelay)
        : base(context, maxRetryCount, maxRetryDelay)
    {
    }

    public SqlServerNonBufferingRetryingExecutionStrategy(ExecutionStrategyDependencies dependencies)
        : base(dependencies, DefaultMaxRetryCount, DefaultMaxDelay)
    {
    }

    public SqlServerNonBufferingRetryingExecutionStrategy(
        ExecutionStrategyDependencies dependencies,
        int maxRetryCount,
        TimeSpan maxRetryDelay)
        : base(dependencies, maxRetryCount, maxRetryDelay)
    {
    }

    public bool IsBuffering => false;

    /// <summary>
    /// [Hack] Overridden to return <see langword="false"/> when <see cref="IsBuffering"/>
    /// returns <see langword="false"/> to disabled buffering. Does not indicate of retries
    /// are really not possible - see <see cref="RetriesOnFailureForReal"/>.
    /// </summary>
    /// <remarks>
    /// <see href="https://github.com/dotnet/efcore/issues/30112"/>
    /// </remarks>
    public override bool RetriesOnFailure => IsBuffering;

    /// <summary>
    /// Indicates whether this <see cref="IExecutionStrategy" /> might retry the execution after a failure.
    /// </summary>
    public bool RetriesOnFailureForReal
    {
        get
        {
            var current = Current;
            return (current == null || current == this) && MaxRetryCount > 0;
        }
    }

    protected override void OnFirstExecution()
    {
        if (RetriesOnFailureForReal
            && (Dependencies.CurrentContext.Context.Database.CurrentTransaction is not null
                || Dependencies.CurrentContext.Context.Database.GetEnlistedTransaction() is not null
                || (((IDatabaseFacadeDependenciesAccessor)Dependencies.CurrentContext.Context.Database).Dependencies
                    .TransactionManager as
                    ITransactionEnlistmentManager)?.CurrentAmbientTransaction is not null))
        {
            throw new InvalidOperationException(
                CoreStrings.ExecutionStrategyExistingTransaction(
                    GetType().Name,
                    nameof(DbContext)
                    + "."
                    + nameof(DbContext.Database)
                    + "."
                    + nameof(DatabaseFacade.CreateExecutionStrategy)
                    + "()"));
        }

        ExceptionsEncountered.Clear();
    }

    protected override bool ShouldRetryOn(Exception exception)
    {
        return SqlServerTransientConnectionExceptionDetector.ShouldRetryOn(exception);
    }
}

Errors during connection vs errors during command execution

Azure SQL Database documentation draws a distinction between errors that occur when attempting to establish a connection and errors that occur when executing a command.

Users of EF may wish to construct execution strategies that distinguish between connection errors and command execution errors to avoid buffering in most cases.

Proposal

Add IExecutionStrategy.IsBuffering

Update IExecutionStrategy to include an IsBuffering property with a default implementation to preserve existing behaviour for anyone who has already implemented the interface:

    bool IsBuffering => RetriesOnFailure;

Add ExecutionStrategy.IsBuffering

Provide an implementation in ExecutionStrategy that can be overridden:

    public virtual bool IsBuffering => RetriesOnFailure;

Add QueryCompilationContextDependencies.IsBufferingStrategy

Expose an IsBufferingStrategy property in QueryCompilationContextDependencies and initialise it from executionStrategy.IsBuffering:

    public QueryCompilationContextDependencies(
        IModel model,
        IQueryTranslationPreprocessorFactory queryTranslationPreprocessorFactory,
        IQueryableMethodTranslatingExpressionVisitorFactory queryableMethodTranslatingExpressionVisitorFactory,
        IQueryTranslationPostprocessorFactory queryTranslationPostprocessorFactory,
        IShapedQueryCompilingExpressionVisitorFactory shapedQueryCompilingExpressionVisitorFactory,
        IExecutionStrategy executionStrategy,
        ICurrentDbContext currentContext,
        IDbContextOptions contextOptions,
        IDiagnosticsLogger<DbLoggerCategory.Query> logger,
        IInterceptors interceptors)
    {
        _currentContext = currentContext;
        Model = model;
        QueryTranslationPreprocessorFactory = queryTranslationPreprocessorFactory;
        QueryableMethodTranslatingExpressionVisitorFactory = queryableMethodTranslatingExpressionVisitorFactory;
        QueryTranslationPostprocessorFactory = queryTranslationPostprocessorFactory;
        ShapedQueryCompilingExpressionVisitorFactory = shapedQueryCompilingExpressionVisitorFactory;
+       IsBufferingExecutionStrategy = executionStrategy.IsBuffering;
        IsRetryingExecutionStrategy = executionStrategy.RetriesOnFailure;
        ContextOptions = contextOptions;
        Logger = logger;
        Interceptors = interceptors;
    }

    // ...

+   public bool IsBufferingExecutionStrategy { get; init; }

Update initialisation of QueryCompilationContext.IsBuffering

    IsBuffering = ExecutionStrategy.Current?. IsBuffering?? dependencies. IsRetryingExecutionStrategy;

This will enable the implementation of an ExecutionStrategy that returns true to RetriesOnFailure and false to IsBuffering.

EF users could then implement a non-buffering execution strategy.

EF (or Azure?) might then consider providing a non-buffering execution strategy pre-tuned to retry on common Azure Sql Database connection errors, but to fail on command execution errors. Or, the community can develop some good examples.

These changes do not break any existing tests.

Related issues

To resolve #23721, it seems the link between RetriesOnFailure and buffering would need to be changed – so it seems this proposal would be a prerequisite.

Workarounds

@roji
Copy link
Member

roji commented Jan 22, 2023

As per #23721 (comment), I don't believe we should allow users to just disable buffering; that would be a big pit of failure, as they'd continue to retry when executing queries, but that retrying is effectively useless without buffering.

If the goal is to retry for connection errors, but not retry for query execution, then we should have an API that does precisely that. In other words, the API should express when retrying should actually occur, rather than allow to turn off buffering.

@frankbuckley
Copy link
Author

To clarify, I am not suggesting that you “should allow users to just disable buffering” – existing behaviour with EnableRetryOnFailure and SqlServerRetryingExecutionStrategy would be preserved as today. You will not be able to new up a SqlServerRetryingExecutionStrategy and set IsBuffering to false.

Users would have to explicitly instantiate a (say) NoBufferingConnectionRetryingExecutionStrategy to get the no buffering behaviour. You could choose to not offer no-buffering implementations if you want to make it more awkward/deliberate for users to choose such a strategy.

That said, it would be still nice to be able to implement such a strategy if you really want to (without the counter-intuitive, perhaps-one-day-breaking override of RetriesOnFailure).

@ajcvickers
Copy link
Member

We discussed this in triage and believe it is covered by #23721 and #30023.

@frankbuckley
Copy link
Author

Thank you for considering.

Is there anything that can be offered by contributors to help move #23721 and/or #30023 off the backlog?

@ajcvickers
Copy link
Member

#23721 should be relatively straightforward. #30023 needs a bit of design work; if you plan to tackle this, then please post a proposal for the changes needed and new API before starting work.

@ajcvickers ajcvickers closed this as not planned Won't fix, can't repro, duplicate, stale Feb 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants