
Query/Perf: don't compile liftable constant resolvers in interpretation mode when the resolver itself contains a lambda #35208

Closed
maumar opened this issue Nov 26, 2024 · 3 comments · Fixed by #35209
Labels: area-perf, area-query, closed-fixed, customer-reported, regression

Comments

maumar commented Nov 26, 2024

This is part of a larger perf regression between EF8 and EF9: #35053

When processing LiftableConstantExpressions in regular (non-AOT) mode, we compile the resolver lambda and then evaluate it to get back the actual constant we plan to use. We do that compilation in interpretation mode, which causes problems if the resulting object is itself (or contains) a delegate: compiling into a delegate in interpreted mode hurts both allocations and execution speed.
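
To make the mechanism concrete, here is a minimal, standalone sketch (not EF's actual resolver code): when a lambda compiled with `Compile(preferInterpretation: true)` returns another delegate, that inner delegate is also backed by the expression interpreter, so every invocation of it pays the interpreter cost.

    using System;
    using System.Linq.Expressions;

    internal static class InterpretedDelegateSketch
    {
        public static void Run()
        {
            // A "resolver"-like expression whose result is itself a delegate.
            Expression<Func<int, Func<int, int>>> resolver = seed => x => x + seed;

            // Regular compilation: both the outer and the nested lambda are JIT-compiled.
            Func<int, int> compiled = resolver.Compile()(10);

            // Interpreted compilation: the nested delegate also runs through the
            // expression interpreter, so each call to it is slower and allocates more.
            Func<int, int> interpreted = resolver.Compile(preferInterpretation: true)(10);

            Console.WriteLine(compiled(5));    // 15
            Console.WriteLine(interpreted(5)); // 15, but every call goes through the interpreter
        }
    }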

Before the fix (this already includes the invoke fix #35206):

| Method      | Async | Mean     | Error   | StdDev  | Op/s  | Gen0       | Gen1       | Allocated |
|------------ |------ |--------- |-------- |-------- |------ |----------- |----------- |---------- |
| MultiInclue | False | 487.1 ms | 0.99 ms | 0.88 ms | 2.053 | 17000.0000 | 11000.0000 | 103.29 MB |
| MultiInclue | True  | 487.4 ms | 3.63 ms | 3.22 ms | 2.052 | 17000.0000 | 11000.0000 | 103.29 MB |

After the fix:

| Method      | Async | Mean     | Error   | StdDev   | Op/s  | Gen0       | Gen1      | Allocated |
|------------ |------ |--------- |-------- |--------- |------ |----------- |---------- |---------- |
| MultiInclue | False | 455.1 ms | 8.94 ms | 10.29 ms | 2.197 | 11000.0000 | 6000.0000 | 67.92 MB  |
| MultiInclue | True  | 435.4 ms | 1.77 ms | 1.66 ms  | 2.297 | 11000.0000 | 6000.0000 | 67.92 MB  |
maumar commented Nov 26, 2024

smoking gun benchmark:

    using System;
    using System.Linq.Expressions;
    using System.Reflection;
    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Columns;
    using BenchmarkDotNet.Configs;
    using BenchmarkDotNet.Diagnosers;
    using BenchmarkDotNet.Running;

    internal class Program
    {
        static void Main(string[] args)
        {
            BenchmarkRunner.Run(args, typeof(Program).Assembly);
        }
    }

    public static class BenchmarkRunner
    {
        public static void Run(string[] args, Assembly assembly, IConfig config = null)
        {
            config ??= DefaultConfig.Instance;

            config = config
                .AddDiagnoser(MemoryDiagnoser.Default)
                .AddColumn(StatisticColumn.OperationsPerSecond);

            BenchmarkSwitcher.FromAssembly(assembly).Run(args, config);
        }
    }

    public class InterpretationBenchmark
    {
        private Expression<Func<object, Func<object?, object?, bool>[]>> _expression;
        private Func<object, object, bool>[] _compiled;
        private Func<object, object, bool>[] _compiledInterpreted;

        [GlobalSetup]
        public virtual void Initialize()
        {
            var left = Expression.Parameter(typeof(object), "left");
            var right = Expression.Parameter(typeof(object), "right");

            var prm = Expression.Parameter(typeof(object));
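            // Builds: prm => new[] { (left, right) => left == null ? right == null : right != null && (int)left == (int)right }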
            _expression = Expression.Lambda<Func<object, Func<object?, object?, bool>[]>>(
                Expression.NewArrayInit(
                    typeof(Func<object, object, bool>),
                    Expression.Lambda<Func<object?, object?, bool>>(
                        Expression.Condition(
                            Expression.Equal(left, Expression.Constant(null)),
                            Expression.Equal(right, Expression.Constant(null)),
                            Expression.AndAlso(
                                Expression.NotEqual(right, Expression.Constant(null)),
                                Expression.Equal(
                                    Expression.Convert(left, typeof(int)),
                                    Expression.Convert(right, typeof(int))))),
                      left,
                      right)), prm);

            _compiled = _expression.Compile()("_");
            _compiledInterpreted = _expression.Compile(preferInterpretation: true)("_");
        }

        [Benchmark]
        public virtual void Compiled()
        {
            var equal = 0;
            for (var i = 0; i < 100; i++)
            {
                for (var j = 0; j < 100; j++)
                {
                    if (_compiled[0](i, j))
                    {
                        equal++;
                    }
                }
            }
        }

        [Benchmark]
        public virtual void CompiledInterpreted()
        {
            var equal = 0;
            for (var i = 0; i < 100; i++)
            {
                for (var j = 0; j < 100; j++)
                {
                    if (_compiledInterpreted[0](i, j))
                    {
                        equal++;
                    }
                }
            }
        }
    }

results:

| Method              | Mean      | Error    | StdDev   | Op/s     | Gen0     | Allocated  |
|-------------------- |---------- |--------- |--------- |--------- |--------- |----------- |
| Compiled            | 65.69 us  | 0.706 us | 0.660 us | 15,223.4 | 76.4160  | 468.75 KB  |
| CompiledInterpreted | 930.20 us | 7.046 us | 6.591 us | 1,075.0  | 433.5938 | 2656.25 KB |

maumar added a commit that referenced this issue Nov 26, 2024
… in interpretation mode when the resolver itself contains a lambda

In EF9 we changed the way we generate shapers in preparation for AOT scenarios. We can no longer embed arbitrary objects into the shaper; instead we need to provide a way to construct that object in code (using the LiftableConstant mechanism), or simulate the functionality it used to provide.
At the end of our processing we find all liftable constants and, for the non-AOT case, compile their resolver lambdas and invoke the result with the liftable context object to produce the constant object we initially wanted. (In the AOT case we generate code from the resolver lambda.)
The problem is that we compile the resolver lambda in interpretation mode: if the final product is itself a delegate, that delegate will also run in the interpreter and therefore be less efficient.

The fix is to scan the resolver expression for nested lambdas; if we find any, compile the resolver in regular mode instead.

Fixes #35208

This is part of a fix for a larger perf issue: #35053
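
The commit above describes scanning the resolver for nested lambdas and falling back to regular compilation when one is found. A rough sketch of what such a check could look like (hypothetical NestedLambdaDetector/ResolverCompiler helpers, not the actual EF Core implementation):

    using System;
    using System.Linq.Expressions;

    // Detects whether an expression tree contains any nested lambda.
    internal sealed class NestedLambdaDetector : ExpressionVisitor
    {
        public bool FoundLambda { get; private set; }

        protected override Expression VisitLambda<T>(Expression<T> node)
        {
            // Record that a nested lambda exists; no need to descend into it.
            FoundLambda = true;
            return node;
        }
    }

    internal static class ResolverCompiler
    {
        public static Delegate Compile(LambdaExpression resolver)
        {
            var detector = new NestedLambdaDetector();
            detector.Visit(resolver.Body);

            // If the resolved constant would itself contain a delegate, compile the
            // resolver fully so that delegate is JIT-compiled rather than interpreted.
            return detector.FoundLambda
                ? resolver.Compile()
                : resolver.Compile(preferInterpretation: true);
        }
    }
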
maumar added a commit that referenced this issue Nov 26, 2024
… in interpretation mode when the resolver itself contains a lambda

In EF9 we changed the way we generate shapers in preparation for AOT scenarios. We can no longer embed arbitrary objects into the shaper; instead we need to provide a way to construct that object in code (using the LiftableConstant mechanism), or simulate the functionality it used to provide.
At the end of our processing we find all liftable constants and, for the non-AOT case, compile their resolver lambdas and invoke the result with the liftable context object to produce the constant object we initially wanted. (In the AOT case we generate code from the resolver lambda.)
The problem is that we compile the resolver lambda in interpretation mode: if the final product is itself a delegate, that delegate will also run in the interpreter and therefore be less efficient.

The fix is to use regular compilation rather than interpretation.

Fixes #35208

This is part of a fix for a larger perf issue: #35053
maumar added a commit that referenced this issue Nov 26, 2024
Port of #35209

**Description**
In EF9 we changed the way we generate shapers in preparation for AOT scenarios. We can no longer embed arbitrary objects into the shaper; instead we need to provide a way to construct that object in code (using the LiftableConstant mechanism), or simulate the functionality it used to provide.
At the end of our processing we find all liftable constants and, for the non-AOT case, compile their resolver lambdas and invoke the result with the liftable context object to produce the constant object we initially wanted. (In the AOT case we generate code from the resolver lambda.)
The problem is that we compile the resolver lambda in interpretation mode: if the final product is itself a delegate, that delegate will also run in the interpreter and therefore be less efficient when invoked many times as the query runs.
The fix is to use regular compilation rather than interpretation.

**Customer impact**
Queries using collection navigations over a significant amount of data suffer a large performance degradation compared with EF8. There is no good workaround.

**How found**
Multiple customer reports on 9.0.0

**Regression**
Yes, from 8.0.

**Testing**
Ad-hoc perf testing with BenchmarkDotNet. Functional change already covered by numerous tests.

**Risk**
Low, quirk added.
maumar closed this as completed in 1319ed4 on Nov 26, 2024
maumar commented Nov 26, 2024

reopening for potential patch

maumar reopened this on Nov 26, 2024
maumar added the closed-fixed label on Nov 26, 2024
maumar added a commit that referenced this issue Nov 27, 2024
Port of #35209

**Description**
In EF9 we changed the way we generate shapers in preparation for AOT scenarios. We can no longer embed arbitrary objects into the shaper; instead we need to provide a way to construct that object in code (using the LiftableConstant mechanism), or simulate the functionality it used to provide.
At the end of our processing we find all liftable constants and, for the non-AOT case, compile their resolver lambdas and invoke the result with the liftable context object to produce the constant object we initially wanted. (In the AOT case we generate code from the resolver lambda.)
The problem is that we compile the resolver lambda in interpretation mode: if the final product is itself a delegate, that delegate will also run in the interpreter and therefore be less efficient when invoked many times as the query runs.
The fix is to use regular compilation rather than interpretation.

**Customer impact**
Queries using collection navigations over a significant amount of data suffer a large performance degradation compared with EF8. There is no good workaround.

**How found**
Multiple customer reports on 9.0.0

**Regression**
Yes, from 8.0.

**Testing**
Ad-hoc perf testing with BenchmarkDotNet. Functional change already covered by numerous tests.

**Risk**
Low, quirk added.
maumar added this to the 9.0.x milestone on Nov 27, 2024
maumar commented Nov 30, 2024

fixed for 9.0.1 - b50581a

maumar closed this as completed on Nov 30, 2024
maumar modified the milestones: 9.0.x → 9.0.1 on Nov 30, 2024