Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify COALESCE based on the nullability of its arguments #33938

Merged
merged 6 commits into from
Jul 2, 2024

Conversation

ranma42
Copy link
Contributor

@ranma42 ranma42 commented Jun 9, 2024

Drop all of the arguments after the first non-nullable sub-expressions.
Simplify unary COALESCE to its argument.

Copy link
Member

@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great - there's only the problem I mentioned in #33890:

The main difficulty here is that we have various tests exercising Coalesce functionality, which are implemented over non-nullable columns; the Coalesce node would be stripped away there, and the tests would become useless. These need to be updated.

In other words this reduces our test coverage for coalesce but optimizing it away... Ideally we'd modify the tests to ensure that they actually exercise coalesce etc.

I'd also implement the SqlExpressionFactory (optimize at the source) as in #33890 at the same time (though it could be done separately too of course).

/cc @maumar

@@ -105,7 +105,7 @@ public class SqlServerStringAggregateMethodTranslator : IAggregateMethodCallTran
source,
enumerableArgumentIndex: 0,
nullable: true,
argumentsPropagateNullability: new[] { false, true },
argumentsPropagateNullability: new[] { false, false },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, good catch.

@ranma42
Copy link
Contributor Author

ranma42 commented Jun 9, 2024

This looks great - there's only the problem I mentioned in #33890:

The main difficulty here is that we have various tests exercising Coalesce functionality, which are implemented over non-nullable columns; the Coalesce node would be stripped away there, and the tests would become useless. These need to be updated.

In other words this reduces our test coverage for coalesce but optimizing it away... Ideally we'd modify the tests to ensure that they actually exercise coalesce etc.

I'll look into adding/updating tests before marking this as ready for review. Currently I am thinking of:

  • updating the tests that explicitly want a COALESCE so that it is not optimized away
  • adding some tests that check that n-ary COALESCE can be simplified to a COALESCE of a prefix of its arguments or just its first argument (with no COALESCE at all)

I'd also implement the SqlExpressionFactory (optimize at the source) as in #33890 at the same time (though it could be done separately too of course).

We can do that, but SqlExpressionFactory is currently lacking quite a lot of information to optimize as effectively as nullability can only be (trivially) computed for constants and columns, so optimizing in this location would still be useful.

This might change if SqlExpressions were extended with nullability information, as proposed in #33889 (which would require more invasive changes).

@roji
Copy link
Member

roji commented Jun 9, 2024

I'll look into adding/updating tests before marking this as ready for review. Currently I am thinking of:

Sounds great. Hopefully there are nullable alternatives in the model for the tests in question:::

We can do that, but SqlExpressionFactory is currently lacking quite a lot of information to optimize as effectively as nullability can only be (trivially) computed for constants and columns, so optimizing in this location would still be useful.

Oh I completely agree - both optimizations make sense for sure. The SqlExpressionFactory really is more of a nice-to-have compared to the one implemented here in SqlNullabilityProcessor. I've updated #33890 to track both these things.

This might change if SqlExpressions were extended with nullability information, as proposed in #33889 (which would require more invasive changes).

Yeah, that's definitely a long-term discussion more than anything immediately actionable - we should proceed for now with the current query architecture regardless of that possible direction.

@roji roji self-assigned this Jun 9, 2024
@roji
Copy link
Member

roji commented Jun 9, 2024

Assigning to myself but @maumar definitely let us know what you think etc.

@maumar
Copy link
Contributor

maumar commented Jun 11, 2024

I'll look into adding/updating tests before marking this as ready for review. Currently I am thinking of:

Sounds great. Hopefully there are nullable alternatives in the model for the tests in question:::

I actually don't think we have a "natural" null value in a bool column in gears model. But you can always make one by doing a left join.

CogTag.Gear is optional 1:1 and there is a cog tag without corresponding gear, so

ss.Set<CogTag>().Where(x => (bool?)x.Gear.HasSoulPatch ?? false) will exercise the code path and will be reflected in the results

ranma42 added a commit to ranma42/efcore that referenced this pull request Jun 16, 2024
This change aims at ensuring that the expressions within the queries being
tested cannot be trivially optimized away.

Base on the suggestion by @maumar in
dotnet#33938 (comment)
@ranma42
Copy link
Contributor Author

ranma42 commented Jun 17, 2024

@roji I implemented the SqlExpressionFactory changes in #34002 and based this PR on top of that (in the first PR I update the test so that they keep checking the appropriate COALESCE handling; in this PR I added tests for the optimization).

ranma42 added a commit to ranma42/efcore that referenced this pull request Jun 18, 2024
This change aims at ensuring that the expressions within the queries being
tested cannot be trivially optimized away.

Based on the suggestion by @maumar in
dotnet#33938 (comment)
@ranma42 ranma42 force-pushed the simplify-coalesce branch 2 times, most recently from be230cd to 473eb2e Compare June 24, 2024 15:21
@ranma42
Copy link
Contributor Author

ranma42 commented Jun 24, 2024

The first commit belongs to #34078 (I rebased on top of that to have a working build). Apart from that, it is ready for review 🚀

@ranma42 ranma42 marked this pull request as ready for review June 24, 2024 17:45
@ranma42
Copy link
Contributor Author

ranma42 commented Jun 27, 2024

Rebased to resolve merge conflict

Copy link
Member

@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great as always, thanks. See two minor suggestions below.

Comment on lines 1434 to 1451
if (coalesceArguments.Count == 1)
{
return coalesceArguments[0];
}
else if (coalesceArguments.Count == sqlFunctionExpression.Arguments.Count)
{
return sqlFunctionExpression.Update(sqlFunctionExpression.Instance, coalesceArguments);
}
else
{
return _sqlExpressionFactory.Function(
sqlFunctionExpression.Name,
coalesceArguments,
sqlFunctionExpression.IsNullable,
argumentsPropagateNullability: coalesceArguments.Select(_ => false).ToArray(),
sqlFunctionExpression.Type,
sqlFunctionExpression.TypeMapping);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'm a big fan of switches (especially expressions):

Suggested change
if (coalesceArguments.Count == 1)
{
return coalesceArguments[0];
}
else if (coalesceArguments.Count == sqlFunctionExpression.Arguments.Count)
{
return sqlFunctionExpression.Update(sqlFunctionExpression.Instance, coalesceArguments);
}
else
{
return _sqlExpressionFactory.Function(
sqlFunctionExpression.Name,
coalesceArguments,
sqlFunctionExpression.IsNullable,
argumentsPropagateNullability: coalesceArguments.Select(_ => false).ToArray(),
sqlFunctionExpression.Type,
sqlFunctionExpression.TypeMapping);
}
return coalesceArguments switch
{
[var singleArgument] => singleArgument,
_ when coalesceArguments.Count == sqlFunctionExpression.Arguments.Count
=> sqlFunctionExpression.Update(sqlFunctionExpression.Instance, coalesceArguments),
_ => _sqlExpressionFactory.Function(
sqlFunctionExpression.Name,
coalesceArguments,
sqlFunctionExpression.IsNullable,
argumentsPropagateNullability: coalesceArguments.Select(_ => false).ToArray(),
sqlFunctionExpression.Type,
sqlFunctionExpression.TypeMapping)
};

}
else
{
return _sqlExpressionFactory.Function(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to just have SqlFunctionExpression.Update accept an optional argumentsPropagateNullability argument

@ranma42 ranma42 force-pushed the simplify-coalesce branch 2 times, most recently from 8d6a473 to 3387737 Compare June 27, 2024 07:46
Copy link
Member

@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last tiny nit...

SqlExpression? instance,
IReadOnlyList<SqlExpression>? arguments,
IReadOnlyList<bool>? argumentsPropagateNullability = null)
=> instance != Instance
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should do a pass to invert the logic of all of these across all expression types at some point (I hate negation + else)

Copy link
Contributor Author

@ranma42 ranma42 Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as we are handling this, should we also check that the arguments make sense?

Currently foo.Update(foo.Instance, null) always returns foo (which might not be intended).
Similarly, if bar is niladic (aka bar.ArgumentsPropagateNullability is null), bar.Update(bar.Instance, [_factory.Constant(3)]) returns bar (arguments is ignored).

We could prevent switching from/to niladic, but I am unsure if this is a check we care about (it is an internal API).

}

nullable = coalesceNullable;

return sqlFunctionExpression.Update(sqlFunctionExpression.Instance, coalesceArguments);
return coalesceArguments switch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: now that this has only two arms, it can be a conditional operator

{
argumentsPropagateNullability ??= ArgumentsPropagateNullability;

if (IsNiladic)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't these checks be in the constructor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add them to the constructor if we want to, but they would not be sufficient to capture the misuse as mentioned in #33938 (comment)

Specifically, if either arguments or Arguments is null, the code would skip the constructor and just return this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@roji what is the best way forward? should I add checks both to the constructor and here?

Copy link
Member

@roji roji Jul 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the ping.

I took another look... I'm not sure why SqlFunctionExpression has a separate IsNiladic property (I've been wanting to take a look at this for a while): the arguments being null (as opposed to empty) seems like the natural way to express a niladic function, and it indeed seems like whenever niladic is true, arguments is null and vice versa.

So unless I'm mistaken, we can probably just remove the IsNliadic property as an actual property (i.e. just make it return Arguments is not null). At that point, it's fine for Update() to switch to/from niladic - that's just an operation to change Arguments from null to non-null etc. And in any case argumentsPropagateNullability should correspond to arguments (both in nullness and in number of arguments when not null).

How does that sound?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I especially like defining IsNiladic from arguments (and we can also annotate it so that the compiler knows about their relation).

I am not sure if it makes sense to ever switch between niladic (FOOBAR) and non-niladic (FOOBAR(), FOOBAR(1, 2, 3), ...), but I guess it should not hurt (well, as long as the caller knows what it is doing 😅 ).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I mean, if they can use Update() to switch between 1 to two arguments, or even to zero, I don't see why they shouldn't also switch to niladic... It's on them to get it right...

Is this something you want to do as part of this PR? Or separately? If you prefer I can submit a PR for the niladic change.


return instance == Instance
&& ((arguments == Arguments) || (arguments != null && Arguments != null && arguments.SequenceEqual(Arguments)))
&& ((argumentsPropagateNullability == ArgumentsPropagateNullability)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, the problem this creates is that argumentsPropagateNullability being null could mean two things: either the user is switching to niladic (and so arguments is null too), or they want to just change the arguments without changing argumentsPropagateNullability (which seems like a common scenario).

Maybe we should just have two overloads, one only for changing arguments (doesn't have argumentsPropagateNullability parameter, and importantly, cannot change the number of arguments), and another with arguments and argumentsPropagateNullability?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do that (it was indeed my initial approach).
Note that the "common" scenario occurs just 3 times in the codebase (there are only 4 uses of this method)

Redefined it based on `Arguments`, consistent with its annotation.
Drop all of the arguments after the first non-nullable sub-expressions.
Simplify unary `COALESCE` to its argument.
Copy link
Member

@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ranma42!

@roji roji merged commit 798fbc8 into dotnet:main Jul 2, 2024
7 checks passed
@ranma42 ranma42 deleted the simplify-coalesce branch July 2, 2024 21:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants