Translate COALESCE as ISNULL #34171
Conversation
At least a test for the type propagation is needed.
Another application for this is that COALESCE cannot be used in indexed views in SQL Server, whereas ISNULL is supported in the case of aggregations. This makes it difficult/impossible to make EF Core generated queries use the indexed views, as I suppose the query processor thinks they're different. For example: […]
@ranma42, gentle nudge on this. We like the […]
@cincuranet I will rebase the PR and check the new test results ASAP (worst case: next weekend). Last time I kind of got stuck because of some typing issues, as mentioned in #34171 (comment). I will try to point out explicitly the issues that must be solved (ideally as questions) to unblock this.
```csharp
public virtual Task Coalesce_Correct_Type(bool async)
    => AssertQuery(
        async,
        ss => ss.Set<Customer>().Select(c => c.Region ?? "no region specified"));
```
This test is designed to ensure that EFCore takes care of the difference in type propagation between ISNULL and COALESCE.
From the docs:
Data type determination of the resulting expression is different. ISNULL uses the data type of the first parameter, COALESCE follows the CASE expression rules and returns the data type of value with the highest precedence.
Specifically, in this case c.Region is an nvarchar(15), while "no region specified" cannot be represented as a value of that type (it's longer than 15 characters).
The implementation I did in this PR would take care of this, under the assumption that the typeMapping for CONVERT can be used to determine the type (and/or whether any casting/conversion is needed), but unfortunately EFCore currently computes it as nvarchar(15) even though (I believe) its result is actually an nvarchar(18).
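As a quick sanity check on the lengths (plain C#, just for reference; note that the test uses "no region specified" while the fiddle below uses "no region selected"):

```csharp
// The test's literal does not fit in 15 characters:
Console.WriteLine("no region specified".Length); // 19

// The fiddle's literal is 18 characters, which presumably is where
// the nvarchar(18) mentioned above comes from:
Console.WriteLine("no region selected".Length);  // 18
```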
See https://dbfiddle.uk/ITGS6ZVJ
```sql
DECLARE @what sql_variant;
DECLARE @foo varchar(15) = NULL;

SELECT @what = ISNULL(@foo, 'no region selected');
SELECT
    @what,
    SQL_VARIANT_PROPERTY(@what, 'BaseType'),
    SQL_VARIANT_PROPERTY(@what, 'MaxLength');

SELECT @what = COALESCE(@foo, 'no region selected');
SELECT
    @what,
    SQL_VARIANT_PROPERTY(@what, 'BaseType'),
    SQL_VARIANT_PROPERTY(@what, 'MaxLength');
```
What is the best way forward for the type issue? I see that the type inference already handles string concatenation in a special way, but it looks like nothing at all happens for other values/operations. (NB: I am not sure I actually grasp what typeMappings are supposed to represent, hence how they should be computed/used, so maybe I am simply misusing it in this PR 😇)
Are you talking about the […]?
Likely just SQL Server, given the […].
It should result in double, I believe.
Yes. More in general, I am referring to the mismatch between the […].
I am unsure whether the expression/comparison behavior is intentional (it is not mentioned as a limitation of value conversion); I'll prepare an issue to showcase the behavior I am seeing and understand if I am simply misreading the docs or if the query conversion has some problems in this corner case. In this case I am not looking for a representation of the C#/SQL conversion, but rather the (SQL) type of the expression constructed in:
efcore/src/EFCore.Relational/Query/SqlExpressionFactory.cs, lines 252 to 265 (at ac5bd7b)
If […]
👍 other providers might want to use a similar type propagation mechanism, but as long as it is only used in the SqlServer provider, we can look for a reasonable API/implementation in there; I'll move along this direction.
This would be my expectation as well; I'll double check (actually, I'll add a test), but IIRC it currently results in […].
In most cases this is not actually a problem, as the compiler inserts casts to ensure that the type of […]
@cincuranet I rebased the PR and limited the scope to a special case that I believe might still be interesting, i.e.:
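For reference, here is the special case as it appears in the diff, reassembled from the two hunks quoted later in this conversation:

```csharp
// stay on the safe side we only use ISNULL if:
// - all sub-expressions have the same type as the expression
// - all sub-expressions have the same type mapping as the expression
// - the expression is using the default type mapping (combined
// with the two above, this implies that all of the expressions
// are using the default type mapping of the type)
if (defaultTypeMapping == typeMapping
    && sqlFunctionExpression.Arguments.All(a => a.Type == type && a.TypeMapping == typeMapping))
```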
This ensures that the conversion avoids some issues related to #15586.
I might need some help for Cosmos 😇
@ranma42 Here's what's failing. The first two seem to be easy renames. The third one is a new test; below is what was produced. Validate that and change […]
`COALESCE` is a syntactic shortcut for the CASE expression. As such, the input values can be evaluated multiple times: COALESCE(f(x), y) behaves like CASE WHEN f(x) IS NOT NULL THEN f(x) ELSE y END, which evaluates f(x) twice when it is non-null. `ISNULL` does not have this shortcoming. Fixes dotnet#32519.
Copilot reviewed 21 out of 21 changed files in this pull request and generated no comments.
Sorry for the post-merge review... It's really nice to see this merged - should provide a nice perf improvement when coalescing heavy expressions, and provide sane behavior when coalescing non-deterministic ones.
See below for some minor thoughts that came up (nothing crucial).
```csharp
// with the two above, this implies that all of the expressions
// are using the default type mapping of the type)
if (defaultTypeMapping == typeMapping
    && sqlFunctionExpression.Arguments.All(a => a.Type == type && a.TypeMapping == typeMapping)) {
```
Related to the above comment on the CLR type, I think we could compare the store type instead of reference-comparing the type mapping instances... There's no guarantee in EF that there's only one type mapping instance for a given type mapping, and unless I'm mistaken the only thing that matters here for ISNULL is the actual store type. So comparing the store type should be safe and possibly allow for some more scenarios.
```diff
- && sqlFunctionExpression.Arguments.All(a => a.Type == type && a.TypeMapping == typeMapping)) {
+ && sqlFunctionExpression.Arguments.All(a => a.Type == type && a.TypeMapping?.StoreType == typeMapping?.StoreType)) {
```
IIRC this would not recognize cases in which the mismatch is in one of the parameters (such as varchar vs varchar(15)).
Maybe the right approach here would be to make type mappings IEquatable and compare them.
IIRC this would not recognize cases in which the mismatch is in one of the parameters (such as varchar vs varchar(15)).
I think StoreType must always be the full store type - including any facets. If that's not the case, we most probably have bugs somewhere else as well. Do you have a specific case in mind where this doesn't work?
Maybe the right approach here would be to make typemappings IEquatable and compare them
I'm not sure... The point here is that in this specific context, the only thing we care about is the actual type in the database; in other words, various other details that distinguish type mappings from one another are irrelevant (value comparers, value converters...), since they affect EF client-side processing only. Assuming we have a 100% unambiguous representation of the database type - which is what StoreType is supposed to be - comparing those should be sufficient and allow for maximum coverage of this optimization, I think.
A general IEquatable would need to also compare e.g. value comparers (since it's possible that in other contexts that's relevant), and so would needlessly exclude some valid cases...
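A minimal sketch of the comparison being proposed (SameStoreType is a hypothetical helper, not EF Core API; it assumes StoreType fully encodes the database type, facets included):

```csharp
using System;
using Microsoft.EntityFrameworkCore.Storage;

static class TypeMappingComparison
{
    // Treat two type mappings as equivalent for the ISNULL rewrite when
    // their database-facing store types match exactly (e.g. "varchar(15)"
    // vs "varchar(15)"), ignoring client-side details such as value
    // converters and value comparers.
    public static bool SameStoreType(RelationalTypeMapping? a, RelationalTypeMapping? b)
        => string.Equals(a?.StoreType, b?.StoreType, StringComparison.Ordinal);
}
```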
```csharp
.Aggregate(head, (l, r) => new SqlFunctionExpression(
    "ISNULL",
    arguments: [l, r],
    nullable: true,
```
At least for the top-most ISNULL, shouldn't we be taking the original coalesce expression's nullability? For example, in the very common case of b.Foo ?? "<unknown>", the entire expression should be non-nullable since the last sub-expression (the constant string) is non-nullable.
As this is done extremely late in the query pipeline (SQL generator), there's not going to be anything afterwards that cares about this (this node is immediately processed, generating SQL string data, and then discarded). But for correctness's sake (and in case this code is copy-pasted somewhere else), maybe we should do it?
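A sketch of what that could look like, adapted from the snippet above (coalesceExpression, inner, and last are placeholders I'm assuming from the surrounding code, not the PR's actual variable names):

```csharp
// Keep nullable: true for the inner ISNULL invocations, but let the
// outermost one inherit the original COALESCE's nullability so that
// information is not lost in the rewrite.
var rewritten = new SqlFunctionExpression(
    "ISNULL",
    arguments: [inner, last],
    nullable: coalesceExpression.IsNullable,
    argumentsPropagateNullability: [false, false], // assumed; the quoted snippet is truncated here
    coalesceExpression.Type,
    coalesceExpression.TypeMapping);
```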
nullable: true is always safer than nullable: false, so no risk of incorrectness here ;)
We could do this for the top-most ISNULL invocation (which, if the COALESCE has been optimized properly, is the only one that can be non-nullable), but it would make the code a little more complicated for no apparent advantage.
Note that in this context we do not have access to the results of the nullability processor (the nullable value in the expression does not correspond to the results of the processing).
no risk of incorrectness here
Right, not incorrect in the sense of incorrect results - just potentially SQL that's more convoluted (and less efficient) than it needs to be.
Note that in this context we do not have access to the results of the nullability processor (the nullable value in the expression does not correspond to the results of the processing).
That's true, but the expression nullable does represent what it represents: it seems odd to basically lose that setting just because we're transforming one SQL function into another here.
But anyway, given where this code is, I agree it's theoretical. If we do any extra work on this, I'd add a comment making that clear.
no risk of incorrectness here
Right, not incorrect in the sense of incorrect results - just potentially SQL that's more convoluted (and less efficient) than it needs to be.
Note that in this context we do not have access to the results of the nullability processor (the nullable value in the expression does not correspond to the results of the processing).
That's true, but the expression nullable does represent what it represents: it seems odd to basically lose that setting just because we're transforming one SQL function into another here.
I believe we are not losing any information here: COALESCE expressions are always constructed with nullable: true from this code. They can be re-written in a few places, but the IsNullable property is never modified (neither for COALESCE nor for other function expressions).
But anyway, given where this code is, I agree it's theoretical. If we do any extra work on this, I'd add a comment making that clear.
I see, I wasn't aware of that... That's presumably because we have special handling of COALESCE in any case in SqlNullabilityProcessor, so IsNullable is maybe effectively unused?
I'd prefer it if IsNullable were accurate here (across the board, regardless of COALESCE vs. ISNULL) - you never know when someone might look at it - but I guess it's not super important at this point.
I see, I wasn't aware of that... That's presumably because we have special handling of COALESCE in any case in SqlNullabilityProcessor, so IsNullable is maybe effectively unused?
The nullability processor relies on IsNullable as input to decide whether to assume that the function is guaranteed to return a non-null value, or whether its result should be considered (non)nullable depending on (some of) its arguments.
Currently it is accurate from the perspective of the function definition, not from the point of view of the expression (even less of the expression in a given context).
IIRC the same holds true for other expressions as well: during the nullability computation for myNullableColumn is null ? 1 : myNullableColumn + 2, the second myNullableColumn expression is evaluated as non-nullable, but it is not rewritten to have IsNullable = false.
I'd prefer it if IsNullable were accurate here (across the board, regardless of COALESCE vs. ISNULL) - you never know when someone might look at it - but I guess it's not super important at this point.
Your suggestion goes in the direction of tracking nullability in the expression, which might indeed prove very useful, for example when lowering SqlServer boolean expressions/predicates 😫, but I think it is not specifically related to the COALESCE->ISNULL change in this PR.
```csharp
// stay on the safe side we only use ISNULL if:
// - all sub-expressions have the same type as the expression
// - all sub-expressions have the same type mapping as the expression
// - the expression is using the default type mapping (combined
```
This one also seems a bit strict - if all sub-expressions have the same store type, shouldn't everything be fine and we can assume the overall expression has the same type? What's the scenario that could be problematic given a non-default store type?
EFCore computes accurate store types only in some cases (column expressions, simple additions), but in many other cases it approximates them; for example, a literal string is modeled as varchar instead of varchar(length-of-the-literal).
The idea was that if all of the expressions are using the default type mapping, the C#-oriented type compatibility matches the SQL one as closely as possible... but now I wonder if some cases are actually unsafe even under this assumption 🤔.
I think an example that would break with only same type and same type-mapping is:

```
(col1VarChar10 != 'foo' ? col1VarChar10 : 'a long literal') ??
(col2VarChar10 != 'foo' ? col2VarChar10 : 'a very long literal')
```

IIRC EFCore computes the type mapping of the LHS (and of the whole expression) as the same type mapping as col1VarChar10.
I think an example that would break with the current set of constraints is:

```
(col1VarChar10 == 'foo' ? 'a long literal' : col1VarChar10) ??
(col2VarChar10 == 'foo' ? 'a very long literal' : col2VarChar10)
```
I will try to check this ASAP.
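In LINQ terms, the first example would correspond to something like the following (Widget, Col1, and Col2 are made-up names; both columns are assumed to be mapped as varchar(10)):

```csharp
// Hypothetical query shape: EF infers the conditional expressions' type
// mapping from the varchar(10) columns, even though the literal branches
// do not fit in varchar(10).
ctx.Set<Widget>().Select(w =>
    (w.Col1 != "foo" ? w.Col1 : "a long literal")
    ?? (w.Col2 != "foo" ? w.Col2 : "a very long literal"));
```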
Yeah, the general logic is that type mappings are configured on columns, and then inferred from them to other, non-column expression types (e.g. in b.MyCol == "some constant"), as well as bubbling up in the tree as necessary.
But I'm still a bit confused: if we know that all sub-expressions have the same store type (and therefore so should the SqlFunctionExpression for the COALESCE call on top), how could we possibly have an issue, regardless of whether that type mapping happens to be a default or not? Is this because you think that with a default type mapping there's less of a chance that EF got the type mapping wrong (or similar)? Would be good to see a concrete case for that.
Coalesce_Correct_TypeMapping_Double and Coalesce_Correct_TypeMapping_String are a couple of examples in which EFCore computes the wrong (/inaccurate) type mapping for some of the expressions.
Assuming that the implementation in this PR is wrong, I will add another test case that shows that the check is not strict enough.
Conversely, you are right that checking that the type is the default one is not required; its main purpose was to trust C# for the type computation, under the (presumably wrong) assumption that when the C#-computed type and the EFCore-computed type matched, that also meant a match for the SQL type.
This test passes with the current code but would fail if the check for the default type(mapping) was removed:

```csharp
[ConditionalTheory]
[MemberData(nameof(IsAsyncData))]
public virtual Task Coalesce_Correct_TypeMapping_String_Expression(bool async)
    => AssertQuery(
        async,
        ss => ss.Set<Customer>()
            .OrderBy(c => c.CustomerID)
            .Select(c => (c.Region == null ? null : "R" + c.Region) ?? "no region specified"),
        assertOrder: true);
```

I am now pretty confident I can make the current translation fail with a different test (still working on it, though). (also, the […])
EDIT: The […]
Yes, that's true... I don't think we need to take special account of this for this particular case. Thanks for doing the extra checks around the default type mapping...