Implement nullability computation for `SqlFunctionExpression` #33814

ranma42 · 2024-05-25T20:11:28Z

The current implementation of SqlNullabilityProcessor.VisitSqlFunction() does not take into account the nullability of the function arguments (or the target instance) nor whether they propagate nulls to the function result.

This causes several additional null checks, which are not actually needed.

The computation is modeled after ProcessNullNotNull, specifically:

efcore/src/EFCore.Relational/Query/SqlNullabilityProcessor.cs

Lines 2280 to 2330 in 48eaf72

    
                           if (!sqlFunctionExpression.IsNullable) 
        
                           { 
        
                               // when we know that function can't be nullable: 
        
                               // non_nullable_function() is null-> false 
        
                               // non_nullable_function() is not null -> true 
        
                               return _sqlExpressionFactory.Constant( 
        
                                   sqlUnaryExpression.OperatorType == ExpressionType.NotEqual, 
        
                                   sqlUnaryExpression.TypeMapping); 
        
                           } 
        
                           // see if we can derive function nullability from it's instance and/or arguments 
        
                           // rather than evaluating nullability of the entire function 
        
                           var nullabilityPropagationElements = new List<SqlExpression>(); 
        
                           if (sqlFunctionExpression is { Instance: not null, InstancePropagatesNullability: true }) 
        
                           { 
        
                               nullabilityPropagationElements.Add(sqlFunctionExpression.Instance); 
        
                           } 
        
                           if (!sqlFunctionExpression.IsNiladic) 
        
                           { 
        
                               for (var i = 0; i < sqlFunctionExpression.Arguments.Count; i++) 
        
                               { 
        
                                   if (sqlFunctionExpression.ArgumentsPropagateNullability[i]) 
        
                                   { 
        
                                       nullabilityPropagationElements.Add(sqlFunctionExpression.Arguments[i]); 
        
                                   } 
        
                               } 
        
                           } 
        
                           // function(a, b) IS NULL -> a IS NULL || b IS NULL 
        
                           // function(a, b) IS NOT NULL -> a IS NOT NULL && b IS NOT NULL 
        
                           if (nullabilityPropagationElements.Count > 0) 
        
                           { 
        
                               var result = nullabilityPropagationElements 
        
                                   .Select( 
        
                                       e => ProcessNullNotNull( 
        
                                           _sqlExpressionFactory.MakeUnary( 
        
                                               sqlUnaryExpression.OperatorType, 
        
                                               e, 
        
                                               sqlUnaryExpression.Type, 
        
                                               sqlUnaryExpression.TypeMapping)!, 
        
                                           operandNullable)) 
        
                                   .Aggregate( 
        
                                       (r, e) => SimplifyLogicalSqlBinaryExpression( 
        
                                           sqlUnaryExpression.OperatorType == ExpressionType.Equal 
        
                                               ? _sqlExpressionFactory.OrElse(r, e) 
        
                                               : _sqlExpressionFactory.AndAlso(r, e))); 
        
                               return result; 
        
                           } 
        
                       }

maumar · 2024-05-28T23:31:39Z

@ranma42 I really like this improvement. However, there is a slight change in the semantics of nullability propagation. The original intent for this feature was to improve null check sql (IS NULL / IS NOT NULL), where we can. Here is the original issue: #18555 and pr: #19607

The idea is that argument propagates nullability should be set to true iff (if and only if) the function can result in null based on it's argument being null (which is vast majority of sql functions). However, there is a possibility that a function can be null even if all of the arguments are not null - a good example are some of the JSON functions like JSON_VALUE. Also, users can create their own custom functions that have this property. In those cases function propagates nullability should be set to false for all the arguments, which means that we can't take any shortcuts.
This was originally improvement for our own functions and only later extended to user created functions, but that idea wasn't well documented/communicated.

If I read the code correctly, the new implementation always takes the argument propagates nullability value into account, so a function which set them all to false will be marked as non-nullable, which is not always correct. Instead, we should check if at least one of the argument propagates nullability, or instance propagates nullability is set to true, and only if that's the case, apply the logic that was added. Otherwise we can't rely on the optimization and we should just use function.IsNullable. See how things are done in ProcessNullNotNull

In hindsight, what we probably should have done is to add a property like useNullPropagationInformation, which would allow us to document/communicate this properly to users, rather than rely on this obscure pattern. Alas, to late to change now.

ranma42 · 2024-05-29T04:33:42Z

@ranma42 I really like this improvement. However, there is a slight change in the semantics of nullability propagation. The original intent for this feature was to improve null check sql (IS NULL / IS NOT NULL), where we can. Here is the original issue: #18555 and pr: #19607

The idea is that argument propagates nullability should be set to true iff (if and only if) the function can result in null based on it's argument being null (which is vast majority of sql functions). However, there is a possibility that a function can be null even if all of the arguments are not null - a good example are some of the JSON functions like JSON_VALUE. Also, users can create their own custom functions that have this property. In those cases function propagates nullability should be set to false for all the arguments, which means that we can't take any shortcuts. This was originally improvement for our own functions and only later extended to user created functions, but that idea wasn't well documented/communicated.

Sorry, I was not aware of this and I ended up replicating only a part of the ProcessNullNotNull.
After learning about this, I managed to find the note here:

This optimization should only be used if the function can only return null when it's parameters are null.

I guess that the trick in this case is that Microsoft.EntityFrameworkCore.Metadata.Internal.DbFunctionParameter._propagatesNullability starts as false, so if the user never invokes PropagatesNullability() all of them end up being false, hence setting up the state you described.

Note that this behavior is not explained in the documentation of SqlFunctionExpression.
Would it make sense to open a (separate) PR to explain the behavior of the nullability propagation arguments?

If I read the code correctly, the new implementation always takes the argument propagates nullability value into account, so a function which set them all to false will be marked as non-nullable, which is not always correct. Instead, we should check if at least one of the argument propagates nullability, or instance propagates nullability is set to true, and only if that's the case, apply the logic that was added. Otherwise we can't rely on the optimization and we should just use function.IsNullable. See how things are done in ProcessNullNotNull

Ah, I see, that's the purpose of the if (nullabilityPropagationElements.Count > 0)

This means that the semantics changes radically. IIUC the semantics are:

if no argument (nor the instance) is set to propagate nullability, just rely on the IsNullable flag
otherwise, compute nullability mostly as I currently did

I will update the PR shortly to match this.

In hindsight, what we probably should have done is to add a property like useNullPropagationInformation, which would allow us to document/communicate this properly to users, rather than rely on this obscure pattern. Alas, to late to change now.

If this interface was to be redesigned, there are also other cases where it might be interesting to have a richer semantics (for example a UDF with COALESCE-like behavior).

AFAICT it might even be possible to extend this part of the SqlFunctionExpression in the future:

extending the public constructors should trivially be backward-compatible
making the semantics richer by adding more complex nullability information might require changes to providers/libraries that want to take advantage of that; as long as the members that are currently available are present, it should not break (and ideally not even regress) the consumers of the API

While this might be an interesting research direction (it would ideally remove the special cases for COALESCE etc... eventually it might even make it possible to unify the nullability computation for all operations, by simply interpreting them as "function" with appropriate arguments and nullability propagation settings), I believe that such an endeavor would require a thorough design.

maumar · 2024-05-29T06:29:00Z

Note that this behavior is not explained in the documentation of SqlFunctionExpression. Would it make sense to open a (separate) PR to explain the behavior of the nullability propagation arguments?

Good idea, filed #33833 to track this.

This means that the semantics changes radically. IIUC the semantics are:

if no argument (nor the instance) is set to propagate nullability, just rely on the IsNullable flag

otherwise, compute nullability mostly as I currently did

That's exactly the intended logic. Note also that it gives an uncertain user "a way out" - if all of them are set to false we just take the nullability of SqlFunction, we may produce less efficient sql but it will always be correct. Wheras in the previous design user had to correctly specify all the nullability propagation elements or risk data corruption in some cases.

If this interface was to be redesigned, there are also other cases where it might be interesting to have a richer semantics (for example a UDF with COALESCE-like behavior).

AFAICT it might even be possible to extend this part of the SqlFunctionExpression in the future:

extending the public constructors should trivially be backward-compatible

making the semantics richer by adding more complex nullability information might require changes to providers/libraries that want to take advantage of that; as long as the members that are currently available are present, it should not break (and ideally not even regress) the consumers of the API

While this might be an interesting research direction (it would ideally remove the special cases for COALESCE etc... eventually it might even make it possible to unify the nullability computation for all operations, by simply interpreting them as "function" with appropriate arguments and nullability propagation settings), I believe that such an endeavor would require a thorough design.

Yeah, I think it's currently not worth the effort needed to get this right. Current design should be ok, especially once we document it bit better

ranma42 · 2024-05-29T06:35:23Z

I pushed an updated branch and kept the original one for ease of comparison.
As shown in https://github.com/ranma42/efcore/compare/compute-func-nullability-bad..compute-func-nullability the main differences in tests are:

MAX is now correctly treated as nullable even when aggregating over non-nullable expressions 👍 (NULL happens when the collection is empty)
Compare_function_without_null_propagation_to_null now retains the original NULL check

I had to cherry-pick ranma42@1ff7042 from #33814 , otherwise the nullability computation would fail upon the out-of-range access.

ranma42 · 2024-05-29T18:05:06Z

rebased on top of the current main, to include the nullability arity fixes (which are required to avoid failing the evaluation of nullabilities, specifically for abs and rtrim)

maumar · 2024-05-29T19:58:43Z

another great contribution, thanks! @ranma42

ranma42 mentioned this pull request May 25, 2024

Fix comparison of nullable values #33757

Merged

ranma42 changed the title ~~Implement nullability computation for SqlFunction~~ Implement nullability computation for SqlFunctionExpression May 25, 2024

This was referenced May 26, 2024

Check arity consistency when constructing SqlFunctionExpressions #33815

Merged

Translation of impure functions can result in wrong queries #33791

Open

ajcvickers assigned maumar May 28, 2024

maumar mentioned this pull request May 29, 2024

Query: add better documentation explaining how to use instancePropagatesNullability and argumentsPropagateNullability on SqlFunctionExpression #33833

Open

ranma42 force-pushed the compute-func-nullability branch from 5474977 to 9e0f48e Compare May 29, 2024 06:31

ranma42 added 2 commits May 29, 2024 19:50

Compute nullability in VisitSqlFunction

0fb62aa

Update tests

f24ca35

ranma42 force-pushed the compute-func-nullability branch from 9e0f48e to f24ca35 Compare May 29, 2024 17:50

maumar approved these changes May 29, 2024

View reviewed changes

maumar merged commit a4dbfb5 into dotnet:main May 29, 2024
7 checks passed

ranma42 deleted the compute-func-nullability branch May 29, 2024 22:37

ajcvickers added the community-contribution label Jun 5, 2024

roji mentioned this pull request Jul 3, 2024

[9.0] New translations and SQL optimization/simplification #34151

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement nullability computation for `SqlFunctionExpression` #33814

Implement nullability computation for `SqlFunctionExpression` #33814

ranma42 commented May 25, 2024 •

edited

Loading

maumar commented May 28, 2024

ranma42 commented May 29, 2024

maumar commented May 29, 2024

ranma42 commented May 29, 2024 •

edited

Loading

ranma42 commented May 29, 2024

maumar commented May 29, 2024

	if (!sqlFunctionExpression.IsNullable)
	{
	// when we know that function can't be nullable:
	// non_nullable_function() is null-> false
	// non_nullable_function() is not null -> true
	return _sqlExpressionFactory.Constant(
	sqlUnaryExpression.OperatorType == ExpressionType.NotEqual,
	sqlUnaryExpression.TypeMapping);
	}

	// see if we can derive function nullability from it's instance and/or arguments
	// rather than evaluating nullability of the entire function
	var nullabilityPropagationElements = new List<SqlExpression>();
	if (sqlFunctionExpression is { Instance: not null, InstancePropagatesNullability: true })
	{
	nullabilityPropagationElements.Add(sqlFunctionExpression.Instance);
	}

	if (!sqlFunctionExpression.IsNiladic)
	{
	for (var i = 0; i < sqlFunctionExpression.Arguments.Count; i++)
	{
	if (sqlFunctionExpression.ArgumentsPropagateNullability[i])
	{
	nullabilityPropagationElements.Add(sqlFunctionExpression.Arguments[i]);
	}
	}
	}

	// function(a, b) IS NULL -> a IS NULL \|\| b IS NULL
	// function(a, b) IS NOT NULL -> a IS NOT NULL && b IS NOT NULL
	if (nullabilityPropagationElements.Count > 0)
	{
	var result = nullabilityPropagationElements
	.Select(
	e => ProcessNullNotNull(
	_sqlExpressionFactory.MakeUnary(
	sqlUnaryExpression.OperatorType,
	e,
	sqlUnaryExpression.Type,
	sqlUnaryExpression.TypeMapping)!,
	operandNullable))
	.Aggregate(
	(r, e) => SimplifyLogicalSqlBinaryExpression(
	sqlUnaryExpression.OperatorType == ExpressionType.Equal
	? _sqlExpressionFactory.OrElse(r, e)
	: _sqlExpressionFactory.AndAlso(r, e)));

	return result;
	}
	}

Implement nullability computation for SqlFunctionExpression #33814

Implement nullability computation for SqlFunctionExpression #33814

Conversation

ranma42 commented May 25, 2024 • edited Loading

maumar commented May 28, 2024

ranma42 commented May 29, 2024

maumar commented May 29, 2024

ranma42 commented May 29, 2024 • edited Loading

ranma42 commented May 29, 2024

maumar commented May 29, 2024

Implement nullability computation for `SqlFunctionExpression` #33814

Implement nullability computation for `SqlFunctionExpression` #33814

ranma42 commented May 25, 2024 •

edited

Loading

ranma42 commented May 29, 2024 •

edited

Loading