fix: nullability semantics of predicates in JoinRel and FilterRel #52
Merged
8 commits
5e6bc3c fix: summary of filter relations (nullability of expression) (mbrobbel)
8d7d0f8 chore: python formatting (mbrobbel)
467d8f9 fix: add literal join expression value check (mbrobbel)
664e062 fix: summary improvements for nullable join predicates (mbrobbel)
05ece35 fix: revert literal join expression check (mbrobbel)
c463c86 style: fix join type summary string style (mbrobbel)
46266f0 style: fix indentation of summary strings (mbrobbel)
f8122b3 fix: grammar in join summary (mbrobbel)
46 changes: 46 additions & 0 deletions
tests/tests/relations/filter/nullability-bool-expression.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,46 @@ | ||
| name: filter-nullability-bool-expression | ||
| plan: | ||
| __test: [level: i] | ||
| relations: | ||
| - rel: | ||
| filter: | ||
| input: | ||
| read: | ||
| baseSchema: | ||
| names: [a, b] | ||
| struct: | ||
| nullability: NULLABILITY_REQUIRED | ||
| types: | ||
| - string: { nullability: NULLABILITY_REQUIRED } | ||
| - bool: { nullability: NULLABILITY_NULLABLE } | ||
| namedTable: | ||
| names: | ||
| - test | ||
| condition: | ||
| literal: | ||
| nullable: true | ||
| boolean: false | ||
| __test: | ||
| [ | ||
| comment: "*false or null.", | ||
| type: "NSTRUCT<a: string, b: boolean?>", | ||
| ] | ||
| - rel: | ||
| filter: | ||
| input: | ||
| read: | ||
| baseSchema: | ||
| names: [a, b] | ||
| struct: | ||
| nullability: NULLABILITY_REQUIRED | ||
| types: | ||
| - string: { nullability: NULLABILITY_REQUIRED } | ||
| - bool: { nullability: NULLABILITY_REQUIRED } | ||
| namedTable: | ||
| names: | ||
| - test | ||
| condition: | ||
| literal: | ||
| nullable: false | ||
| boolean: false | ||
| __test: [comment: "*false.", type: "NSTRUCT<a: string, b: boolean>"] |
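
Both test relations above filter on a literal `false` condition, once as a nullable boolean and once as a non-nullable one. As a rough illustration of the SQL-style semantics this presumably exercises, the sketch below keeps a row only when the predicate evaluates to true, so both `false` and null drop the row. The helper `filter_rows` and the dictionary row format are hypothetical and are not part of the validator.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical row representation: column name -> value (None stands for null).
Row = Dict[str, object]


def filter_rows(rows: List[Row], condition: Callable[[Row], Optional[bool]]) -> List[Row]:
    """Keep only rows for which the predicate is exactly True.

    A predicate result of False or None (null) removes the row, which is
    the assumed behavior of a filter with a nullable boolean condition.
    """
    return [row for row in rows if condition(row) is True]


rows = [
    {"a": "x", "b": True},
    {"a": "y", "b": False},
    {"a": "z", "b": None},
]

kept_by_column = filter_rows(rows, lambda row: row["b"])
kept_by_false_literal = filter_rows(rows, lambda row: False)
kept_by_null_literal = filter_rows(rows, lambda row: None)
```

Under this assumption a nullable literal `false` and a non-nullable literal `false` filter out the same rows; only the type of the condition differs, which is what the validator summary has to account for.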
Thank you for the fix.
I'm confused by the "nullable" flag here. In SQL, all types are nullable, i.e. a value of any type can be null. I'm curious which systems are expected to specify nullable as false.
In Substrait, types can be non-nullable (e.g. booleans). A scalar function from an extension could return a non-nullable boolean. This change reflects that.
Substrait's type system is strictly typed and actually uses non-nullable by default; please review https://substrait.io/types/type_system/. As for why, that's a question more suited for slack or maybe the mailing list I suppose. It was built this way before @mbrobbel or I joined the project.
@mbrobbel @jvanstraten Thank you for explaining. This is confusing to me because most SQL functions have so-called default null behavior, e.g. null in any of the arguments automatically returns a null. Hence, most SQL functions return nullable types. Thus, it is strange to use non-nullable type by default. CC: @jacques-n
I assume I should open an issue in https://github.com/substrait-io/substrait/issues to get more context on this design decision.
The default behavior for Substrait functions is that the return type is nullable if and only if any of the arguments is nullable. This is covered by `MIRROR` nullability behavior. With the "by default" thing I just mean that if you write `i32` you're actually talking about a non-nullable 32-bit integer, whereas you need to write `i32?` to talk about a nullable 32-bit integer. We could have used `i32!` for non-nullable and `i32` for nullable, too (or just require either `!` or `?` if you want to consider nullability). It's just a notation convention thing.
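
To make the rule in the comment above concrete, here is a minimal sketch of the `?` notation and of `MIRROR`-style derivation as described there: the return type is nullable if and only if at least one argument is nullable. The names `SimpleType`, `parse_type`, and `mirror_return_type` are made up for illustration and are not taken from the Substrait or validator code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SimpleType:
    """Hypothetical stand-in for a simple Substrait type."""

    name: str
    nullable: bool = False  # non-nullable unless marked with "?"

    def __str__(self) -> str:
        return f"{self.name}?" if self.nullable else self.name


def parse_type(text: str) -> SimpleType:
    """Parse 'i32' as non-nullable and 'i32?' as nullable."""
    text = text.strip()
    if text.endswith("?"):
        return SimpleType(text[:-1], nullable=True)
    return SimpleType(text, nullable=False)


def mirror_return_type(return_name: str, args: Iterable[SimpleType]) -> SimpleType:
    """MIRROR as described above: nullable iff any argument is nullable."""
    return SimpleType(return_name, nullable=any(arg.nullable for arg in args))


all_required = mirror_return_type("boolean", [parse_type("i32"), parse_type("i32")])
one_nullable = mirror_return_type("boolean", [parse_type("i32"), parse_type("i32?")])
```

With these definitions, comparing two `i32` values yields `boolean`, while comparing an `i32` with an `i32?` yields `boolean?`.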
As for the "why" part by the way, I can't speak for the community because I don't know the reasons, but as a more back-end/hardware guy, I'd say that not just allowing everything to basically fail silently by default is a good thing. `null` is hardly treated in a consistent manner, so being able to avoid it or make assertions that an expression can never fail that way is also a good thing. It's certainly true though that for a practical plan using SQL-esque functions and relations, the vast majority of your types are going to be nullable.

I suppose part of it is also that we're representing a whole row/record as a single struct in some contexts. That struct is never nullable, even in SQL; there is no way to have a row that is "so null" that it doesn't even have any fields anymore. This generalization is, again, really nice when you're actually implementing these things and want to support nested types (which, AFAICT, SQL generally does not support, but Substrait does). It prevents you from having to implement column/field access and nested struct field access separately.
@jvanstraten This is very helpful clarification. Thanks.
@chaojun-zhang We need to make sure we use i32? and similar when defining custom function signatures in facebookincubator/velox#2496
I understand this point. In Velox, we also use RowVector to represent both a struct field and a top-level row, which like you pointed out cannot possibly be null.
Modern SQL engines (Spark, Presto, Trino, Velox at least) do support complex types and also support higher order functions / lambdas. For example, transform in Presto: https://prestodb.io/docs/current/functions/array.html#transform
I'm curious if lambda functions are supported in Substrait as well and, if so, what is the type of the "function" argument in these?
Be aware that it won't do anything unless you also specify `DISCRETE` nullability, because the nullability flags end up getting "overridden" by the semantics of `MIRROR` or (for arguments) `DECLARED_OUTPUT`. This and the other logic behind type derivations are, honestly, super weird, and have been causing me headaches for over half a year now to try to get them into the validator, so if something doesn't make sense to you there, it might well be because it just doesn't. I'm working on it.

As for lambda functions: they're not supported, but only because no one has bothered to define them yet. I thought about them for a bit in substrait-io/substrait#320 because some functions naturally take a comparator lambda for sort-like semantics, which currently can't be done. I figured that, since we don't really have a concept of statements, a lambda function would just be written like a normal expression, but with argument references at the leaves rather than (only) field references and literals. The types of those would just be whatever the function you're passing the lambda to as an argument defines them to be.

This is not as powerful as generalized lambdas that you can pass around as values, though. A more general solution, with that lambda data type, would probably look something like `lambda<struct<arg0, arg1, ...>, return>`, in which case the argument types would need to be defined along with the argument references in the expression tree (or, better yet, in the special expression type that constructs the lambda), and then matched against the function prototype that will be calling them (rather than derived by the prototype).
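
Purely as an illustration of that last idea, the sketch below models a `lambda<struct<arg0, arg1, ...>, return>` type and matches a lambda's declared types against what a calling function's prototype expects. Nothing here exists in Substrait: `LambdaType` and `matches_prototype` are hypothetical names, and types are compared as plain strings for simplicity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LambdaType:
    """Hypothetical lambda<struct<arg0, arg1, ...>, return> data type."""

    arg_types: Tuple[str, ...]
    return_type: str

    def __str__(self) -> str:
        args = ", ".join(self.arg_types)
        return f"lambda<struct<{args}>, {self.return_type}>"


def matches_prototype(actual: LambdaType, expected: LambdaType) -> List[str]:
    """Return a list of mismatches; an empty list means the lambda fits."""
    problems = []
    if len(actual.arg_types) != len(expected.arg_types):
        problems.append(
            f"expected {len(expected.arg_types)} arguments, got {len(actual.arg_types)}"
        )
    else:
        for index, (got, want) in enumerate(zip(actual.arg_types, expected.arg_types)):
            if got != want:
                problems.append(f"arg{index}: expected {want}, got {got}")
    if actual.return_type != expected.return_type:
        problems.append(f"return: expected {expected.return_type}, got {actual.return_type}")
    return problems


# A comparator for sort-like semantics over non-nullable 32-bit integers.
comparator = LambdaType(("i32", "i32"), "boolean")
# A prototype that instead expects a nullable second argument.
nullable_prototype = LambdaType(("i32", "i32?"), "boolean")

exact_match = matches_prototype(comparator, comparator)
mismatch = matches_prototype(comparator, nullable_prototype)
```

Here the lambda declares its own argument types and the caller only checks them, which is the "matched against the prototype rather than derived by it" variant; the simpler variant from the previous paragraph would skip the declaration and take the types from the prototype directly.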