-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add bool_and and bool_or aggregate functions #314
Conversation
cc @jvanstraten |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The null behavior of these functions is ridiculous, but apparently SQL defines them this way. LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agh, just when I click the approve button I see the file this is defined in. functions_boolean.yaml
exists. Why aren't they in there?
Ah... yes. Better to move there. |
Moved. cc @jvanstraten |
What do these functions return when all values in the input are null? |
@ianmcook good question. This is my understanding so far, as per these references 1 and 2, the latter says all the null value inputs are ignored. I belive then we have to define the default behavior to return Is there a better way to do this? |
I would have assumed true for an empty bool_and and false for an empty bool_or, but I suppose I shouldn't have been blindly applying regular boolean logic to query engines. Experimentally, because I stopped trusting documentation for null behavior a while ago: bool_and:
bool_or:
DuckDBVersion = 0.5.0 import duckdb
con = duckdb.connect(database=':memory:')
con.execute("create table tab_empty(x bool);")
con.execute("create table tab_null(x bool);")
con.execute("insert into tab_null values (null);")
con.execute("create table tab_true(x bool);")
con.execute("insert into tab_true values (true);")
con.execute("create table tab_false(x bool);")
con.execute("insert into tab_false values (false);")
con.execute("create table tab_null_true(x bool);")
con.execute("insert into tab_null_true values (null), (true);")
con.execute("create table tab_null_false(x bool);")
con.execute("insert into tab_null_false values (null), (false);")
con.execute("create table tab_true_false(x bool);")
con.execute("insert into tab_true_false values (true), (false);")
con.fetchall()
for table in ["tab_empty", "tab_null", "tab_true", "tab_false", "tab_null_true", "tab_null_false", "tab_true_false",]:
con.execute(f"select bool_and(x), bool_or(x) from {table}")
print(table, con.fetchall()) PostgreSQLVersion = 14 (using db-fiddle) create table tab_empty(x bool);
create table tab_null(x bool);
insert into tab_null values (null);
create table tab_true(x bool);
insert into tab_true values (true);
create table tab_false(x bool);
insert into tab_false values (false);
create table tab_null_true(x bool);
insert into tab_null_true values (null), (true);
create table tab_null_false(x bool);
insert into tab_null_false values (null), (false);
create table tab_true_false(x bool);
insert into tab_true_false values (true), (false); select bool_and(x), bool_or(x) from tab_empty etc. So they are, thankfully, consistent (when I first saw this I thought they wouldn't be), but nulls are indeed ignored (contrary to every other function that I've seen operate on bools), and an empty input (after ignoring nulls) is further special-cased to return null (which at least is consistent with other aggregates, just not consistent with how the rest of the world handles boolean algebra). |
extensions/functions_boolean.yaml
Outdated
- | ||
name: "bool_and" | ||
description: > | ||
Aggregate boolean values and returns a boolean value. | ||
If all of the values in the aggregation are true, function returns TRUE, | ||
else it returns FALSE if any of the returned values are false. NULL values | ||
are ignored by this function. | ||
impls: | ||
- args: | ||
- value: boolean | ||
name: x | ||
nullability: DECLARED_OUTPUT | ||
return: boolean | ||
- | ||
name: "bool_or" | ||
description: > | ||
Aggregate boolean values and returns a boolean value. | ||
If any value in the aggregation is true, function returns TRUE, | ||
else it returns FALSE if all of the returned values are false. NULL values | ||
are ignored by this function. | ||
impls: | ||
- args: | ||
- value: boolean | ||
name: x | ||
nullability: DECLARED_OUTPUT | ||
return: boolean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point I might as well just do it like this:
- | |
name: "bool_and" | |
description: > | |
Aggregate boolean values and returns a boolean value. | |
If all of the values in the aggregation are true, function returns TRUE, | |
else it returns FALSE if any of the returned values are false. NULL values | |
are ignored by this function. | |
impls: | |
- args: | |
- value: boolean | |
name: x | |
nullability: DECLARED_OUTPUT | |
return: boolean | |
- | |
name: "bool_or" | |
description: > | |
Aggregate boolean values and returns a boolean value. | |
If any value in the aggregation is true, function returns TRUE, | |
else it returns FALSE if all of the returned values are false. NULL values | |
are ignored by this function. | |
impls: | |
- args: | |
- value: boolean | |
name: x | |
nullability: DECLARED_OUTPUT | |
return: boolean | |
- | |
name: "bool_and" | |
description: > | |
If any value in the input is false, false is returned. If the input is | |
empty or only contains nulls, null is returned. Otherwise, true is | |
returned. | |
impls: | |
- args: | |
- value: boolean | |
name: x | |
nullability: DECLARED_OUTPUT | |
return: boolean? | |
- | |
name: "bool_or" | |
description: > | |
If any value in the input is true, true is returned. If the input is | |
empty or only contains nulls, null is returned. Otherwise, false is | |
returned. | |
impls: | |
- args: | |
- value: boolean | |
name: x | |
nullability: DECLARED_OUTPUT | |
return: boolean? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noticed that the functions ended up in the list of scalar functions instead of aggregate functions. Looks good otherwise...
@jvanstraten I added a |
Adding the
bool_and
andbool_or
aggregation functions.