-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add map_extract module and function #11969
feat: Add map_extract module and function #11969
Conversation
support related array tests after #11712 |
@@ -146,6 +146,8 @@ pub enum ArrayFunctionSignature { | |||
/// Specialized Signature for MapArray | |||
/// The function takes a single argument that must be a MapArray | |||
MapArray, | |||
/// Specialized Signature for MapExtract and similar functions | |||
MapArrayAndKey, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to use TypeSignature::UserDefined
to avoid enumeration bloat and make it more cohesive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only if there are more than 1 functions utilize the same signature, we could add it
I plan to review this tomorrow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -3700,6 +3701,36 @@ SELECT MAKE_MAP('POST', 41, 'HEAD', 33, 'PATCH', null); | |||
{POST: 41, HEAD: 33, PATCH: } | |||
``` | |||
|
|||
### `map_extract` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😍
The clickhouse provides a similar function |
e0bcabb
to
5e978b4
Compare
# map_extract with columns | ||
query ??? | ||
select map_extract(column1, 1), map_extract(column1, 4), map_extract(column1, 5) from map_array_table_1; | ||
---- | ||
[1, , 3] NULL NULL | ||
NULL [1, , 3] [4, , 6] | ||
NULL NULL NULL | ||
|
||
query ??? | ||
select map_extract(column1, column2), map_extract(column1, column3), map_extract(column1, column4) from map_array_table_1; | ||
---- | ||
[1, , 3] [1, , 3] [1, , 3] | ||
[4, , 6] [4, , 6] [4, , 6] | ||
NULL NULL NULL | ||
|
||
query ??? | ||
select map_extract(column1, column2), map_extract(column1, column3), map_extract(column1, column4) from map_array_table_2; | ||
---- | ||
[1, , 3] NULL [1, , 3] | ||
[4, , 6] NULL [4, , 6] | ||
NULL NULL NULL | ||
|
||
query ??? | ||
select map_extract(column1, 1), map_extract(column1, 4), map_extract(column1, 5) from map_array_table_2; | ||
---- | ||
[1, , 3] NULL NULL | ||
NULL [1, , 3] [4, , 6] | ||
NULL NULL NULL | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test map_extract
in a table
Thank you -- I found it now Description: Alias of Example map_extract(map([100, 5], [42, 43]), 100) Result: |
select map_extract(MAP {'a': 1, 'b': NULL, 'c': 3}, 'a'), map_extract(MAP {'a': 1, 'b': NULL, 'c': 3}, 'b'), | ||
map_extract(MAP {'a': 1, 'b': NULL, 'c': 3}, 'c'), map_extract(MAP {'a': 1, 'b': NULL, 'c': 3}, 'd'); | ||
---- | ||
1 NULL 3 NULL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By my understanding of duckdb
's doc the values should be a list not a scalar. In other words the result should be [1], [], [3], []
not 1, NULL, 3, NULL
:
D select map_extract(MAP {'a': 1, 'b': NULL, 'c': 3}, 'a');
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ map_extract(main.map(main.list_value('a', 'b', 'c'), main.list_value(1, NULL, 3)), 'a') │
│ int32[] │
├─────────────────────────────────────────────────────────────────────────────────────────┤
│ [1] │
└─────────────────────────────────────────────────────────────────────────────────────────┘
D
D select map_extract(MAP {'a': 1, 'b': NULL, 'c': 3}, 'd');
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ map_extract(main.map(main.list_value('a', 'b', 'c'), main.list_value(1, NULL, 3)), 'd') │
│ int32[] │
├─────────────────────────────────────────────────────────────────────────────────────────┤
│ [] │
└─────────────────────────────────────────────────────────────────────────────────────────┘
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WHile this choice is strange to me (a list that will always have 1 or 0 elements) I think it will make for an easy implementation -- you shouldn't have to make special implementations for each type of key -- instead you should just be able to calculate the offsets directly and make a list array that points to the keys array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Weijun-H Here is a like to old converstation about how DataFuion strives not to invent / determine semantics.
#11436 (comment)
I should have pointed it earlier, sorry about that. We should be consistent with postgres/duckdb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it unintuitive to return a list in this context because, in a Map, each key is unique by definition. Returning a list implies the possibility of multiple values for a single key, which contradicts the core concept of how a Map operates. The key uniqueness in a Map should naturally lead to a single value or NULL
, making a list redundant and potentially confusing. I refactored the code so it can avoid to make special implementations for each type of key now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I check the history of map_extract in duckdb and find that there is no discussion about it. It is created by one man in early day
https://github.com/duckdb/duckdb/blob/a196a022ffcf427aa21184d295bb70caa42e0655/src/function/scalar/nested/map_extract.cpp
Given the rationale is not clear, I don't have strong opinion to strictly follow what duckdb is in this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is created by one man
I think this describes quite a bit of duckdb 😆
594f463
to
c5b23ec
Compare
Changed to follow DuckDB now |
let query_key = query_keys_array.slice(row_index, 1); | ||
|
||
let value_index = (0..len) | ||
.find(|&i| keys.slice(start + i, 1) == Arc::<dyn Array>::clone(&query_key)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I dont think we need a clone for every iteration. How about creating a clone outside of this loop.
return exec_err!("map_extract expects two arguments"); | ||
} | ||
let map_type = &arg_types[0]; | ||
let map_fields = get_map_entry_field(map_type)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like it is better to return data type directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value type is not List
, so we cannot return it directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments left, but it is also good to go
Co-authored-by: Jay Zhan <[email protected]>
Thanks All |
Which issue does this PR close?
Closes #11935
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?