Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add map_extract module and function #11969

Merged
merged 27 commits into from
Aug 17, 2024

Conversation

Weijun-H
Copy link
Member

Which issue does this PR close?

Closes #11935

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) common Related to common crate labels Aug 13, 2024
@Weijun-H
Copy link
Member Author

support related array tests after #11712

@@ -146,6 +146,8 @@ pub enum ArrayFunctionSignature {
/// Specialized Signature for MapArray
/// The function takes a single argument that must be a MapArray
MapArray,
/// Specialized Signature for MapExtract and similar functions
MapArrayAndKey,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to use TypeSignature::UserDefined to avoid enumeration bloat and make it more cohesive?

Copy link
Contributor

@jayzhan211 jayzhan211 Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only if there are more than 1 functions utilize the same signature, we could add it

@dharanad
Copy link
Contributor

I plan to review this tomorrow

@github-actions github-actions bot added documentation Improvements or additions to documentation and removed logical-expr Logical plan and expressions labels Aug 14, 2024
@Weijun-H Weijun-H requested a review from jonahgao August 14, 2024 07:26
datafusion/functions-nested/src/map_extract.rs Outdated Show resolved Hide resolved
datafusion/common/src/utils/mod.rs Outdated Show resolved Hide resolved
datafusion/common/src/utils/mod.rs Outdated Show resolved Hide resolved
datafusion/functions-nested/src/map_extract.rs Outdated Show resolved Hide resolved
datafusion/functions-nested/src/map_extract.rs Outdated Show resolved Hide resolved
@Weijun-H Weijun-H requested a review from dharanad August 14, 2024 15:10
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Weijun-H

I noticed that #11935 talks about element_at but this PR proposes adding map_extract

Is map_extract modeled on another Database?

It seems like map_extact is meant to find the key with the corresponding value?

datafusion/sqllogictest/test_files/map.slt Show resolved Hide resolved
@@ -3700,6 +3701,36 @@ SELECT MAKE_MAP('POST', 41, 'HEAD', 33, 'PATCH', null);
{POST: 41, HEAD: 33, PATCH: }
```

### `map_extract`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😍

@Weijun-H
Copy link
Member Author

Is map_extract modeled on another Database?

It seems like map_extact is meant to find the key with the corresponding value?

The clickhouse provides a similar function mapExtractKeyLike , and DuckDB has the same function to get the value with the corresponding key. @alamb

@Weijun-H Weijun-H force-pushed the 11935-Support-element_at-for-Map-type branch from e0bcabb to 5e978b4 Compare August 15, 2024 09:46
Comment on lines 543 to 571
# map_extract with columns
query ???
select map_extract(column1, 1), map_extract(column1, 4), map_extract(column1, 5) from map_array_table_1;
----
[1, , 3] NULL NULL
NULL [1, , 3] [4, , 6]
NULL NULL NULL

query ???
select map_extract(column1, column2), map_extract(column1, column3), map_extract(column1, column4) from map_array_table_1;
----
[1, , 3] [1, , 3] [1, , 3]
[4, , 6] [4, , 6] [4, , 6]
NULL NULL NULL

query ???
select map_extract(column1, column2), map_extract(column1, column3), map_extract(column1, column4) from map_array_table_2;
----
[1, , 3] NULL [1, , 3]
[4, , 6] NULL [4, , 6]
NULL NULL NULL

query ???
select map_extract(column1, 1), map_extract(column1, 4), map_extract(column1, 5) from map_array_table_2;
----
[1, , 3] NULL NULL
NULL [1, , 3] [4, , 6]
NULL NULL NULL

Copy link
Member Author

@Weijun-H Weijun-H Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test map_extract in a table

@alamb
Copy link
Contributor

alamb commented Aug 15, 2024

The clickhouse provides a similar function mapExtractKeyLike , and DuckDB has the same function to get the value with the corresponding key. @alamb

Thank you -- I found it now

map_extract(map, key)

Description: Alias of element_at. Return a list containing the value for a given key or an empty list if the key is not contained in the map. The type of the key provided in the second parameter must match the type of the map's keys else an error is returned.

Example

map_extract(map([100, 5], [42, 43]), 100)

Result: [42]

select map_extract(MAP {'a': 1, 'b': NULL, 'c': 3}, 'a'), map_extract(MAP {'a': 1, 'b': NULL, 'c': 3}, 'b'),
map_extract(MAP {'a': 1, 'b': NULL, 'c': 3}, 'c'), map_extract(MAP {'a': 1, 'b': NULL, 'c': 3}, 'd');
----
1 NULL 3 NULL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By my understanding of duckdb's doc the values should be a list not a scalar. In other words the result should be [1], [], [3], [] not 1, NULL, 3, NULL:

D select map_extract(MAP {'a': 1, 'b': NULL, 'c': 3}, 'a');
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ map_extract(main.map(main.list_value('a', 'b', 'c'), main.list_value(1, NULL, 3)), 'a') │
│                                         int32[]                                         │
├─────────────────────────────────────────────────────────────────────────────────────────┤
│ [1]                                                                                     │
└─────────────────────────────────────────────────────────────────────────────────────────┘
D
D select map_extract(MAP {'a': 1, 'b': NULL, 'c': 3}, 'd');
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ map_extract(main.map(main.list_value('a', 'b', 'c'), main.list_value(1, NULL, 3)), 'd') │
│                                         int32[]                                         │
├─────────────────────────────────────────────────────────────────────────────────────────┤
│ []                                                                                      │
└─────────────────────────────────────────────────────────────────────────────────────────┘

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WHile this choice is strange to me (a list that will always have 1 or 0 elements) I think it will make for an easy implementation -- you shouldn't have to make special implementations for each type of key -- instead you should just be able to calculate the offsets directly and make a list array that points to the keys array

Copy link
Contributor

@dharanad dharanad Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Weijun-H Here is a like to old converstation about how DataFuion strives not to invent / determine semantics.
#11436 (comment)

I should have pointed it earlier, sorry about that. We should be consistent with postgres/duckdb

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it unintuitive to return a list in this context because, in a Map, each key is unique by definition. Returning a list implies the possibility of multiple values for a single key, which contradicts the core concept of how a Map operates. The key uniqueness in a Map should naturally lead to a single value or NULL, making a list redundant and potentially confusing. I refactored the code so it can avoid to make special implementations for each type of key now.

Copy link
Contributor

@jayzhan211 jayzhan211 Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I check the history of map_extract in duckdb and find that there is no discussion about it. It is created by one man in early day
https://github.com/duckdb/duckdb/blob/a196a022ffcf427aa21184d295bb70caa42e0655/src/function/scalar/nested/map_extract.cpp

Given the rationale is not clear, I don't have strong opinion to strictly follow what duckdb is in this case

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is created by one man

I think this describes quite a bit of duckdb 😆

@Weijun-H Weijun-H force-pushed the 11935-Support-element_at-for-Map-type branch from 594f463 to c5b23ec Compare August 16, 2024 02:51
@Weijun-H
Copy link
Member Author

Changed to follow DuckDB now

let query_key = query_keys_array.slice(row_index, 1);

let value_index = (0..len)
.find(|&i| keys.slice(start + i, 1) == Arc::<dyn Array>::clone(&query_key));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I dont think we need a clone for every iteration. How about creating a clone outside of this loop.

return exec_err!("map_extract expects two arguments");
}
let map_type = &arg_types[0];
let map_fields = get_map_entry_field(map_type)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like it is better to return data type directly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value type is not List, so we cannot return it directly.

datafusion/functions-nested/src/map_extract.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments left, but it is also good to go

@jayzhan211 jayzhan211 merged commit 72b6a49 into apache:main Aug 17, 2024
25 checks passed
@jayzhan211
Copy link
Contributor

Thanks All

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
common Related to common crate documentation Improvements or additions to documentation sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support element_at for Map type
5 participants