Cast Utf8 -> Utf8View (not the other way around) for binary operators #11881

Closed
Tracked by #11752
alamb opened this issue Aug 7, 2024 · 3 comments
Assignees
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented Aug 7, 2024

Is your feature request related to a problem or challenge?

In #11796 @dharanad added a rule for binary operators such that if Utf8View is on either side, we coerce to Utf8.

I think it would be better to coerce to Utf8View, as that coercion will often be faster (it is faster to cast Utf8 -> Utf8View than the other way around).
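
To illustrate why the direction of the cast can matter, here is a rough sketch under a deliberately simplified model. This is not arrow-rs's real layout or API: the `View` type and both helper functions are hypothetical, and the real Utf8View format additionally stores short strings (12 bytes or fewer) inline in the view. The point is only that going from offsets to views can reuse the existing data buffer, while going back requires copying every string's bytes into a new contiguous buffer.

```rust
// Simplified, hypothetical model; not arrow-rs's real types or API.
// A "view" here is just (offset, len) into a shared byte buffer.
type View = (usize, usize);

// Utf8-style offsets -> views: only the offsets are walked,
// the string bytes themselves are left untouched and can be shared.
fn offsets_to_views(offsets: &[usize]) -> Vec<View> {
    offsets.windows(2).map(|w| (w[0], w[1] - w[0])).collect()
}

// Views -> Utf8-style layout: each string's bytes are copied
// into a fresh contiguous buffer and new offsets are recorded.
fn views_to_offsets(data: &[u8], views: &[View]) -> (Vec<u8>, Vec<usize>) {
    let mut out = Vec::new();
    let mut offsets = vec![0];
    for &(start, len) in views {
        out.extend_from_slice(&data[start..start + len]);
        offsets.push(out.len());
    }
    (out, offsets)
}

fn main() {
    let data = b"foobarbaz".to_vec();
    let offsets = vec![0, 3, 6, 9];

    // Cheap direction: no string bytes are copied.
    let views = offsets_to_views(&offsets);

    // Expensive direction: all string bytes are copied.
    let (copied, rebuilt) = views_to_offsets(&data, &views);

    println!("views = {:?}", views);
    println!("copied {} bytes, offsets = {:?}", copied.len(), rebuilt);
}
```

In this model the first function does work proportional to the number of rows, while the second does work proportional to the total number of string bytes.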

@XiangpengHao notes: #11796 (comment)

Agree, similar to this policy:

fn string_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> {
    use arrow::datatypes::DataType::*;
    match (lhs_type, rhs_type) {
        // If Utf8View is in any side, we coerce to Utf8View.
        (Utf8View, Utf8View | Utf8 | LargeUtf8) | (Utf8 | LargeUtf8, Utf8View) => {
            Some(Utf8View)
        }
        // Then, if LargeUtf8 is in any side, we coerce to LargeUtf8.
        (LargeUtf8, Utf8 | LargeUtf8) | (Utf8, LargeUtf8) => Some(LargeUtf8),
        (Utf8, Utf8) => Some(Utf8),
        _ => None,
    }
}
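
For anyone who wants to try the precedence rule without pulling in the arrow crate, here is a minimal standalone sketch of the same policy. The `StrType` enum is a hypothetical stand-in for the three string variants of arrow's `DataType`; because it only has those three variants, the function returns a type directly instead of an `Option`.

```rust
// Hypothetical stand-in for the string variants of arrow's `DataType`,
// so the sketch compiles without any external crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StrType {
    Utf8,
    LargeUtf8,
    Utf8View,
}

// Same precedence as the quoted policy: Utf8View > LargeUtf8 > Utf8.
fn string_coercion(lhs: StrType, rhs: StrType) -> StrType {
    use StrType::*;
    match (lhs, rhs) {
        // If Utf8View is on either side, coerce to Utf8View.
        (Utf8View, _) | (_, Utf8View) => Utf8View,
        // Otherwise, if LargeUtf8 is on either side, coerce to LargeUtf8.
        (LargeUtf8, _) | (_, LargeUtf8) => LargeUtf8,
        // Only plain Utf8 on both sides stays Utf8.
        (Utf8, Utf8) => Utf8,
    }
}

fn main() {
    use StrType::*;
    for (lhs, rhs) in [(Utf8, Utf8View), (Utf8View, LargeUtf8), (Utf8, LargeUtf8), (Utf8, Utf8)] {
        println!("{:?} op {:?} -> {:?}", lhs, rhs, string_coercion(lhs, rhs));
    }
}
```

The rule is symmetric, so the result does not depend on which side of the operator the Utf8View column appears on.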

Describe the solution you'd like

Cast to Utf8View rather than Utf8 in the aforementioned code

Describe alternatives you've considered

No response

Additional context

No response

@dharanad
Contributor

dharanad commented Aug 8, 2024

take

@dharanad
Contributor

I think we can close this issue

@alamb
Contributor Author

alamb commented Aug 25, 2024

Thank you @dharanad - I just double checked and indeed this is fixed now:

> create table foo as values (arrow_cast('foo', 'Utf8View'));
0 row(s) fetched.
Elapsed 0.014 seconds.

> explain select column1 || 'bar' from foo;
+---------------+--------------------------------------------------------------------------+
| plan_type     | plan                                                                     |
+---------------+--------------------------------------------------------------------------+
| logical_plan  | Projection: foo.column1 || Utf8View("bar") AS foo.column1 || Utf8("bar") |
|               |   TableScan: foo projection=[column1]                                    |
| physical_plan | ProjectionExec: expr=[column1@0 || bar as foo.column1 || Utf8("bar")]    |
|               |   MemoryExec: partitions=1, partition_sizes=[1]                          |
|               |                                                                          |
+---------------+--------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.006 seconds.

> explain select 'bar' || column1 from foo;
+---------------+--------------------------------------------------------------------------+
| plan_type     | plan                                                                     |
+---------------+--------------------------------------------------------------------------+
| logical_plan  | Projection: Utf8View("bar") || foo.column1 AS Utf8("bar") || foo.column1 |
|               |   TableScan: foo projection=[column1]                                    |
| physical_plan | ProjectionExec: expr=[bar || column1@0 as Utf8("bar") || foo.column1]    |
|               |   MemoryExec: partitions=1, partition_sizes=[1]                          |
|               |                                                                          |
+---------------+--------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.001 seconds.

>

@alamb alamb closed this as completed Aug 25, 2024
2 participants