
Improve trim for string view #12395

Open
wants to merge 11 commits into
base: main

Rachelint
Contributor

@Rachelint Rachelint commented Sep 9, 2024

Which issue does this PR close?

Closes #12387

Rationale for this change

Similar to the string view version of substr, we can implement a string view version of trim to improve performance.

What changes are included in this PR?

  • Implement a string view version of trim that avoids copying the whole string when trimming long (> 12 bytes) strings.
  • Introduce basic unit tests for trim.
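
To illustrate why the 12-byte threshold matters, here is a minimal sketch of the 16-byte view layout as I understand it from the Arrow format: strings of at most 12 bytes are stored inline, while longer ones store a length, a 4-byte prefix, a buffer index, and an offset. The helpers `sketch_view`, `ltrim_view`, `view_len`, and `view_offset` are hypothetical names for illustration only; they are not the functions added in this PR.

```rust
/// Maximum byte length that fits inline in a 16-byte view.
const MAX_INLINE_LEN: usize = 12;

/// Pack a view for `data`, which lives at `offset` in buffer `buffer_index`.
/// Hypothetical helper for illustration; not the function from this PR.
fn sketch_view(data: &[u8], buffer_index: u32, offset: u32) -> u128 {
    let len = data.len() as u32;
    if data.len() <= MAX_INLINE_LEN {
        // Short string: length in the low 4 bytes, then the bytes themselves.
        let mut bytes = [0u8; 16];
        bytes[..4].copy_from_slice(&len.to_le_bytes());
        bytes[4..4 + data.len()].copy_from_slice(data);
        u128::from_le_bytes(bytes)
    } else {
        // Long string: length, 4-byte prefix, buffer index, offset into the buffer.
        let prefix = u32::from_le_bytes([data[0], data[1], data[2], data[3]]);
        (len as u128)
            | ((prefix as u128) << 32)
            | ((buffer_index as u128) << 64)
            | ((offset as u128) << 96)
    }
}

/// Left-trim ASCII spaces from the string stored at `offset..offset + len`
/// in `buffer` and return a new view. When the result is still longer than
/// 12 bytes, only the view changes; the string bytes stay where they are.
fn ltrim_view(buffer: &[u8], buffer_index: u32, offset: u32, len: u32) -> u128 {
    let s = &buffer[offset as usize..(offset + len) as usize];
    let skipped = s.iter().take_while(|&&b| b == b' ').count();
    sketch_view(&s[skipped..], buffer_index, offset + skipped as u32)
}

/// Length field (low 32 bits) of a view.
fn view_len(v: u128) -> u32 {
    v as u32
}

/// Offset field (high 32 bits) of a long-string view.
fn view_offset(v: u128) -> u32 {
    (v >> 96) as u32
}
```

In this sketch, trimming a long string whose result stays above 12 bytes only bumps the offset and shrinks the length, which is the copy the PR aims to avoid. When the trimmed result drops to 12 bytes or fewer, the bytes are written inline into the view instead.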

Are these changes tested?

Tested by new unit tests and the other existing tests.

Are there any user-facing changes?

No.

@Kev1n8
Contributor

Kev1n8 commented Sep 9, 2024

FYI @Rachelint: #12383 is modifying make_and_append_view. The original implementation is not correct, which is my fault.

@Rachelint
Contributor Author

Rachelint commented Sep 9, 2024

FYI @Rachelint: #12383 is modifying make_and_append_view. The original implementation is not correct, which is my fault.

Thanks! I will push this forward once #12383 is merged.

@Rachelint Rachelint marked this pull request as ready for review September 11, 2024 13:13
Contributor

@alamb alamb left a comment


Thank you @Rachelint -- this looks really nice and quite close 🙏

I left some comments, but I don't think they are required to merge this.

I do think we should have benchmark numbers showing this makes things faster in order to merge it. Could you please make a StringView based benchmark for trim, perhaps in the file whose embedded preview reads "// regarding copyright ownership. The ASF licenses this file"?

Then we can run that benchmark and show that this PR improves the performance.

Thanks again!

@@ -82,7 +82,11 @@ impl ScalarUDFImpl for BTrimFunc {
     }

     fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
-        utf8_to_str_type(&arg_types[0], "btrim")
+        if arg_types[0] == DataType::Utf8View {
+            Ok(DataType::Utf8View)
Contributor


👍

Eventually it would also be possible to return Utf8View when the input is Utf8, saving a copy there as well.
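
The return-type rule discussed here can be sketched roughly as follows. `StrType` is a stand-in enum introduced only for this illustration (the real code uses Arrow's `DataType`), and `trim_return_type` is a hypothetical name. The fallback branch assumes that non-view inputs keep their own string type, which is my reading of what `utf8_to_str_type` does for these two types.

```rust
/// Stand-in for the three string types relevant here. Assumption: the real
/// code uses arrow's `DataType`, which has many more variants.
#[derive(Debug, Clone, PartialEq)]
enum StrType {
    Utf8,
    LargeUtf8,
    Utf8View,
}

/// Hypothetical sketch of the rule in the diff: a view input yields a view
/// output; any other string input keeps its own type (assumed fallback).
fn trim_return_type(input: &StrType) -> StrType {
    if *input == StrType::Utf8View {
        StrType::Utf8View
    } else {
        input.clone()
    }
}
```

Returning a view for view input is what lets the kernel reuse the input's data buffers. The suggestion above (returning Utf8View for Utf8 input too) would change only the fallback branch.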

use datafusion_common::cast::{as_generic_string_array, as_string_view_array};
use datafusion_common::Result;
use datafusion_common::{exec_err, ScalarValue};
use datafusion_expr::ColumnarValue;

/// Make a `u128` based on the given substr, start(offset to view.offset), and
/// push into to the given buffers
pub(crate) fn make_and_append_view(
Contributor


🤔 I wonder if we should (as a follow on PR) propose adding this upstream to arrow-rs as it seems valuable for any trim related kernels on stringview

Contributor Author

@Rachelint Rachelint Sep 17, 2024


That sounds great! And #12383 (comment) could be solved if it were a function in arrow-rs.

@@ -81,7 +81,11 @@ impl ScalarUDFImpl for LtrimFunc {
     }

     fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
-        utf8_to_str_type(&arg_types[0], "ltrim")
+        if arg_types[0] == DataType::Utf8View {
+            Ok(DataType::Utf8View)
Contributor


Can we possibly add a .slt test to cover this (showing that the output type is now a view), plus some basic end-to-end tests if not already done?

Perhaps in https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/string_view.slt ?

Contributor Author

@Rachelint Rachelint Sep 17, 2024


Ok, I am fixing tests and benchmarks now.

Contributor Author

@Rachelint Rachelint Sep 17, 2024


Fixed the tests and introduced a benchmark in #12513.
#12395 (comment) shows the improvement numbers.

The benchmark PR still needs to be sorted out; I will do that later today.

@Rachelint
Contributor Author

Rachelint commented Sep 17, 2024

I think maybe we should place LTrim/RTrim/BTrim in the same place (like trim.rs)?

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Sep 17, 2024
@Kev1n8
Contributor

Kev1n8 commented Sep 17, 2024

For benchmarking, I would recommend PR #12111, for what it's worth.

@Rachelint
Contributor Author

For benchmarking, I would recommend PR #12111, for what it's worth.

Thanks, it is really helpful!

@Rachelint
Contributor Author

Rachelint commented Sep 17, 2024

I ran the benchmark introduced in #12513: about a 10-20% improvement for long strings (64 bytes).

Highlights: as expected, the string view trim mainly reduces copying when the trimmed result is longer than 12 bytes:

group                                                                                    main                                   string-view-trim
-----                                                                                    ----                                   ----------------
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=1024, len_before=64, len_after=60]     1.16     41.2±0.19µs        ? ?/sec    1.00     35.6±0.21µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=4096, len_before=64, len_after=60]     1.25    173.1±5.68µs        ? ?/sec    1.00    138.5±0.78µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=8192, len_before=64, len_after=60]     1.24    341.3±3.67µs        ? ?/sec    1.00    276.1±1.17µs        ? ?/sec
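
As a quick check of how the ratio column relates to the timings, the arithmetic can be sketched as below. The function names `time_reduction` and `slowdown_ratio` are made up for this illustration; the inputs are the mean times from the three rows above.

```rust
/// Fractional reduction in run time going from `before_us` to `after_us`.
fn time_reduction(before_us: f64, after_us: f64) -> f64 {
    (before_us - after_us) / before_us
}

/// Ratio of the slower time to the faster one, as shown in the first column
/// of each side of the table (1.00 marks the faster side).
fn slowdown_ratio(before_us: f64, after_us: f64) -> f64 {
    before_us / after_us
}
```

Plugging in the rows above: 41.2 µs to 35.6 µs is roughly a 13.6% reduction (ratio 1.16), 173.1 µs to 138.5 µs is roughly 20.0% (ratio 1.25), and 341.3 µs to 276.1 µs is roughly 19.1% (ratio 1.24).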

The detailed, sorted benchmark results:

group                                                                                    main                                   string-view-trim
-----                                                                                    ----                                   ----------------
INPUT LEN <= 12/large_string [size=1024, len_before=12, len_after=8]                     1.00     35.9±0.07µs        ? ?/sec    1.01     36.1±0.36µs        ? ?/sec
INPUT LEN <= 12/large_string [size=4096, len_before=12, len_after=8]                     1.00    139.6±0.51µs        ? ?/sec    1.00    139.1±0.49µs        ? ?/sec
INPUT LEN <= 12/large_string [size=8192, len_before=12, len_after=8]                     1.01    281.2±2.01µs        ? ?/sec    1.00    278.4±2.06µs        ? ?/sec
INPUT LEN <= 12/string [size=1024, len_before=12, len_after=8]                           1.00     35.9±0.31µs        ? ?/sec    1.00     35.9±0.14µs        ? ?/sec
INPUT LEN <= 12/string [size=4096, len_before=12, len_after=8]                           1.00    138.5±0.41µs        ? ?/sec    1.01    139.4±0.52µs        ? ?/sec
INPUT LEN <= 12/string [size=8192, len_before=12, len_after=8]                           1.00    279.1±3.72µs        ? ?/sec    1.00    278.6±1.07µs        ? ?/sec
INPUT LEN <= 12/string_view [size=1024, len_before=12, len_after=8]                      1.00     36.2±1.13µs        ? ?/sec    1.00     36.1±1.98µs        ? ?/sec
INPUT LEN <= 12/string_view [size=4096, len_before=12, len_after=8]                      1.00    139.7±1.54µs        ? ?/sec    1.00    139.0±2.41µs        ? ?/sec
INPUT LEN <= 12/string_view [size=8192, len_before=12, len_after=8]                      1.01    277.5±1.31µs        ? ?/sec    1.00    275.5±2.25µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/large_string [size=1024, len_before=64, len_after=4]    1.03    135.5±4.86µs        ? ?/sec    1.00    131.6±1.33µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/large_string [size=4096, len_before=64, len_after=4]    1.00    522.5±2.32µs        ? ?/sec    1.00    522.1±2.30µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/large_string [size=8192, len_before=64, len_after=4]    1.00   1039.3±3.48µs        ? ?/sec    1.00   1040.9±3.07µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string [size=1024, len_before=64, len_after=4]          1.01    132.5±1.17µs        ? ?/sec    1.00    131.3±0.92µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string [size=4096, len_before=64, len_after=4]          1.01    527.6±3.43µs        ? ?/sec    1.00    522.2±1.72µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string [size=8192, len_before=64, len_after=4]          1.00   1043.3±2.28µs        ? ?/sec    1.00   1040.7±3.50µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string_view [size=1024, len_before=64, len_after=4]     1.01    131.3±0.40µs        ? ?/sec    1.00    130.5±0.60µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string_view [size=4096, len_before=64, len_after=4]     1.01    524.0±2.79µs        ? ?/sec    1.00    519.3±2.52µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string_view [size=8192, len_before=64, len_after=4]     1.00   1041.1±3.21µs        ? ?/sec    1.00   1040.1±9.73µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/large_string [size=1024, len_before=64, len_after=60]    1.00     41.2±0.30µs        ? ?/sec    1.00     41.2±0.16µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/large_string [size=4096, len_before=64, len_after=60]    1.01    169.9±4.30µs        ? ?/sec    1.00    168.1±1.83µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/large_string [size=8192, len_before=64, len_after=60]    1.01   345.1±10.96µs        ? ?/sec    1.00    342.5±4.26µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string [size=1024, len_before=64, len_after=60]          1.02     41.8±0.62µs        ? ?/sec    1.00     41.0±0.12µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string [size=4096, len_before=64, len_after=60]          1.01    171.6±1.73µs        ? ?/sec    1.00    169.2±2.07µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string [size=8192, len_before=64, len_after=60]          1.00    343.0±6.30µs        ? ?/sec    1.00    341.8±6.00µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=1024, len_before=64, len_after=60]     1.16     41.2±0.19µs        ? ?/sec    1.00     35.6±0.21µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=4096, len_before=64, len_after=60]     1.25    173.1±5.68µs        ? ?/sec    1.00    138.5±0.78µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=8192, len_before=64, len_after=60]     1.24    341.3±3.67µs        ? ?/sec    1.00    276.1±1.17µs        ? ?/sec

Labels
functions sqllogictest SQL Logic Tests (.slt)
Development

Successfully merging this pull request may close these issues.

Improve performance of *TRIM functions for StringViewArray
3 participants