Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster strpos() string function for ASCII-only case #12401

Merged
merged 5 commits into from
Sep 17, 2024

Conversation

goldmedal
Copy link
Contributor

@goldmedal goldmedal commented Sep 9, 2024

Which issue does this PR close?

Closes #12366.

Rationale for this change

For ASCII-only cases, I tried to use the slice window to find the position instead of the string pattern matching.
We can find that the ASCII-only StringViewArray case performs 50% faster than before. However, I'm not really sure why the normal string array can't be improved. 🤔

What changes are included in this PR?

If both arguments are ASCII-only, we can perform this faster method.

The benchmark result:

group                                        after                                   before
-----                                        -----                                   ------
strpos_StringArray_ascii_str_len_128         1.00  1585.2±49.78µs        ? ?/sec     1.74      2.8±0.07ms        ? ?/sec
strpos_StringArray_ascii_str_len_32          1.00   558.2±24.13µs        ? ?/sec     2.22  1240.1±51.67µs        ? ?/sec
strpos_StringArray_ascii_str_len_4096        1.00     47.3±2.79ms        ? ?/sec     1.88     89.0±1.68ms        ? ?/sec
strpos_StringArray_ascii_str_len_8           1.00    199.8±6.46µs        ? ?/sec     2.42   483.6±16.36µs        ? ?/sec
strpos_StringArray_utf8_str_len_128          1.00      4.9±0.14ms        ? ?/sec     1.00      4.9±0.12ms        ? ?/sec
strpos_StringArray_utf8_str_len_32           1.00  1852.6±47.05µs        ? ?/sec     1.01  1875.0±66.89µs        ? ?/sec
strpos_StringArray_utf8_str_len_4096         1.00    136.1±2.14ms        ? ?/sec     1.01    136.8±2.63ms        ? ?/sec
strpos_StringArray_utf8_str_len_8            1.00   831.5±43.31µs        ? ?/sec     1.00   830.9±35.23µs        ? ?/sec
strpos_StringViewArray_ascii_str_len_128     1.00  1700.8±63.55µs        ? ?/sec     1.66      2.8±0.12ms        ? ?/sec
strpos_StringViewArray_ascii_str_len_32      1.00   626.3±27.34µs        ? ?/sec     2.02  1264.6±88.59µs        ? ?/sec
strpos_StringViewArray_ascii_str_len_4096    1.00     47.4±1.11ms        ? ?/sec     1.88     88.9±1.96ms        ? ?/sec
strpos_StringViewArray_ascii_str_len_8       1.00   275.8±12.18µs        ? ?/sec     1.69   467.0±17.50µs        ? ?/sec
strpos_StringViewArray_utf8_str_len_128      1.00      4.9±0.14ms        ? ?/sec     1.00      4.9±0.21ms        ? ?/sec
strpos_StringViewArray_utf8_str_len_32       1.00  1898.4±121.22µs        ? ?/sec    1.00  1889.9±51.64µs        ? ?/sec
strpos_StringViewArray_utf8_str_len_4096     1.00    136.5±2.14ms        ? ?/sec     1.00    136.4±1.78ms        ? ?/sec
strpos_StringViewArray_utf8_str_len_8        1.00   848.3±35.50µs        ? ?/sec     1.02  861.6±172.97µs        ? ?/sec

Are these changes tested?

yes

Are there any user-facing changes?

@goldmedal goldmedal marked this pull request as ready for review September 10, 2024 01:52
Copy link
Contributor

@2010YOUY01 2010YOUY01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great 🚀 The implementation looks good to me

However, I'm not really sure why the normal string array can't be improved. 🤔

I think it's not executed and triggered some error, see benchmark StringArray cases only take several ns, and StringViewArray cases are at least several us
I guess it's because in micro bench, arguments are of different physical string type and break the function implementation (strpos(string_col, string_view_col))

With #12415 this kind of bug can be more throughly tested
Update: It's not a bug for SQL API, and this can only be triggered if we use raw expression API as in the benchmark

{
let string_iter = ArrayIter::new(string_array);
let substring_iter = ArrayIter::new(substring_array);
let ascii_only = string_array.is_ascii() && substring_array.is_ascii();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let ascii_only = string_array.is_ascii() && substring_array.is_ascii();
let ascii_only = substring_array.is_ascii() && string_array.is_ascii() ;

I think we can check substring first, since it's cheaper in common cases and we can then shortcircuit checking string_array in some case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice suggestion! I did the benchmark again. The performance is improved!

group                                        before                                  after
-----                                        -----                                  -------
strpos_StringArray_ascii_str_len_128         1.06    370.5±9.46ns        ? ?/sec    1.00   348.6±11.38ns        ? ?/sec
strpos_StringArray_ascii_str_len_32          1.07   372.5±10.23ns        ? ?/sec    1.00    346.9±9.45ns        ? ?/sec
strpos_StringArray_ascii_str_len_4096        1.08   378.4±12.05ns        ? ?/sec    1.00   349.4±16.83ns        ? ?/sec
strpos_StringArray_ascii_str_len_8           1.07   371.4±14.53ns        ? ?/sec    1.00   346.1±28.44ns        ? ?/sec
strpos_StringArray_utf8_str_len_128          1.06   377.2±18.04ns        ? ?/sec    1.00   356.4±21.92ns        ? ?/sec
strpos_StringArray_utf8_str_len_32           1.08   374.9±34.32ns        ? ?/sec    1.00   345.8±11.05ns        ? ?/sec
strpos_StringArray_utf8_str_len_4096         1.09   381.6±16.68ns        ? ?/sec    1.00   351.4±23.14ns        ? ?/sec
strpos_StringArray_utf8_str_len_8            1.09   372.9±20.83ns        ? ?/sec    1.00   343.3±11.78ns        ? ?/sec
strpos_StringViewArray_ascii_str_len_128     1.79      3.2±0.15ms        ? ?/sec    1.00  1763.8±44.16µs        ? ?/sec
strpos_StringViewArray_ascii_str_len_32      1.03   648.7±18.27µs        ? ?/sec    1.00   628.1±21.69µs        ? ?/sec
strpos_StringViewArray_ascii_str_len_4096    1.24     62.8±7.27ms        ? ?/sec    1.00     50.5±2.05ms        ? ?/sec
strpos_StringViewArray_ascii_str_len_8       1.00   280.2±10.44µs        ? ?/sec    1.05   294.8±42.47µs        ? ?/sec
strpos_StringViewArray_utf8_str_len_128      1.03      5.3±0.13ms        ? ?/sec    1.00      5.1±0.12ms        ? ?/sec
strpos_StringViewArray_utf8_str_len_32       1.01  1961.9±57.34µs        ? ?/sec    1.00  1944.1±73.19µs        ? ?/sec
strpos_StringViewArray_utf8_str_len_4096     1.03    147.4±9.24ms        ? ?/sec    1.00    142.5±3.61ms        ? ?/sec
strpos_StringViewArray_utf8_str_len_8        1.01   874.5±26.02µs        ? ?/sec    1.00   863.4±40.68µs        ? ?/sec

@goldmedal
Copy link
Contributor Author

I think it's not executed and triggered some error, see benchmark StringArray cases only take several ns, and StringViewArray cases are at least several us I guess it's because in micro bench, arguments are of different physical string type and break the function implementation (strpos(string_col, string_view_col))

You're right. There are some errors here. I ran the benchmark in the strpos test case and got the following error:

Error: Execution("Unsupported data type combination (Utf8, Utf8View) for function strpos")

It seems that strpos doesn't allow the second argument to be a StringView if the first one is a StringArray. I'm not sure if this case makes sense. Maybe we can file another issue for it if it's reasonable.

Then, I realized it was my mistake 😢. When I generated the substring for the StringArray, I mistakenly set its type to StringView. After fixing it, the benchmark for StringArray works fine.

]
} else {
let string_array: StringArray = output_string_vec.clone().into_iter().collect();
let sub_string_array: StringArray =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rebase to make all benchmark-related changes in the first commit. It was

let sub_string_array: StringViewArray = output_string_vec.clone().into_iter().collect();

After changing to StringArray, the benchmark works fine.

@goldmedal
Copy link
Contributor Author

goldmedal commented Sep 11, 2024

@2010YOUY01 Thank you for the review. I updated the benchmark result. After fixing the benchmark, we can find the StringArray case also be improved.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @goldmedal -- this PR looks really nice 🙏

so-beautiful

🚀

@alamb alamb merged commit 269a473 into apache:main Sep 17, 2024
26 checks passed
@goldmedal goldmedal deleted the feature/12366-strpos-ascii branch September 18, 2024 01:54
@goldmedal
Copy link
Contributor Author

Thanks @alamb @2010YOUY01

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize strpos() string function with ASCII fast path
3 participants