Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat:implement sql style 'substr_index' string function #8272

Merged
merged 14 commits into from
Nov 26, 2023

Conversation

Syleechan
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_substring-index:~:text=SUBSTRING_INDEX(str%2Cdelim%2Ccount)
In calcite, the function expression name is substr_index

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt) labels Nov 20, 2023
----
NULL

query ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome, can we also have the same tests with empty strings as input and search token?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add empty string tests and 0 count tests

/// SUBSTRING_INDEX('www.apache.org', '.', -2) = apache.org
/// SUBSTRING_INDEX('www.apache.org', '.', -1) = org
pub fn substr_index<T: OffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> {
let string_array = as_generic_string_array::<T>(&args[0])?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to add a defense check args is exactly 3 elements

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I add the args len check

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Syleechan please take a look into minor comments

Copy link
Member

@Ted-Jiang Ted-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 , and tested below in mysql, maybe add them is better
SELECT SUBSTRING_INDEX('www.mysql.com', '.', 4); -> www.mysql.com SELECT SUBSTRING_INDEX('www.mysql.com', '.', 0); ->

@Syleechan Syleechan closed this Nov 21, 2023
@Syleechan Syleechan reopened this Nov 21, 2023
@Syleechan
Copy link
Contributor Author

LGTM 👍 , and tested below in mysql, maybe add them is better SELECT SUBSTRING_INDEX('www.mysql.com', '.', 4); -> www.mysql.com SELECT SUBSTRING_INDEX('www.mysql.com', '.', 0); ->

just the same as SELECT substr_index('www.apache.org', 'ac', 2)

/// SUBSTRING_INDEX('www.apache.org', '.', -1) = org
pub fn substr_index<T: OffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> {
if args.len() != 3 {
return Err(DataFusionError::Internal(format!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use internal_err! macros

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, change to internal_err.

@comphead
Copy link
Contributor

Thanks @Syleechan , we are very close, please address the minor to use error macros, it gives a possibility to backtrace the error

@Syleechan
Copy link
Contributor Author

@alamb please help to merge if there are no other concerns, thanks.

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks @Syleechan

@alamb
Copy link
Contributor

alamb commented Nov 26, 2023

Thanks @Syleechan as well as @comphead and @Ted-Jiang for the review

Since @Ted-Jiang is a committer as well, in the future please feel free to merge PRs once they have been reviewed and follow https://arrow.apache.org/datafusion/contributor-guide/index.html#pull-request-overview (as this one has)

@alamb alamb merged commit 234217e into apache:main Nov 26, 2023
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants