Skip to content

Conversation

@uros-db
Copy link
Contributor

@uros-db uros-db commented May 14, 2024

What changes were proposed in this pull request?

Currently, UTF8String.indexOf returns 0 when given an empty parameters string, and any integer start value. Examples:

"abc".indexOf("", 0);  // returns: 0
"abc".indexOf("", 2);  // returns: 0
"abc".indexOf("", 9);  // returns: 0
"abc".indexOf("", -3); // returns: 0

This is not correct, as "start" is not taken into consideration. Correct behaviour would be:

"abc".indexOf("", 0);  // returns: 0
"abc".indexOf("", 2);  // returns: 2
"abc".indexOf("", 9);  // returns: -1
"abc".indexOf("", -3); // returns: -1

Why are the changes needed?

Fix a bug in indexOf implementation.

Does this PR introduce any user-facing change?

Yes, indexOf behaves differently with empty patterns.

How was this patch tested?

New unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label May 14, 2024
@uros-db uros-db changed the title [WIP] Fix UTF8String indexOf behaviour for empty string search [SPARK-48284][SQL] Fix UTF8String indexOf behaviour for empty string search May 15, 2024
Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uros-db Is the changes still valid?

Copy link
Contributor Author

@uros-db uros-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes are valid, although some conflicts need to be resolved

@srielau @mkaravel do we want to revive this and proceed with the indexOf bug fix?

@MaxGekk
Copy link
Member

MaxGekk commented Oct 30, 2024

@uros-db Is the test failure related to your PR?

[error] Test org.apache.spark.unsafe.types.CollationSupportSuite.testStringLocate failed: org.opentest4j.AssertionFailedError: expected: <1> but was: <-1>, took 0.006s
[error]     at org.apache.spark.unsafe.types.CollationSupportSuite.assertStringLocate(CollationSupportSuite.java:2123)
[error]     at org.apache.spark.unsafe.types.CollationSupportSuite.testStringLocate(CollationSupportSuite.java:2129)
[error]     at java.lang.reflect.Method.invoke(Method.java:569)
[error]     at java.util.ArrayList.forEach(ArrayList.java:1511)
[error]     at java.util.ArrayList.forEach(ArrayList.java:1511)

@uros-db
Copy link
Contributor Author

uros-db commented Oct 30, 2024

yes, those tests likely need to be updated in order for this PR to pass

however, I'm not sure that we want to proceed with this at this time

@github-actions
Copy link

github-actions bot commented Feb 8, 2025

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Feb 8, 2025
@github-actions github-actions bot closed this Feb 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants