-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-29986][SQL] casting string to date/timestamp/interval should trim all whitespaces #26626
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 4 commits
0bd8239
41feca0
72a2065
79abc93
773eb4b
9888e20
3cd5433
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -553,6 +553,34 @@ public UTF8String trim() { | |
| return copyUTF8String(s, e); | ||
| } | ||
|
|
||
| /** | ||
| * Trims whitespaces (<= ASCII 32) from both ends of this string. | ||
| * | ||
| * Note that, this method is the same as java's {@link String#trim}, and different from | ||
| * {@link UTF8String#trim()} which only remove only spaces(= ASCII 32) from both ends. | ||
|
||
| * | ||
| * @return A UTF8String whose value is this UTF8String, with any leading and trailing white | ||
| * space removed, or this UTF8String if it has no leading or trailing whitespace. | ||
| * | ||
| */ | ||
| public UTF8String trimAll() { | ||
| int s = 0; | ||
| // skip all of the whitespaces (<=0x20) in the left side | ||
| while (s < this.numBytes && getByte(s) <= 0x20) s++; | ||
|
||
| if (s == this.numBytes) { | ||
| // Everything trimmed | ||
| return EMPTY_UTF8; | ||
| } | ||
| // skip all of the whitespaces (<=0x20) in the right side | ||
| int e = this.numBytes - 1; | ||
| while (e > s && getByte(e) <= 0x20) e--; | ||
| if (s == 0 && e == numBytes - 1) { | ||
| // Nothing trimmed | ||
| return this; | ||
| } | ||
| return copyUTF8String(s, e); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. looking at the caller side, I think it's safe to not copy the data. We can add a caveat in the javadoc that: this method doesn't copy the data and the caller side should do copy themselves if they want to hold it for a while.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do we not copy when it is an
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ah that's a good point. OK let's leave it. |
||
| } | ||
|
|
||
| /** | ||
| * Trims instances of the given trim string from both ends of this string. | ||
| * | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, so SQL trim just removes space, and java.lang.String.trim() removes everything <= 32. Maybe we could refer to that in this doc, that this is the purpose of this additional trim method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, let's mention it's the same as
java.lang.String.trim