Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -529,26 +529,35 @@ private UTF8String copyUTF8String(int start, int end) {
return UTF8String.fromBytes(newBytes);
}

/**
* Trims space characters (ASCII 32) from both ends of this string.
*
* @return this string with no spaces at the start or end
*/
public UTF8String trim() {
int s = 0;
// skip all of the space (0x20) in the left side
while (s < this.numBytes && getByte(s) == 0x20) s++;
if (s == this.numBytes) {
// empty string
// Everything trimmed
return EMPTY_UTF8;
}
// skip all of the space (0x20) in the right side
int e = this.numBytes - 1;
while (e > s && getByte(e) == 0x20) e--;
if (s == 0 && e == numBytes - 1) {
// Nothing trimmed
return this;
}
return copyUTF8String(s, e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you avoid copying by changing offset if the string contains spaces at the left side, and changing numBytes for spaces at the right side?

Copy link
Member

@kiszk kiszk Jun 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks non-safe if the string is shared by others.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean creating new instance of UTF8String by passing new value of offset or numBytes, and the same reference to base. Does UTF8String modify underlying base object somewhere in place?

Copy link
Member

@kiszk kiszk Jun 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. My interpretation of avoid copying was to update fieild in this object. It functionally works well.

Such an optimization was done in String.substring in previous JDKs.
Pros: Avoid copying characters when trim()
Cons: Cannot free a UTF8String object object even if the UTF8String object, which was referenced by the trimed UTF8String, is dead under the case that trimed UTF8String is live.

I think that the current implementation is preferable since it does not increase # of live UTF8String objects.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you're describing is kind of how the JVM implements "compressed strings". Interesting notes here on further optimizations in Java 9: https://www.baeldung.com/java-9-compact-string
One day we might revisit whether UTF8String is improving over String!

}

/**
* Based on the given trim string, trim this string starting from both ends
* This method searches for each character in the source string, removes the character if it is
* found in the trim string, stops at the first not found. It calls the trimLeft first, then
* trimRight. It returns a new string in which both ends trim characters have been removed.
* Trims instances of the given trim string from both ends of this string.
*
* @param trimString the trim character string
* @return this string with no occurrences of the trim string at the start or end, or `null`
* if `trimString` is `null`
*/
public UTF8String trim(UTF8String trimString) {
if (trimString != null) {
Expand All @@ -558,24 +567,32 @@ public UTF8String trim(UTF8String trimString) {
}
}

/**
* Trims space characters (ASCII 32) from the start of this string.
*
* @return this string with no spaces at the start
*/
public UTF8String trimLeft() {
int s = 0;
// skip all of the space (0x20) in the left side
while (s < this.numBytes && getByte(s) == 0x20) s++;
if (s == 0) {
// Nothing trimmed
return this;
}
if (s == this.numBytes) {
// empty string
// Everything trimmed
return EMPTY_UTF8;
} else {
return copyUTF8String(s, this.numBytes - 1);
}
return copyUTF8String(s, this.numBytes - 1);
}

/**
* Based on the given trim string, trim this string starting from left end
* This method searches each character in the source string starting from the left end, removes
* the character if it is in the trim string, stops at the first character which is not in the
* trim string, returns the new string.
* Trims instances of the given trim string from the start of this string.
*
* @param trimString the trim character string
* @return this string with no occurrences of the trim string at the start, or `null`
* if `trimString` is `null`
*/
public UTF8String trimLeft(UTF8String trimString) {
if (trimString == null) return null;
Expand All @@ -597,34 +614,43 @@ public UTF8String trimLeft(UTF8String trimString) {
}
srchIdx += searchCharBytes;
}

if (srchIdx == 0) {
// Nothing trimmed
return this;
}
if (trimIdx >= numBytes) {
// empty string
// Everything trimmed
return EMPTY_UTF8;
} else {
return copyUTF8String(trimIdx, numBytes - 1);
}
return copyUTF8String(trimIdx, numBytes - 1);
}

/**
* Trims space characters (ASCII 32) from the end of this string.
*
* @return this string with no spaces at the end
*/
public UTF8String trimRight() {
int e = numBytes - 1;
// skip all of the space (0x20) in the right side
while (e >= 0 && getByte(e) == 0x20) e--;

if (e == numBytes - 1) {
// Nothing trimmed
return this;
}
if (e < 0) {
// empty string
// Everything trimmed
return EMPTY_UTF8;
} else {
return copyUTF8String(0, e);
}
return copyUTF8String(0, e);
}

/**
* Based on the given trim string, trim this string starting from right end
* This method searches each character in the source string starting from the right end,
* removes the character if it is in the trim string, stops at the first character which is not
* in the trim string, returns the new string.
* Trims instances of the given trim string from the end of this string.
*
* @param trimString the trim character string
* @return this string with no occurrences of the trim string at the end, or `null`
* if `trimString` is `null`
*/
public UTF8String trimRight(UTF8String trimString) {
if (trimString == null) return null;
Expand Down Expand Up @@ -658,12 +684,15 @@ public UTF8String trimRight(UTF8String trimString) {
numChars --;
}

if (trimEnd == numBytes - 1) {
// Nothing trimmed
return this;
}
if (trimEnd < 0) {
// empty string
// Everything trimmed
return EMPTY_UTF8;
} else {
return copyUTF8String(0, trimEnd);
}
return copyUTF8String(0, trimEnd);
}

public UTF8String reverse() {
Expand Down