Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add String::replace_char(s) methods for performance and convenience #92475

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

AThousandShips
Copy link
Member

@AThousandShips AThousandShips commented May 28, 2024

Will try do some performance comparison soon, but these replace in-place essentially, avoiding any COW or other manipulation, and also avoids creating any temporaries etc.

The multi-character one is optional but added for completeness for a few cases where multiple cases are replaced at once

Can expose to scripting if desired, but generally using characters is a bit more complicated in scripting so haven't added yet

Now exposed, can drop if desired, but this should work well, the exposed methods aren't entirely useful in GDScript but are very useful in GDExtension I would say

Kept as separate commits for ease of editing until approval

See #92433 (comment)

A few more complex cases like String::validate_filename, OS::get_safe_dir_name, and get_csharp_project_name could be replaced using this, instead using a vector of characters, but for right now I keep this to the simple cases (mostly, see AnimationLibrary::validate_library_name for an exception)

See also:

@AThousandShips AThousandShips added this to the 4.x milestone May 28, 2024
@AThousandShips AThousandShips changed the title Add String::replace_char(s) methods for performance Add String::replace_char(s) methods for performance and convenience May 28, 2024
@AThousandShips AThousandShips force-pushed the string_replace_char branch 2 times, most recently from 989058a to d029832 Compare July 18, 2024 14:28
@AThousandShips AThousandShips force-pushed the string_replace_char branch 2 times, most recently from 611a214 to 6a49079 Compare August 19, 2024 13:09
@AThousandShips AThousandShips force-pushed the string_replace_char branch 2 times, most recently from 92cb3e7 to 4b78ad7 Compare August 28, 2024 12:45
@AThousandShips AThousandShips marked this pull request as ready for review August 28, 2024 12:55
@AThousandShips AThousandShips requested review from a team as code owners August 28, 2024 12:55
@AThousandShips
Copy link
Member Author

Uploaded an alternative approach based on the avove, any benchmarking would be appreciated!

Comment on lines +4342 to +4189
if (old_ptr[index] == p_key) {
new_ptr[index] = p_with;
} else {
new_ptr[index] = old_ptr[index];
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (old_ptr[index] == p_key) {
new_ptr[index] = p_with;
} else {
new_ptr[index] = old_ptr[index];
const char32_t old_char = old_ptr[index];
if (old_char == p_key) {
new_ptr[index] = p_with;
} else {
new_ptr[index] = old_char;

Possible improvement to avoid accessing twice, but should be optimized out likely

Copy link
Contributor

@Ivorforce Ivorforce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I have previously shown in #100024 that a single-character replace can be up to 3x faster than string-replace, so I think this is warranted.

I have benchmarked this implementation, and it's a bit faster than the one I had (likely due to migrating the if changed outside the loop, or because of the memcpy):

	String s = "Who is Frederic?Who is Frederic?Who is Frederic?";
	auto t0 = std::chrono::high_resolution_clock::now();
	for (int i = 0; i < 1000; i ++) {
		String s1 = s.replace("e", "b");
	}
	auto t1 = std::chrono::high_resolution_clock::now();
	std::cout << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count() << "us\n";
	t0 = std::chrono::high_resolution_clock::now();
	for (int i = 0; i < 1000; i ++) {
		String s1 = s.replace_char('e', 'b');
	}
	t1 = std::chrono::high_resolution_clock::now();
	std::cout << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count() << "us\n";
	t0 = std::chrono::high_resolution_clock::now();
	for (int i = 0; i < 1000; i ++) {
		String s1 = s.replace("#", "b");
	}
	t1 = std::chrono::high_resolution_clock::now();
	std::cout << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count() << "us\n";
	t0 = std::chrono::high_resolution_clock::now();
	for (int i = 0; i < 1000; i ++) {
		String s1 = s.replace_char('#', 'b');
	}
	t1 = std::chrono::high_resolution_clock::now();
	std::cout << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count() << "us\n";

Resulting in

249us
61us
24us
18us

I just have a few small comments, but otherwise I'm fully behind a merge!

core/string/ustring.cpp Outdated Show resolved Hide resolved
core/string/ustring.cpp Outdated Show resolved Hide resolved
core/string/ustring.cpp Outdated Show resolved Hide resolved
core/string/ustring.cpp Outdated Show resolved Hide resolved
core/string/ustring.cpp Show resolved Hide resolved
@Ivorforce
Copy link
Contributor

Ivorforce commented Dec 6, 2024

Ah, I just about forgot: I think it would be good to check the string length for 1 (and !is_case_insensitive) in _replace_common, and if true, switch to use the replace_char implementation. This should help GDScript also benefit from the single-char implementation, since it has no char data type and cannot otherwise call it.

core/string/ustring.h Outdated Show resolved Hide resolved
@AThousandShips
Copy link
Member Author

AThousandShips commented Dec 18, 2024

Missed the earlier review but went over that, will push some changes in a few hours

Using String as an input for the keys might actually be a good idea, will look into it as an improvement and binding the method as it'd be useful there

Will investigate using variations on String for the keys and add binds for these as optional additions, as I think these would be especially useful for extensions, but will keep them separate so they can be split off if desired

@AThousandShips AThousandShips force-pushed the string_replace_char branch 2 times, most recently from 6ad60f1 to 20b8fe5 Compare December 19, 2024 17:38
@AThousandShips
Copy link
Member Author

AThousandShips commented Dec 19, 2024

Covered the above suggestions, but leaving the change to replace_chars for the time being, it's a bit convoluted to add some of the details but should probably be worth it to do it, but keeping this from expanding too far into things

Improvements on that side can be part of a followup if desired, together with exposing this

Edit: worked out the String/char * versions so will clean this up and move to the alternative solution and push tonight or tomorrow, it works well and using that format is far more convenient in my testing

Copy link
Contributor

@Ivorforce Ivorforce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-checked after the changes, looks even better now! I had approved earlier, and I'm still approving :)

if (p_keys.is_empty() || len == 0) {
return *this;
}

Copy link
Contributor

@Ivorforce Ivorforce Dec 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about my earlier suggestion to add this here?

if (p_keys.size() == 1) {
    return replace_char(p_keys[0], p_with;
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point can test

core/string/ustring.cpp Outdated Show resolved Hide resolved
@AThousandShips
Copy link
Member Author

AThousandShips commented Dec 21, 2024

Been busy but will push a cleaned up and improved version in the next few days

Edit: Testing the new code and will push soon

@AThousandShips AThousandShips force-pushed the string_replace_char branch 2 times, most recently from 2c5cd08 to ae5bc50 Compare December 22, 2024 20:54
@AThousandShips AThousandShips requested a review from a team as a code owner December 22, 2024 20:54
@AThousandShips
Copy link
Member Author

Will push parallel changes for remove_char(s) as well

@@ -429,6 +429,9 @@ class String {
String replace_first(const char *p_key, const char *p_with) const;
String replace(const String &p_key, const String &p_with) const;
String replace(const char *p_key, const char *p_with) const;
String replace_char(char32_t p_key, char32_t p_with) const;
String replace_chars(const String &p_keys, char32_t p_with) const;
String replace_chars(const char *p_keys, char32_t p_with) const;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can easily make a replace_chars(const Vector<char32_t> &, char32_t) version as well, it can use the same method, but leaving with this for now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need i think, there's little chance somebody even owns vector<char32_t> instead of String.


template <class T>
static String _replace_chars_common(const String &p_this, const T *p_keys, int p_keys_len, char32_t p_with) {
ERR_FAIL_COND_V_MSG(p_with == 0, p_this, "`with` must not be the NUL character.");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will move this to the individual functions for easier use, but away from the computer right now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants