Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize string single char contains calls. #100024

Merged
merged 1 commit into from
Dec 9, 2024

Conversation

Ivorforce
Copy link
Contributor

@Ivorforce Ivorforce commented Dec 4, 2024

Here's why it's important to optimize single-char calls:

  • contains is easier to compute for single-char strings, because no nested loops are needed. It may even be SIMD vectorized with a good and / or compiler.
  • There are a lot of callers to these functions. Every one of these benefits a bit from this optimization, lubricating the gears.

For reviewers

The PR is more simple than it seems.
Almost all changes are search and replace with regex and should be correct, like:

  • find contains\("(\\?.)"", replace with contains_char('$1')

The exception is ustring.cpp (and ustring.h), where the optimization is being dispatched.

Old Version

Parts of this PR were duplicate / alternative solutions to #92475. I have since consolidated this one to only feature the optimized contains calls. single-char replace calls will be discussed and finalized in #92475.

Old Description

Here's why it's important to optimize single-char calls:

  • both replace and contains are much easier to compute for single-char strings, because no nested loops are needed. It may even be SIMD vectorized with a good and / or compiler.
  • There are a lot of callers to these functions. Every one of these benefits a bit from this optimization, lubricating the gears.
  • replace with string arguments copies the data even if there are no other CowData co-owners. It also iterates the array twice. This can be exploited by ustring.cpp (although not from elsewhere, because string.replace makes a copy anyway).

This PR depends on (and includes) #100015.

I don't have a benchmark yet, but I believe the single-char replace and contains functions should be several times faster than the full string ones.

For reviewers

The PR is more simple than it seems.
Almost all changes are search and replace with regex and should be correct, like:

  • find replace\("(\\?.)", "(\\?.)", replace with replace('$1', '$2')
  • same for contains, similar for find

The exceptions are:

  • cowdata.h, where replace is implemented.
  • ustring.cpp where some algorithms are optimized to make better use of the function.

@Ivorforce Ivorforce requested review from a team as code owners December 4, 2024 21:02
@Ivorforce Ivorforce force-pushed the optimize-string-single-char branch from 6a3f79f to 5f5c93c Compare December 4, 2024 21:06
@Ivorforce
Copy link
Contributor Author

Ivorforce commented Dec 4, 2024

I have made a simple benchmark (with auto lane-switching to the single-char string function disabled temporarily):

	String s = "Who is Frederic?Who is Frederic?Who is Frederic?";
	auto t0 = std::chrono::high_resolution_clock::now();
	for (int i = 0; i < 1000; i ++) {
		String s1 = s.replace("e", "b");
	}
	auto t1 = std::chrono::high_resolution_clock::now();
	std::cout << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count() << "us\n";
	t0 = std::chrono::high_resolution_clock::now();
	for (int i = 0; i < 1000; i ++) {
		String s1 = s.replace('e', 'b');
	}
	t1 = std::chrono::high_resolution_clock::now();
	std::cout << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count() << "us\n";
	t0 = std::chrono::high_resolution_clock::now();
	for (int i = 0; i < 1000; i ++) {
		String s1 = s.replace("#", "b");
	}
	t1 = std::chrono::high_resolution_clock::now();
	std::cout << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count() << "us\n";
	t0 = std::chrono::high_resolution_clock::now();
	for (int i = 0; i < 1000; i ++) {
		String s1 = s.replace('#', 'b');
	}
	t1 = std::chrono::high_resolution_clock::now();
	std::cout << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count() << "us\n";

This prints

253us
84us
24us
18us

So, just very roughly, the single-char implementation can be:

  • About 3x as fast when some characters are replaced (i.e. a copy is made).
  • About 1.3x as fast when no characters are replaced (i.e. no copy is made).

contains

A similar test for contains:

	String s = "Who is Frederic?Who is Frederic?Who is Frederic?";
	auto t0 = std::chrono::high_resolution_clock::now();
	for (int i = 0; i < 100000; i ++) {
		int x = s.contains("e");
	}
	auto t1 = std::chrono::high_resolution_clock::now();
	std::cout << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count() << "us\n";
	t0 = std::chrono::high_resolution_clock::now();
	for (int i = 0; i < 100000; i ++) {
		int x = s.contains('e');
	}
	t1 = std::chrono::high_resolution_clock::now();
	std::cout << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count() << "us\n";

prints

622us
501us

So, it also benefits from single-char optimization, though not as much (1.24x). But at the same time, its optimization was already implemented, so I'm only contributing by relaying a few calls to it instead of the char * based variant.

@Repiteo Repiteo requested review from a team and removed request for a team December 6, 2024 17:33
Copy link
Member

@AThousandShips AThousandShips left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

core/string/ustring.cpp Outdated Show resolved Hide resolved
core/string/ustring.cpp Outdated Show resolved Hide resolved
@Ivorforce Ivorforce force-pushed the optimize-string-single-char branch from c3ddfd4 to b5c31eb Compare December 6, 2024 19:23
@adamscott adamscott modified the milestones: 4.x, 4.4 Dec 6, 2024
Copy link
Member

@hpvb hpvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd much rather see a String::contains(const char_t), or equivalent overload. So future developers can't make the mistake. It also will make the change local to ustring.cpp/h, or a (const char[2]) overload, I guess. perhaps both? :)

@Ivorforce
Copy link
Contributor Author

Ivorforce commented Dec 7, 2024

I'd much rather see a String::contains(const char_t), or equivalent overload. So future developers can't make the mistake.

I had it as an String::contains overload first, but I was requested above to rename it to contains_char because similar single-char functions are appended with _char. I believe the main reason is that it is easier to bind to scripting languages this way?

It also will make the change local to ustring.cpp/h

How would that be possible? Callers that currently call with contains(char *) will have to be adjusted to use the single-char function with either the contains name and the contains_char name. The only way to make it consistently select the single-char implementation at compile time would be to expose the strlen(x) == 1 check in the header file, which I'm not sure if it is a good idea to do for every single function. Or idk, maybe it is - it would bloat the header file somewhat, requiring inlining by String callers, but it's a fairly simple check.

, or a (const char[2]) overload, I guess. perhaps both?

I experimented with this earlier, but this logic would fail for contains({ 'a', 'b' }) which is not appended with NULL. Yeah, it's a bit annoying.

@hpvb
Copy link
Member

hpvb commented Dec 7, 2024

I'd much rather see a String::contains(const char_t), or equivalent overload. So future developers can't make the mistake.

I had it as an String::contains overload first, but I was requested above to rename it to contains_char because similar single-char functions are appended with _char. I believe the main reason is that it is easier to bind to scripting languages this way?

The speed improvements of contains_char would have to be massive for it to matter in gdscript, but perhaps I'm being too pessimistic :P

It also will make the change local to ustring.cpp/h

How would that be possible? Callers that currently call with contains(char *) will have to be adjusted to use the single-char function with either the contains name and the contains_char name. The only way to make it consistently select the single-char implementation at compile time would be to expose the strlen(x) == 1 check in the header file, which I'm not sure if it is a good idea to do for every single function.

If we do a const char[2] overload, we can just assume size=1, right?

, or a (const char[2]) overload, I guess. perhaps both?

I experimented with this earlier, but this logic would fail for contains({ 'a', 'b' }) which is not appended with NULL. Yeah, it's a bit annoying.

Wouldn't that fail now also? If there's no null it'd just crash now, no? as in, that is already an error?

Copy link
Member

@hpvb hpvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can't get it merged with an overload lets just keep it like this. Maybe we can do a pass over all of the _char functions and just remove them in one fell swoop eventually.

@Ivorforce
Copy link
Contributor Author

Ivorforce commented Dec 7, 2024

The speed improvements of contains_char would have to be massive for it to matter in gdscript, but perhaps I'm being too pessimistic :P

GDScript may not benefit, but C# and GDExtension also use the scripting interface (though I also don't know what that has to do with the c++ exposed name).

If we do a const char[2] overload, we can just assume size=1, right?

, or a (const char[2]) overload, I guess. perhaps both?

I experimented with this earlier, but this logic would fail for contains({ 'a', 'b' }) which is not appended with NULL. Yeah, it's a bit annoying.

Wouldn't that fail now also? If there's no null it'd just crash now, no? as in, that is already an error?

This turns out to be surprisingly difficult to answer, lol.

Here's a test I just ran to answer this question:

static const char s[] = { 'a', 'b' };

template <size_t l>
size_t test(const char (&s)[l]) {
	return l;
}

__attribute__((noinline)) size_t test2(const char *s) {
	return strlen(s);
}

int main()
{
	std::cout << strlen(s) << std::endl;
	std::cout << test(s) << std::endl;
	std::cout << test2(s) << std::endl;
}

It prints

2
2
24

That's right, clang cheats with its strlen implementation, assuming a NULL terminator at compile time when it should probably complain instead.

In any case, calling String.contains({ 'a', 'b' }) would yield unexpected behavior. If evaluated at compile time, it would return String.contains(a). If evaluated at runtime, it might even SIGSEV. I guess it would be fair to document the behavior of this function. This is one of the reasons I used strlen instead of len in #99806 even for the arrays known at compile time - it failed unit tests otherwise.

@Ivorforce
Copy link
Contributor Author

If we can't get it merged with an overload lets just keep it like this. Maybe we can do a pass over all of the _char functions and just remove them in one fell swoop eventually.

That's fair; for the record, I also think a contains overload without a _char suffix is more appropriate. But it's something we could address later, it's not a dealbreaker.

@hpvb
Copy link
Member

hpvb commented Dec 7, 2024

@Ivorforce surely calling strlen() on a char array without a null terminator is already UB. I think you're just digging into how surprising the UB is. :P strlen({'a', 'b'}); might kill your cat and you don't get to complain about it.

Copy link
Contributor

@kiroxas kiroxas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Straightforward one. Nice !

Copy link
Contributor

@Repiteo Repiteo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not at all convinced that the _char append is actually necessary, but I'm not interested in bikeshedding this PR when there's already obvious performance gains. I'll just echo hpvb's sentiments on eventually doing a pass which removes them entirely & call it a day

@Repiteo Repiteo merged commit a607bca into godotengine:master Dec 9, 2024
20 checks passed
@Repiteo
Copy link
Contributor

Repiteo commented Dec 9, 2024

Thanks!

@Ivorforce Ivorforce deleted the optimize-string-single-char branch December 9, 2024 20:38
@@ -97,7 +97,7 @@ FT_Error tvg_svg_in_ot_preset_slot(FT_GlyphSlot p_slot, FT_Bool p_cache, FT_Poin
if (parser->has_attribute("id")) {
const String &gl_name = parser->get_named_attribute_value("id");
if (gl_name.begins_with("glyph")) {
int dot_pos = gl_name.find(".");
int dot_pos = gl_name.find_char('.');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed that you made these changes, will make a PR fixing this, these are not valid because this is an unexposed method, see:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, sorry and thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have noticed as I did my own sweep with these before so it's on me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants