Optimize string single char contains calls. #100024
I have made a simple benchmark (with the automatic switch to the single-char string function temporarily disabled):

```cpp
#include <chrono>
#include <iostream>

String s = "Who is Frederic?Who is Frederic?Who is Frederic?";

auto t0 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 1000; i++) {
	String s1 = s.replace("e", "b");
}
auto t1 = std::chrono::high_resolution_clock::now();
std::cout << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count() << "us\n";

t0 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 1000; i++) {
	String s1 = s.replace('e', 'b');
}
t1 = std::chrono::high_resolution_clock::now();
std::cout << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count() << "us\n";

t0 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 1000; i++) {
	String s1 = s.replace("#", "b");
}
t1 = std::chrono::high_resolution_clock::now();
std::cout << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count() << "us\n";

t0 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 1000; i++) {
	String s1 = s.replace('#', 'b');
}
t1 = std::chrono::high_resolution_clock::now();
std::cout << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count() << "us\n";
```

This prints:

So, just very roughly, the single-char implementation can be:
contains

A similar test for `contains`:

```cpp
String s = "Who is Frederic?Who is Frederic?Who is Frederic?";

auto t0 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 100000; i++) {
	int x = s.contains("e");
}
auto t1 = std::chrono::high_resolution_clock::now();
std::cout << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count() << "us\n";

t0 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 100000; i++) {
	int x = s.contains('e');
}
t1 = std::chrono::high_resolution_clock::now();
std::cout << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count() << "us\n";
```

prints:

So, it also benefits from the single-char optimization, though not as much.
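The timing blocks in these benchmarks repeat the same clock boilerplate. A small helper could factor it out; `time_us` below is a hypothetical name, not part of Godot, and it swaps in `std::chrono::steady_clock` (which is guaranteed monotonic) instead of `high_resolution_clock`. This is a sketch under those assumptions, not the code used for the numbers above.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: runs `body` `iterations` times and returns the elapsed
// wall-clock time in microseconds, measured with a monotonic clock.
template <typename F>
std::int64_t time_us(int iterations, F &&body) {
	assert(iterations >= 0);
	const auto t0 = std::chrono::steady_clock::now();
	for (int i = 0; i < iterations; i++) {
		body();
	}
	const auto t1 = std::chrono::steady_clock::now();
	return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
}
```

For example, `time_us(100000, [&] { volatile bool x = s.contains('e'); (void)x; })`. Keeping the result observable (here through `volatile`) makes it less likely that an optimizing compiler drops the loop body entirely, which matters most for cheap calls like single-char `contains`.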
Looks good to me
I'd much rather see a `String::contains(const char_t)` or equivalent overload, so future developers can't make the mistake. It would also keep the change local to `ustring.cpp`/`ustring.h`. Or a `(const char[2])` overload, I guess. Perhaps both? :)
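To illustrate the overload idea, here is a minimal standalone sketch. `MiniString`, `last_path`, and the `const char (&)[2]` overload are hypothetical and are not Godot's `String` API; the point is only that ordinary overload resolution can route `'e'` and `"e"` to a single-char path, while longer literals fall through to the general substring search.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a string class; not Godot's String.
class MiniString {
public:
	explicit MiniString(std::string data) : data_(std::move(data)) {}

	// Records which overload handled the last call (illustration only).
	static inline std::string last_path;

	// General case: substring search.
	bool contains(const std::string &needle) const {
		last_path = "string";
		return data_.find(needle) != std::string::npos;
	}

	// Single character: a linear scan, no nested loop.
	bool contains(char c) const {
		last_path = "char";
		return data_.find(c) != std::string::npos;
	}

	// A one-character literal such as "e" has type const char[2]
	// ('e' plus the terminating '\0'), so it binds here without any
	// conversion and is preferred over the std::string overload.
	bool contains(const char (&literal)[2]) const {
		assert(literal[1] == '\0' && "expected a one-character string literal");
		const bool found = contains(literal[0]);
		last_path = "literal1";
		return found;
	}

private:
	std::string data_;
};
```

With this set, `m.contains('e')` and `m.contains("e")` both take the single-char route, and `m.contains("Fred")` (a `const char[5]`) still uses the general search, so callers don't need to remember a separate `_char` name.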
I had it as an
How would that be possible? Callers that currently call with
I experimented with this earlier, but this logic would fail for
The speed improvements of `contains_char` would have to be massive for it to matter in GDScript, but perhaps I'm being too pessimistic :P
If we do a `const char[2]` overload, we can just assume size=1, right?
Wouldn't that fail now as well? If there's no null terminator, it would just crash now, no? As in, that's already an error?
If we can't get it merged with an overload, let's just keep it like this. Maybe we can eventually do a pass over all of the `_char` functions and remove them in one fell swoop.
GDScript may not benefit, but C# and GDExtension also use the scripting interface (though I also don't know what that has to do with the C++ exposed name).
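This turns out to be surprisingly difficult to answer, lol. Here's a test I just ran to answer this question:

```cpp
#include <cstddef>
#include <cstring>
#include <iostream>

// Deliberately not null-terminated.
static const char s[] = { 'a', 'b' };

// Returns the array extent deduced from the argument's type.
template <size_t l>
size_t test(const char (&s)[l]) {
	return l;
}

// Scans for '\0'; undefined behavior for the array above, since it has none.
__attribute__((noinline)) size_t test2(const char *s) {
	return strlen(s);
}

int main()
{
	std::cout << strlen(s) << std::endl;
	std::cout << test(s) << std::endl;
	std::cout << test2(s) << std::endl;
}
```

It prints: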
That's right, In any case, calling
That's fair; for the record, I also think a
@Ivorforce surely calling
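The test above hinges on the difference between a reference-to-array parameter, whose extent is deduced from the declared array type at compile time, and `strlen`, which scans memory for a `'\0'` and has undefined behavior when there is none (as with `{ 'a', 'b' }`). A minimal sketch of that distinction, with hypothetical helper names `array_extent` and `bounded_length`:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: the extent N comes from the array's type, so it is
// known at compile time whether or not the array is null-terminated.
template <std::size_t N>
constexpr std::size_t array_extent(const char (&)[N]) {
	return N;
}

// Hypothetical helper: length of a char array that never reads past its end.
// Unlike strlen, it stops at N even when no '\0' is present.
template <std::size_t N>
constexpr std::size_t bounded_length(const char (&s)[N]) {
	std::size_t i = 0;
	while (i < N && s[i] != '\0') {
		++i;
	}
	return i;
}

// A one-character literal occupies two elements: the character and '\0'.
static_assert(array_extent("e") == 2, "one character plus terminator");
static_assert(bounded_length("e") == 1, "one visible character");
```

This is why a `const char (&)[2]` overload only needs to read element `[0]`: the size is part of the type, so no `strlen` call (and no reliance on a terminator) is involved.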
Straightforward one. Nice!
I'm not at all convinced that the `_char` suffix is actually necessary, but I'm not interested in bikeshedding this PR when there are already obvious performance gains. I'll just echo hpvb's sentiment about eventually doing a pass that removes them entirely, and call it a day.
Thanks!
```diff
@@ -97,7 +97,7 @@ FT_Error tvg_svg_in_ot_preset_slot(FT_GlyphSlot p_slot, FT_Bool p_cache, FT_Poin
 	if (parser->has_attribute("id")) {
 		const String &gl_name = parser->get_named_attribute_value("id");
 		if (gl_name.begins_with("glyph")) {
-			int dot_pos = gl_name.find(".");
+			int dot_pos = gl_name.find_char('.');
```
I missed that you made these changes; I'll make a PR fixing this. These are not valid because this is an unexposed method, see:
Oops, sorry and thanks!
I should have noticed, since I did my own sweep of these before, so it's on me!
Here's why it's important to optimize single-char calls: `contains` is easier to compute for single-char strings, because no nested loops are needed. It may even be SIMD vectorized with a good and / or compiler.

For reviewers

The PR is simpler than it seems.
Almost all changes are regex search-and-replace and should be correct, like:
- `contains\("(\\?.)"\)`, replaced with `contains_char('$1')`

The exception is `ustring.cpp` (and `ustring.h`), where the optimization is dispatched.

Old Version

Parts of this PR were duplicate / alternative solutions to #92475. I have since consolidated this one to only feature the optimized `contains` calls. Single-char `replace` calls will be discussed and finalized in #92475.

Old Description
Here's why it's important to optimize single-char calls: `replace` and `contains` are much easier to compute for single-char strings, because no nested loops are needed. It may even be SIMD vectorized with a good and / or compiler. `replace` with string arguments copies the data even if there are no other `CowData` co-owners. It also iterates the array twice. This can be exploited by `ustring.cpp` (although not from elsewhere, because `string.replace` makes a copy anyway).

This PR depends on (and includes) #100015.
I don't have a benchmark yet, but I believe the single-char replace and contains functions should be several times faster than the full string ones.
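To make the "no nested loops" argument concrete, here is a sketch contrasting a naive substring search with a single-character scan. Both are hypothetical free functions over `std::string_view`, not Godot's implementation (which may use a different search strategy), and the sketch says nothing about actual speedups; that still needs a benchmark like the ones earlier in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Naive substring search: for each start position, an inner loop compares
// the needle character by character.
bool contains_naive(std::string_view hay, std::string_view needle) {
	for (std::size_t i = 0; i + needle.size() <= hay.size(); ++i) {
		std::size_t j = 0;
		while (j < needle.size() && hay[i + j] == needle[j]) {
			++j;
		}
		if (j == needle.size()) {
			return true;
		}
	}
	return false;
}

// Single-character search: one loop, no inner comparison. A simple loop of
// this shape is the kind that memchr-style routines or an auto-vectorizing
// compiler can process several bytes at a time.
bool contains_single_char(std::string_view hay, char c) {
	for (char h : hay) {
		if (h == c) {
			return true;
		}
	}
	return false;
}
```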
For reviewers

The PR is simpler than it seems.
Almost all changes are regex search-and-replace and should be correct, like:
- `replace\("(\\?.)", "(\\?.)"\)`, replaced with `replace('$1', '$2')`
- `contains`, similarly for `find`

The exceptions are:
- `cowdata.h`, where `replace` is implemented.
- `ustring.cpp`, where some algorithms are optimized to make better use of the function.
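The regex rewrites described in the reviewer notes can be tried out with `std::regex_replace`. The function names are hypothetical, and the patterns assume a trailing `\)` so that the closing parenthesis of the call is consumed and the replacement does not leave a doubled `)`. `(\\?.)` captures one character, optionally preceded by a backslash, so escapes such as `"\n"` become `'\n'`.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Rewrites contains("x") to contains_char('x'). A custom raw-string
// delimiter (re) is needed because the pattern itself contains )".
std::string rewrite_contains(const std::string &line) {
	static const std::regex pattern(R"re(contains\("(\\?.)"\))re");
	return std::regex_replace(line, pattern, "contains_char('$1')");
}

// Rewrites replace("x", "y") to replace('x', 'y').
std::string rewrite_replace(const std::string &line) {
	static const std::regex pattern(R"re(replace\("(\\?.)", "(\\?.)"\))re");
	return std::regex_replace(line, pattern, "replace('$1', '$2')");
}
```

Multi-character arguments are left untouched because `(\\?.)` cannot match them. One case such a pattern would still get wrong is `contains("'")`, which would become the invalid `contains_char(''')`, so matches are worth a quick manual review.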