Fix "Cleanup some string operating functions" PR #100451
This reverts commit b8114f9.
Tagging subscribers to this area: @mangod9
Use correct error code mechanism.
if (SUCCEEDED(hr))
{
s.Resize(length, REPRESENTATION_UTF8);
COUNT_T length = WszWideCharToMultiByte(CP_UTF8, 0, GetRawUnicode(), GetRawCount()+1, NULL, 0, NULL, NULL);
I assume that this used FString::Unicode_Utf8_Length to avoid some perf problems in WszWideCharToMultiByte. Is switching to WszWideCharToMultiByte going to introduce a performance regression? (I am particularly worried about Windows.)
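For context, the line under review asks the OS converter for the required UTF-8 byte count by passing a NULL output buffer. A minimal scalar sketch of what such a length computation involves is shown below. The name `utf8_length_from_utf16` and its signature are hypothetical; this is neither the FString code nor the minipal API, and it assumes unpaired surrogates are counted as three bytes (the size of U+FFFD).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a runtime API: returns the number of UTF-8 bytes
   needed to encode `count` UTF-16 code units starting at `src`.
   Assumption: an unpaired surrogate is counted as 3 bytes (U+FFFD). */
size_t utf8_length_from_utf16(const uint16_t *src, size_t count)
{
    size_t bytes = 0;
    for (size_t i = 0; i < count; i++)
    {
        uint16_t ch = src[i];
        if (ch < 0x80)
        {
            bytes += 1; /* ASCII */
        }
        else if (ch < 0x800)
        {
            bytes += 2; /* two-byte sequence */
        }
        else if (ch >= 0xD800 && ch <= 0xDBFF && i + 1 < count
                 && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF)
        {
            bytes += 4; /* valid surrogate pair, one 4-byte sequence */
            i++;        /* consume the low surrogate as well */
        }
        else
        {
            bytes += 3; /* rest of the BMP, or an unpaired surrogate */
        }
    }
    return bytes;
}
```

For ASCII-only input the result equals the code unit count, which is why a dedicated ASCII check can skip the general path entirely.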
I will check here.
Regardless, in principle I would prefer to defer to system APIs that we can/should assume are optimized. Having these narrowly defined "optimizations" is a non-trivial tax for an already convoluted space like SString.
I agree that we should get rid of FString::*. We may want to replace it with a call to the minipal UTF8 methods for perf reasons like what the original change tried to do. I would expect that minipal UTF8 convertors are going to have significantly better perf than Windows OS WideCharToMultiByte.
system APIs that we can/should assume are optimized
Historically, Windows OS WideCharToMultiByte has been significantly slower for UTF8 than one would expect from a reasonably optimized implementation.
Would static linking simdutf to the VM for such conversions make sense here?
We may want to replace it with a call to the minipal UTF8 methods for perf reasons like what the original change tried to do.
So that I can get behind. There is a small performance regression here. Let me replace this with the minipal APIs. I chose this direction to match the other API. I'd prefer symmetrical API usage in this case as it avoids confusion.
Would static linking simdutf to the VM for such conversions make sense here?
That is something we can discuss, but it is not for this PR.
we can push the hotpaths to managed
That seems unlikely in this case. The use of SString is littered throughout the system and there are many places where it is going to be unnatural and difficult to call into managed code. We can try jumping out in some cases, but that isn't something I am inclined to do in this PR.
My bar for these types of changes is to avoid perf regressions. (Of course, perf improvements are nice - but it is better to evaluate them separately if they come with tradeoffs.)
Neither the existing FString fast path code nor the WideCharToMultiByte implementation built into Windows uses any vectorization tricks. It suggests that we should not need vectorization tricks to avoid the perf regression here.
The shape of the code in WideCharToMultiByte built into Windows is actually very similar to the corefx implementation and the minipal. It seems that the minipal implementation picked up some cruft along the way that makes it quite a bit slower.
It is interesting that we picked up minipal implementation for Windows + Unix in mono (and Unix in coreclr as before) in .NET 8 and haven't seen any report of regression. Maybe there is some low-hanging alignment issue which can be tweaked via a cl switch?
picked up minipal implementation for Windows + Unix in mono
Mono does not get as much perf scrutiny as CoreCLR.
low-hanging alignment issue
I do not think it is that.
From a cursory look, I see two problems:
- For smaller lengths, there is a fixed overhead. Adding a simple FString-like loop that deals with small ASCII-only strings as the first thing in the minipal_* methods should fix that.
- For longer lengths, the code of the core loop does not look great (e.g. it uses more registers than necessary). For example, it may help to change
runtime/src/native/minipal/utf8.c
Lines 1613 to 1618 in 7b18be5
*pTarget = (unsigned char)ch;
*(pTarget + 1) = (unsigned char)(ch >> 16);
pSrc += 4;
*(pTarget + 2) = (unsigned char)chc;
*(pTarget + 3) = (unsigned char)(chc >> 16);
pTarget += 4;
to
*pTarget = (unsigned char)ch;
*(pTarget + 2) = (unsigned char)(chc);
pSrc += 4;
*(pTarget + 1) = (unsigned char)(ch >> 16);
*(pTarget + 3) = (unsigned char)(chc >> 16);
pTarget += 4;
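The first suggestion above (a small loop that handles short ASCII-only input before the general converter runs) could look roughly like the following. This is a minimal sketch in C, not the minipal or FString code; the function name `try_convert_ascii_utf16_to_utf8`, its signature, and its return convention are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fast path, not the minipal code: narrows `count` UTF-16 code
   units into `dst` when every unit is ASCII. Returns 1 on success. Returns 0
   if a non-ASCII unit is found or `dst` is too small; the contents of `dst`
   are then unspecified and the caller should fall back to the general
   converter. */
int try_convert_ascii_utf16_to_utf8(const uint16_t *src, size_t count,
                                    unsigned char *dst, size_t dst_size)
{
    if (count > dst_size)
        return 0;

    for (size_t i = 0; i < count; i++)
    {
        uint16_t ch = src[i];
        if (ch >= 0x80)
            return 0; /* not ASCII, let the general path handle it */
        dst[i] = (unsigned char)ch;
    }
    return 1;
}
```

A caller would try this first for inputs below some small length threshold and invoke the full conversion routine only when it returns 0. The threshold itself would need to be chosen by measurement.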
Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.
Unrevert #96099
The revert of #96099 was done in #97264. Commits after the first fix troublesome locations.
See #97264 for details on what the original PR impacted.
/cc @tommcdon @hoyosjs @huoyaoyuan