Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix "Cleanup some string operating functions" PR #100451

Conversation

AaronRobinsonMSFT
Copy link
Member

@AaronRobinsonMSFT AaronRobinsonMSFT commented Mar 29, 2024

Unrevert #96099

The revert of #96099 was done in #97264. Commits after the first fix troublesome locations.

See #97264 for details on what the original PR impacted.

/cc @tommcdon @hoyosjs @huoyaoyuan

Copy link
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

Use correct error code mechanism.
src/coreclr/vm/methodtablebuilder.cpp Show resolved Hide resolved
src/coreclr/vm/methodtablebuilder.cpp Show resolved Hide resolved
src/coreclr/vm/classcompat.cpp Show resolved Hide resolved
if (SUCCEEDED(hr))
{
s.Resize(length, REPRESENTATION_UTF8);
COUNT_T length = WszWideCharToMultiByte(CP_UTF8, 0, GetRawUnicode(), GetRawCount()+1, NULL, 0, NULL, NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that this used FString::Unicode_Utf8_Length to avoid some perf problems in WszWideCharToMultiByte. Is switching to WszWideCharToMultiByte going to introduce performance regression? (I am particularly worried about Windows.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will check here.

Regardless, in principle I would prefer to defer to system APIs that we can/should assume are optimized. Having these narrowly defined "optimizations" is a non-trivial tax for an already covoluted space like SString.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should get rid of FString::*. We may want to replace it with a call to the minipal UTF8 methods for perf reasons like what the original change tried to do. I would expect that minipal UTF8 convertors are going to have significantly better perf than Windows OS WideCharToMultiByte.

system APIs that we can/should assume are optimized

Historically, Windows OS WideCharToMultiByte has been significantly slower for UTF8 than one would expect from a reasonably optimized implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would static linking simdutf to the VM for such conversions make sense here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to replace it with a call to the minipal UTF8 methods for perf reasons like what the original change tried to do.

So that I can get behind. There is a small performance regression here. Let me replace this with the minipal APIs. I chose this direction to match the other API. I'd prefer symmetrical API usage in this case as it avoids confusion.

Would static linking simdutf to the VM for such conversions make sense here?

That is something we can discuss, but it not for this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can push the hotpaths to managed

That seems unlikely in this case. The use of SString is litered throughout the system and there are many places where it is going to be unnatural and difficult to call into managed code. We can try jumping out in some cases, but that isn't something I am inclined to do in this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bar for these types of changes is to avoid perf regressions. (Of course, perf improvements are nice - but it is better to evaluate them separately if they come with tradeoffs.)

Neither the existing FString fast path code nor the WideCharToMultiByte implementation built into Windows use any vectorization tricks. It suggests that we should not need vectorization tricks to avoid the perf regression here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shape of the code in WideCharToMultiByte built into Windows is actually very similar to the corefx implementation and the minipal. It seems that the minipal implementation picked up some cruft alone the way that makes it quite a bit slower.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is interesting that we picked up minipal implementation for Windows + Unix in mono (and Unix in coreclr as before) in .NET 8 and haven't seen any report of regression. Maybe there is some low-hanging alignment issue which can be tweaked via cl switch?

Copy link
Member

@jkotas jkotas Apr 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

picked up minipal implementation for Windows + Unix in mono

Mono does not get as much perf scrutiny as CoreCLR.

low-hanging alignment issue

I do not think it is that.

From a cursory look, I see two problems:

  • For smaller lengths, there is a fixed overhead. Adding a simple FString-like loop that deals with small ASCII-only strings as the first thing in the minipal_* methods should fix that.
  • For longer lengths, the code of the core look does not look great (e.g. it uses more registers than necessary). For example, it may help to change
    *pTarget = (unsigned char)ch;
    *(pTarget + 1) = (unsigned char)(ch >> 16);
    pSrc += 4;
    *(pTarget + 2) = (unsigned char)chc;
    *(pTarget + 3) = (unsigned char)(chc >> 16);
    pTarget += 4;
    to
                    *pTarget = (unsigned char)ch;
                    *(pTarget + 2) = (unsigned char)(chc);
                    pSrc += 4;
                    *(pTarget + 1) = (unsigned char)(ch >> 16);
                    *(pTarget + 3) = (unsigned char)(chc >> 16);
                    pTarget += 4;

@AaronRobinsonMSFT AaronRobinsonMSFT marked this pull request as draft May 3, 2024 04:10
@am11 am11 mentioned this pull request May 21, 2024
@dotnet-policy-service dotnet-policy-service bot removed this from the 9.0.0 milestone Jun 2, 2024
Copy link
Contributor

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@github-actions github-actions bot locked and limited conversation to collaborators Jul 2, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants