
GetAssembly of CurrentDomain by Name (AMD Ryzen slowdown) #12621

Closed
strajk- opened this issue May 2, 2019 · 18 comments
Labels: area-System.Threading · question · tenet-performance

Comments

@strajk-

strajk- commented May 2, 2019

Hello everyone,
Hello @tannergooding (mentioning you as requested in a Reddit thread a year ago).

A year ago we upgraded the system of one of our devs and decided to go with a Ryzen 1700X.
What we realized is that one of our applications took awfully long to load at startup.

At startup, that application fetches all loaded assemblies and sorts them into lists depending on their type.
For some reason, only on that one Ryzen CPU, it hits a severe slowdown: the app takes over 2 minutes to load completely, even though it should take 5 to 10 seconds at most.
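For context, the startup scan is roughly of this shape (a minimal sketch; the actual code isn't in this issue, so the names below are illustrative):

using System;
using System.Collections.Generic;
using System.Reflection;

static class StartupScan
{
    // Walk every assembly loaded into the current AppDomain and bucket its
    // types by some category (here: the type's namespace, as a stand-in).
    public static Dictionary<string, List<Type>> GroupLoadedTypes()
    {
        var buckets = new Dictionary<string, List<Type>>();
        foreach (Assembly assembly in AppDomain.CurrentDomain.GetAssemblies())
        {
            Type[] types;
            try
            {
                types = assembly.GetTypes();
            }
            catch (ReflectionTypeLoadException)
            {
                continue; // skip assemblies whose types cannot be fully loaded
            }

            foreach (Type type in types)
            {
                string key = type.Namespace ?? "<global>";
                List<Type> list;
                if (!buckets.TryGetValue(key, out list))
                {
                    list = new List<Type>();
                    buckets[key] = list;
                }
                list.Add(type);
            }
        }
        return buckets;
    }
}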

Strangely enough, if I install VMware on that Ryzen system and run the same application in that virtualized environment, it's snappy and doesn't slow down at all.
Unfortunately we do not have any other system running on a Ryzen CPU to test it out further, so we could only try this out on Intel-based systems, where this issue does not occur.

I created a repo with a test program that simulates this issue. After compiling it, all you have to do is press the "Test" button; it will try out the procedure I described above 8 times. On our Ryzen-based system it always starts slowing down at the 3rd attempt, taking over 5 seconds or even up to a whole minute to finish its task once.
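The measurement loop in the test program is, as far as I can tell, just a timed repetition along these lines (an illustrative sketch, not the repo's exact code):

using System;
using System.Diagnostics;

static class TestRunner
{
    // Repeats the startup scan 8 times and prints how long each attempt
    // takes; roughly what pressing the "Test" button is described as doing.
    public static void Run()
    {
        for (int attempt = 1; attempt <= 8; attempt++)
        {
            Stopwatch stopwatch = Stopwatch.StartNew();
            var buckets = StartupScan.GroupLoadedTypes(); // scan sketched above
            stopwatch.Stop();
            Console.WriteLine("Attempt {0}: {1} ms ({2} groups)",
                attempt, stopwatch.ElapsedMilliseconds, buckets.Count);
        }
    }
}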

For comparison here is a Screenshot of how fast it finishes on our Intel Systems:
https://imgur.com/IK405TL
And now our Ryzen System that had 3 slowdowns:
https://imgur.com/cylL1qn

Could this issue be caching related?

@MeiChin-Tsai

@jeffschwMSFT can you triage this? thx.

@jeffschwMSFT
Member

@strajk- is this currently blocking you? We are in the process of shutting down our current release and are narrowing the types of fixes that we are considering. Could you send us a trace of this application running on the interesting hardware?

@billwert @brianrob do we have hardware similar to the Ryzen that we could use to help reproduce this?

cc @vitek-karas

@brianrob
Member

brianrob commented May 2, 2019

I don't believe that we do. @RussKeldorph do you have anything similar?

@tannergooding
Member

I have a Ryzen 1800X at home if we don't have anything else to repro on.

There are also the Lv2-Series VMs in Azure, which are powered by AMD's EPYC CPUs (which are also based on the Zen micro-architecture).

@strajk-
Author

strajk- commented May 2, 2019

@jeffschwMSFT unfortunately, due to this issue, last year we decided to turn that system into a VM host, so I don't have a dev environment set up on that machine to trace it.
I had a friend of mine try this out on his Ryzen 1600X, same result, big slowdown after X tries:
[screenshot: timings on the Ryzen 1600X]

If @tannergooding could try it out on his 1800X I would be very thankful.
I don't know if this will be reproducible on the VMs in Azure, since for some reason my VMs seem to bypass this behavior.

@tannergooding
Member

I can indeed reproduce on my 1800X using the repo as-is:
[screenshot: repro timings on the 1800X]

However, it is worth noting that the repo "as-is" is targeting .NET 4.0 Client Profile. I also attempted to run against .NET 4.7.2 and got the same behavior.

I then ran against .NET Core 3.0 and everything looks to work as expected:
[screenshot: .NET Core 3.0 timings]

@strajk-
Author

strajk- commented May 2, 2019

Thank you for trying it out; yeah, I should have mentioned that.
Our applications are currently on .NET 4.0. We did try 4.7.2 and had the same result, as you stated; very important info I didn't mention.

We have several clients that unfortunately do not accept updating, since the software is being used in production and requires on-premises validation, which is both time-consuming and expensive for them.
So without a good reason they're not willing to do so (a fix would be a good reason).

Pardon my ignorance, but isn't the .NET Core 3.0 codebase basically .NET 4.8?
I'm not very experienced with either GitHub or .NET releases in general.
If that's the case, I could compile it against .NET 4.8 and try again.

@tannergooding
Member

This looks to be a side effect of using the workstation GC on a machine with a high hardware thread count.

Changing the app.config to be:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <runtime>
    <gcServer enabled="true"></gcServer>
  </runtime>
  <startup>
    <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.0,Profile=Client"/>
  </startup>
</configuration>

Looks to resolve the issue:
[screenshot: timings with server GC enabled]

@Maoni0 is probably the best-suited person to give any other tips/tricks on how you might be able to tweak the GC settings accordingly -- also, to explicitly call it out for her, this is on the .NET 4.0 Client Profile running on an AMD Ryzen CPU (8 cores, 16 threads for the tested devices).
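If it helps while experimenting, you can confirm at runtime which GC flavor a given process actually picked up (note: GCSettings.IsServerGC requires .NET Framework 4.5+ or .NET Core, so this check isn't available on the 4.0 Client Profile itself):

using System;
using System.Runtime;

static class GcInfo
{
    // Reports whether the server GC setting took effect, plus the latency mode.
    public static void Print()
    {
        Console.WriteLine("Server GC:    {0}", GCSettings.IsServerGC);
        Console.WriteLine("Latency mode: {0}", GCSettings.LatencyMode);
    }
}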

@billwert
Member

billwert commented May 2, 2019

@Maoni0 if there's a fix in .NET Core that's making this work better than .NET Framework, I'd be curious to know what it is.

@tannergooding
Member

I did validate that both Workstation and Server GC work as expected for .NET Core 3. Given that this is a WPF app, I did not test on .NET Core 2.2 or prior.

@Maoni0
Member

Maoni0 commented May 2, 2019

Is the question "why does it work better on .NET Core 3 than .NET 4.8 for the scenario where you have a high thread count while using workstation GC"?

@billwert
Member

billwert commented May 2, 2019

Yes.

@strajk-
Author

strajk- commented May 3, 2019

Hello,

Setting server GC does seem to fix this issue on our Ryzen device.
Is it considered good practice to do this with a .config file, though? We generally do not release our .NET apps with one.
Our rule of thumb has always been that this type of configuration should be made in the project properties so it gets compiled in with no further files; I guess we were wrong with that approach, then.
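(For what it's worth, on .NET Framework the gcServer switch can, as far as I know, only be supplied via configuration such as app.config or machine.config; there is no project property that bakes it into the binary. On .NET Core / SDK-style projects it can instead be set in the project file, which flows into the app's runtimeconfig.json, e.g.:)

<!-- SDK-style project file (.NET Core); emits "System.GC.Server": true into runtimeconfig.json -->
<PropertyGroup>
  <ServerGarbageCollection>true</ServerGarbageCollection>
</PropertyGroup>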

I also always thought that the garbage collector would adjust by itself depending on the hardware it runs on, but I guess it makes sense that different applications require different modes depending on their use case.

Still open is the question of what .NET Core 3.0 does so differently that it doesn't repro the same issue as when compiled against 4.7.2 or 4.8.

@Maoni0
Member

Maoni0 commented May 3, 2019

@tannergooding collected some traces for us, and it doesn't appear to be a GC problem; @billwert will continue the investigation.

@RussKeldorph
Contributor

Trying the threading area label since this still seems likely to be somehow related to the high core count. Feel free to remove it if proven otherwise.

@tannergooding
Member

We've got an e-mail thread going about this offline.

I did some further investigation and was not able to repro the issue on .NET Core 2.x with slightly modified sources (an exact repro isn't possible there because the original scenario uses and relies on WPF assemblies). However, I was still able to repro the issue on the full framework.

I also noticed that the steady state (with TC off) looks to have regressed from 2.x to 3.0. Namely, on 2.x and in the first two iterations on 3.0, each iteration takes about 0.4 seconds (split evenly between Init1 and Init2). However, on 3.0, from the third iteration onwards, it regresses to 1.5 seconds (split almost evenly again).

The slightly modified sources also showed similar numbers on my Intel machine, whereas the original sources did not.
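(For anyone trying to reproduce those steady-state numbers: to my knowledge, tiered compilation on .NET Core 3.0 can be turned off either with the COMPlus_TieredCompilation=0 environment variable or via a project-file property, e.g.:)

<!-- Disables tiered compilation for the project (.NET Core 3.0) -->
<PropertyGroup>
  <TieredCompilation>false</TieredCompilation>
</PropertyGroup>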

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 26, 2020
@jeffschwMSFT jeffschwMSFT removed the untriaged New issue has not been triaged by the area owner label Apr 9, 2020
@dotnet-policy-service bot
Contributor

Due to lack of recent activity, this issue has been marked as a candidate for backlog cleanup. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will undo this process.

This process is part of our issue cleanup automation.

@dotnet-policy-service dotnet-policy-service bot added backlog-cleanup-candidate An inactive issue that has been marked for automated closure. no-recent-activity labels Oct 10, 2024
@dotnet-policy-service bot
Contributor

This issue will now be closed since it had been marked no-recent-activity but received no further activity in the past 14 days. It is still possible to reopen or comment on the issue, but please note that the issue will be locked if it remains inactive for another 30 days.

@dotnet-policy-service dotnet-policy-service bot removed this from the Future milestone Oct 24, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Nov 23, 2024
@dotnet-policy-service dotnet-policy-service bot removed no-recent-activity backlog-cleanup-candidate An inactive issue that has been marked for automated closure. labels Nov 23, 2024
Development

No branches or pull requests

10 participants