Investigate stability of PGO updates #52610

Open
AndyAyersMS opened this issue May 11, 2021 · 8 comments

@AndyAyersMS

AndyAyersMS commented May 11, 2021

We are seeing fairly frequent microbenchmark performance shifts with the managed PGO updates. We need to root cause this and figure out how to address them.

Possible causes:

  • non-deterministic training scenarios
  • non-deterministic collection behavior (even for deterministic scenarios)
  • issues in exfiltrating the profile data into the trace records
  • issues in converting the trace records to the .mibc format
  • issues merging multiple runs together into a consolidated .mibc file
  • issues incorporating the .mibc profile data into assemblies (see eg ...)
  • schema version skew leading to the jit dropping profile data
  • other issues causing the jit to drop profile data
  • jit optimizations overly sensitive to small changes in profile data

For example, the last few fluctuations here are correlated with PGO updates:
[Plot: newplot (22)]

See also

A first step is perhaps to create a .mibc comparison tool (or a mode in the pgo tool) to see how much variation we're seeing from one update to the next. Given two .mibc files, it would first match up methods and report how many methods have PGO data in one collection but not the other. For cases where both methods have PGO data, it could detect when the schemas have changed; where the schemas agree, it could run a similarity analysis on the count and class profile data to see whether what was measured has changed in any significant way.
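To make the comparison idea concrete, here is a minimal Python sketch of the three stages (match methods, check schemas, score similarity). Everything in it is an assumption for illustration: the `MethodProfile` shape, the `"BlockCount"` kind name, the 0.9 threshold, and the choice of histogram intersection as the similarity measure are all made up here, and it only looks at count data, not class profiles. It does not read real .mibc files and is not how the pgo tool works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodProfile:
    schema: tuple  # hypothetical: one (kind, il_offset) pair per counter
    counts: tuple  # one count per schema entry


def overlap(a, b):
    """Histogram intersection of two count vectors, each normalized to sum to 1."""
    total_a, total_b = sum(a), sum(b)
    if total_a == 0 or total_b == 0:
        return 1.0 if total_a == total_b else 0.0
    return sum(min(x / total_a, y / total_b) for x, y in zip(a, b))


def compare(old, new, threshold=0.9):
    """Bucket methods by how their profile data differs between two collections."""
    report = {
        "only_old": sorted(old.keys() - new.keys()),
        "only_new": sorted(new.keys() - old.keys()),
        "schema_changed": [],
        "drifted": [],
        "stable": [],
    }
    for name in sorted(old.keys() & new.keys()):
        p, q = old[name], new[name]
        if p.schema != q.schema:
            report["schema_changed"].append(name)
        elif overlap(p.counts, q.counts) < threshold:
            report["drifted"].append(name)
        else:
            report["stable"].append(name)
    return report


two_blocks = (("BlockCount", 0), ("BlockCount", 8))
old = {
    "A": MethodProfile(two_blocks, (100, 50)),
    "B": MethodProfile(two_blocks, (30, 10)),
    "C": MethodProfile(two_blocks, (5, 5)),
    "D": MethodProfile(two_blocks, (7, 1)),
}
new = {
    "A": MethodProfile(two_blocks, (10, 90)),        # same schema, different shape
    "B": MethodProfile(two_blocks, (300, 100)),      # same shape, 10x the volume
    "C": MethodProfile((("BlockCount", 0),), (9,)),  # schema changed
    "E": MethodProfile(two_blocks, (1, 1)),          # only in the new collection
}
report = compare(old, new)
print(report)
```

Normalizing before comparing means a run that simply executed the scenario ten times longer (method `B` above) is not flagged; only a change in the relative shape of the counts is.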

category:cq
theme:profile-feedback

@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

dotnet-issue-labeler (bot) added the untriaged label (New issue has not been triaged by the area owner) on May 11, 2021
AndyAyersMS self-assigned this on May 11, 2021
AndyAyersMS added this to the 6.0.0 milestone on May 11, 2021
AndyAyersMS added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) and removed the untriaged label on May 11, 2021
@AndyAyersMS

Marking as codegen for now, though the issues may lie elsewhere.

@AndyAyersMS

For System.Collections.ContainsFalse<String>.List, I get these results running locally, where the commits fall in the three ranges shown in the lab results below:

BenchmarkDotNet=v0.12.1.1528-nightly, OS=Windows 10.0.19043
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.100-preview.5.21227.8
  [Host]     : .NET 6.0.0 (6.0.21.22701), X64 RyuJIT
  Job-RIQHCA : .NET 6.0.0 (42.42.42.42424), X64 RyuJIT
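| Commit  | Method | Size | Mean     | Error    | StdDev  | Median   | Min      | Max      |
|---------|--------|------|----------|----------|---------|----------|----------|----------|
| 6aca7af | List   | 512  | 963.4 us | 10.30 us | 9.64 us | 965.6 us | 944.4 us | 975.2 us |
| e3db83f | List   | 512  | 970.1 us | 9.75 us  | 8.64 us | 968.7 us | 958.7 us | 986.9 us |
| 60e6244 | List   | 512  | 966.7 us | 8.50 us  | 7.53 us | 965.8 us | 954.3 us | 980.9 us |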

[Plot: newplot (23)]

@AndyAyersMS

For Span.IndexerBench.KnownSizeArray, no local repro either.

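| Commit  | Method         | length | Mean     | Error   | StdDev  | Median   | Min      | Max      |
|---------|----------------|--------|----------|---------|---------|----------|----------|----------|
| 6aca7af | KnownSizeArray | 1024   | 478.5 ns | 4.95 ns | 4.39 ns | 478.9 ns | 467.4 ns | 485.6 ns |
| e3db83f | KnownSizeArray | 1024   | 478.9 ns | 5.07 ns | 4.75 ns | 479.8 ns | 470.9 ns | 485.4 ns |
| 60e6244 | KnownSizeArray | 1024   | 483.6 ns | 4.12 ns | 3.85 ns | 484.2 ns | 476.8 ns | 490.2 ns |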
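
As a rough way to read the two local tables in this thread, the following Python sketch computes the gap between the fastest and slowest mean for each benchmark and puts it next to the largest reported StdDev. The numbers are copied from the tables; the `RESULTS` layout and the `mean_spread` helper are hypothetical names introduced here, and comparing a gap against StdDev is only an informal sanity check, not a statistical test.

```python
# (commit, mean, stddev) copied from the two local result tables in this thread.
RESULTS = {
    "List (us)": [
        ("6aca7af", 963.4, 9.64),
        ("e3db83f", 970.1, 8.64),
        ("60e6244", 966.7, 7.53),
    ],
    "KnownSizeArray (ns)": [
        ("6aca7af", 478.5, 4.39),
        ("e3db83f", 478.9, 4.75),
        ("60e6244", 483.6, 3.85),
    ],
}


def mean_spread(rows):
    """Return (absolute gap, gap relative to the fastest mean) across commits."""
    means = [mean for _, mean, _ in rows]
    gap = max(means) - min(means)
    return gap, gap / min(means)


for name, rows in RESULTS.items():
    gap, relative = mean_spread(rows)
    largest_stddev = max(stddev for _, _, stddev in rows)
    print(f"{name}: gap {gap:.1f} ({relative:.2%} of the fastest mean), "
          f"largest StdDev {largest_stddev}")
```

In both cases the means differ by roughly 1% or less across the three commits, a gap comparable in size to the per-run StdDev.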

[Plot: newplot (24)]

@AndyAyersMS

@jakobbotsch has developed some .mibc comparison tech, and overall the managed profile data produced by the optimization repo looks fairly stable from day to day.

@AndyAyersMS

Given the relative stability of the PGO collection I'm going to close this for now.

The perf fluctuations coming from PGO updates are proving tough to pin down, but the root cause doesn't appear to be managed PGO instability.

ghost locked as resolved and limited conversation to collaborators on Jul 7, 2021
@AndyAyersMS

I'm going to re-open this. We continue to see a steady rate of perf fluctuations from PGO. These have become more obvious during the .NET 6 endgame because we still see daily PGO updates in main and there's not much else going on.

A few possible areas for deeper analysis:

As far as tolerating the inherent instability, @tannergooding has suggested we might blend PGO data collected on successive days as a way of stabilizing updates; this seems like it has promise.
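
One way the blending idea could look is an exponential moving average over per-method counts, sketched below in Python. This is only an assumption about what "blend" might mean: the dict-of-lists profile shape, the `blend` function, the 0.5 weight, and the rule for methods that are missing or have a different number of counters on one side are all hypothetical, and nothing here reflects how the actual PGO update pipeline merges data.

```python
def blend(previous, today, weight=0.5):
    """Blend two profiles; `weight` is the share given to today's data.

    Each profile is a dict mapping a method name to a list of counts.
    Methods present on only one side, or whose counter lists differ in
    length (treated here as a schema change), are not averaged: today's
    data wins if it exists, otherwise the previous data is carried over.
    """
    blended = {}
    for name in previous.keys() | today.keys():
        old, new = previous.get(name), today.get(name)
        if old is None or new is None or len(old) != len(new):
            blended[name] = list(new if new is not None else old)
        else:
            blended[name] = [(1 - weight) * o + weight * n
                             for o, n in zip(old, new)]
    return blended


monday = {"M": [100.0, 0.0], "OnlyMonday": [3.0]}
tuesday = {"M": [0.0, 100.0], "OnlyTuesday": [7.0]}
print(blend(monday, tuesday))
```

Feeding each day's blended result back in as `previous` for the next day gives older collections a geometrically decaying influence, so a single noisy collection moves the shipped profile only part of the way.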

AndyAyersMS reopened this on Sep 15, 2021
AndyAyersMS modified the milestones: 6.0.0, 7.0.0 on Sep 15, 2021
@AndyAyersMS

As far as I know this is not causing problems, though we do see perf swings in a number of tests from PGO updates.

Going to move to future.

AndyAyersMS modified the milestones: 7.0.0, Future on Apr 25, 2022
dotnet unlocked this conversation on Nov 24, 2023