Investigate stability of PGO updates #52610

AndyAyersMS · 2021-05-11T18:12:05Z

We are seeing fairly frequent microbenchmark performance shifts with the managed PGO updates. We need to root-cause these shifts and figure out how to address them.

Possible causes:

non-deterministic training scenarios
non-deterministic collection behavior (even for deterministic scenarios)
issues in exfiltrating the profile data into the trace records
issues in converting the trace records to the .mibc format
issues merging multiple runs together into a consolidated .mibc file
issues incorporating the .mibc profile data into assemblies (see eg ...)
schema version skew leading to the jit dropping profile data
other issues causing the jit to drop profile data
jit optimizations overly sensitive to small changes in profile data

For example, the last few fluctuations here are correlated with PGO updates:

See also

A first step is perhaps to create a .mibc comparison tool (or a mode in the pgo tool) to see how much variation we're seeing from one update to the next. Given two .mibc files, it would first match up methods and report how many methods have PGO data in one collection but not the other. Then, for cases where both methods have PGO data, it could detect when the schemas have changed and, where the schemas agree, run a similarity analysis on the count and class profile data to see whether what was measured has changed in any significant way.
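The comparison described above can be sketched roughly as follows. This is only an illustration: it assumes the two .mibc files have already been parsed into dictionaries keyed by method name, and `MethodProfile`, `overlap`, `compare`, and the 0.9 threshold are all hypothetical names and values, not part of any existing tool. The similarity measure used here (overlap of normalized counts) is one possible choice among many.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodProfile:
    """Hypothetical in-memory view of one method's PGO data."""
    schema: tuple  # description of the instrumentation slots
    counts: tuple  # one count per schema slot


def overlap(a, b):
    """Overlap of two normalized count vectors: 1.0 = same shape, 0.0 = disjoint."""
    total_a, total_b = sum(a), sum(b)
    if total_a == 0 and total_b == 0:
        return 1.0
    if total_a == 0 or total_b == 0:
        return 0.0
    return sum(min(x / total_a, y / total_b) for x, y in zip(a, b))


def compare(old, new, threshold=0.9):
    """Compare two {method name: MethodProfile} collections."""
    report = {
        "only_in_old": sorted(old.keys() - new.keys()),
        "only_in_new": sorted(new.keys() - old.keys()),
        "schema_changed": [],
        "dissimilar": [],
    }
    for name in sorted(old.keys() & new.keys()):
        a, b = old[name], new[name]
        if a.schema != b.schema:
            # Counts are not comparable slot by slot once the schema differs.
            report["schema_changed"].append(name)
        else:
            score = overlap(a.counts, b.counts)
            if score < threshold:
                report["dissimilar"].append((name, score))
    return report
```

Class profile data could be compared in the same spirit, for example by treating each call site's class histogram as another count vector, though that is not shown here.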

category:cq
theme:profile-feedback

dotnet-issue-labeler · 2021-05-11T18:12:08Z

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

AndyAyersMS · 2021-05-11T18:14:05Z

Marking as codegen for now, though the issues may lie elsewhere.

AndyAyersMS · 2021-05-11T22:00:31Z

For System.Collections.ContainsFalse<String>.List, I get these results running locally, where the commits fall in the three ranges below from the lab results:

BenchmarkDotNet=v0.12.1.1528-nightly, OS=Windows 10.0.19043
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.100-preview.5.21227.8
  [Host]     : .NET 6.0.0 (6.0.21.22701), X64 RyuJIT
  Job-RIQHCA : .NET 6.0.0 (42.42.42.42424), X64 RyuJIT

| Commit | Method | Size | Mean | Error | StdDev | Median | Min | Max |
|---|---|---|---|---|---|---|---|---|
| `6aca7af` | List | 512 | 963.4 us | 10.30 us | 9.64 us | 965.6 us | 944.4 us | 975.2 us |
| `e3db83f` | List | 512 | 970.1 us | 9.75 us | 8.64 us | 968.7 us | 958.7 us | 986.9 us |
| `60e6244` | List | 512 | 966.7 us | 8.50 us | 7.53 us | 965.8 us | 954.3 us | 980.9 us |

AndyAyersMS · 2021-05-12T01:30:29Z

For Span.IndexerBench.KnownSizeArray, no local repro either.

| Commit | Method | length | Mean | Error | StdDev | Median | Min | Max |
|---|---|---|---|---|---|---|---|---|
| `6aca7af` | KnownSizeArray | 1024 | 478.5 ns | 4.95 ns | 4.39 ns | 478.9 ns | 467.4 ns | 485.6 ns |
| `e3db83f` | KnownSizeArray | 1024 | 478.9 ns | 5.07 ns | 4.75 ns | 479.8 ns | 470.9 ns | 485.4 ns |
| `60e6244` | KnownSizeArray | 1024 | 483.6 ns | 4.12 ns | 3.85 ns | 484.2 ns | 476.8 ns | 490.2 ns |
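One rough way to read the two local result tables above is to check whether the per-commit means differ by more than the reported error. BenchmarkDotNet documents its Error column as half of the 99.9% confidence interval, so each row can be treated as the interval mean ± error. The sketch below uses hypothetical helper names and copies the Mean and Error values from the tables. Overlapping intervals are only a crude heuristic, not a formal significance test.

```python
from itertools import combinations


def intervals_overlap(mean_a, err_a, mean_b, err_b):
    """True if [mean_a ± err_a] and [mean_b ± err_b] intersect."""
    return abs(mean_a - mean_b) <= err_a + err_b


def all_within_noise(rows):
    """True if every pair of (commit, mean, error) rows has overlapping intervals."""
    return all(
        intervals_overlap(m1, e1, m2, e2)
        for (_, m1, e1), (_, m2, e2) in combinations(rows, 2)
    )


# (commit, mean, error) taken from the tables above; units are us and ns.
LIST_ROWS = [
    ("6aca7af", 963.4, 10.30),
    ("e3db83f", 970.1, 9.75),
    ("60e6244", 966.7, 8.50),
]
KNOWN_SIZE_ARRAY_ROWS = [
    ("6aca7af", 478.5, 4.95),
    ("e3db83f", 478.9, 5.07),
    ("60e6244", 483.6, 4.12),
]
```

Both `all_within_noise(LIST_ROWS)` and `all_within_noise(KNOWN_SIZE_ARRAY_ROWS)` evaluate to `True`, which is consistent with the lack of a local repro.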

AndyAyersMS · 2021-06-01T20:26:13Z

@jakobbotsch has developed some .mibc comparison tech, and overall the managed profile data produced by the optimization repo looks fairly stable from day to day.

AndyAyersMS · 2021-06-07T20:51:45Z

Given the relative stability of the PGO collection I'm going to close this for now.

The perf fluctuations coming from PGO updates are proving tough to pin down, but the root cause doesn't appear to be managed PGO instability.

AndyAyersMS · 2021-09-15T15:47:09Z

I'm going to re-open this. We continue to see a steady rate of perf fluctuations from PGO. These have become more obvious during the .NET 6 endgame because we still see daily PGO updates in main and there's not much else going on.

A few possible areas for deeper analysis:

PGO collection is inherently unstable, since we are measuring low-level details of non-deterministic applications. So some basic level of instability needs to be tolerated.
PGO collection relies on randomization (for class profiles)
PGO collection spans some 40-odd processes, and the resulting merge algorithm may need some adjusting (see eg PGO: class profile details we need to get right #48549 (comment))
PGO coverage in those processes needs to be improved in some areas (eg locales, [Perf] Regressions in System.Text.Perf_Utf8Encoding for Greek and Cyrillic #52313)
The jit will likely make important choices (eg block ordering) based on relatively thin evidence ([Perf] Regressions in System.Collections.TryGetValueFalse<String, String> #51258).
The jit is careless with updating and maintaining profile data
The jit doesn't handle mixtures of IR with PGO and without PGO well (say from inlining). See eg [Perf] Regressions in IndexOf #54165.
The jit likely has poor heuristics in many areas, so better information in does not lead to better code out
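To illustrate why randomization in class profiles can make collected data differ between otherwise identical runs, here is a generic reservoir sampling sketch. The thread does not describe how the runtime actually samples classes, so the fixed-size table, the `sample_classes` function, and its parameters are all assumptions made for illustration only.

```python
import random
from collections import Counter


def sample_classes(observed, table_size, seed):
    """Keep a uniform random sample of `table_size` observed class names.

    Classic reservoir sampling: the first `table_size` observations fill the
    table, and observation i (0-based) then replaces a random slot with
    probability table_size / (i + 1).
    """
    rng = random.Random(seed)
    table = []
    for i, cls in enumerate(observed):
        if i < table_size:
            table.append(cls)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < table_size:
                table[j] = cls
    return Counter(table)
```

With the same stream of observed classes, different seeds can produce different histograms, particularly for rarely seen classes. So some run-to-run variation in this kind of data is expected even when the scenario itself is deterministic.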

As far as tolerating the inherent instability, @tannergooding has suggested we might blend PGO data collected on successive days as a way of stabilizing updates; this seems like it has promise.
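A minimal sketch of the blending idea, assuming each day's profile has been reduced to a dictionary from a counter key to a count. The exponential weighting, the `blend` and `blend_history` names, and the key format are all hypothetical; the thread only proposes blending in general terms.

```python
def blend(previous, today, weight=0.5):
    """Blend two {key: count} profiles; `weight` is the share given to `today`.

    Keys missing from one side are treated as a count of zero.
    """
    keys = previous.keys() | today.keys()
    return {
        k: (1 - weight) * previous.get(k, 0.0) + weight * today.get(k, 0.0)
        for k in sorted(keys)
    }


def blend_history(daily, weight=0.5):
    """Fold a non-empty list of daily profiles (oldest first) into one profile."""
    if not daily:
        raise ValueError("need at least one daily profile")
    blended = dict(daily[0])
    for day in daily[1:]:
        blended = blend(blended, day, weight)
    return blended
```

A smaller `weight` damps day-to-day swings more strongly but also reacts more slowly to real changes in behavior. Raw counts scale with how long each collection ran, so in practice one would probably normalize each day's data before blending.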

AndyAyersMS · 2022-04-25T20:55:13Z

As far as I know this is not causing problems, though we do see perf swings in a number of tests from PGO updates.

Going to move to future.

dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label May 11, 2021

AndyAyersMS self-assigned this May 11, 2021

AndyAyersMS added this to the 6.0.0 milestone May 11, 2021

AndyAyersMS added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed untriaged New issue has not been triaged by the area owner labels May 11, 2021

AndyAyersMS closed this as completed Jun 7, 2021

ghost locked as resolved and limited conversation to collaborators Jul 7, 2021

AndyAyersMS reopened this Sep 15, 2021

AndyAyersMS modified the milestones: 6.0.0, 7.0.0 Sep 15, 2021

AndyAyersMS modified the milestones: 7.0.0, Future Apr 25, 2022

dotnet unlocked this conversation Nov 24, 2023

JulieLeeMSFT added this to .NET Core CodeGen Jun 5, 2024

JulieLeeMSFT moved this to PGO in .NET Core CodeGen Jun 5, 2024

