-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PGO: Add new tiers #70941
PGO: Add new tiers #70941
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
It sounds better to me, as we never delete the call counting stubs, so this would leave less garbage. |
When I was doing tiered compilation originally one of the benchmarks I found helpful was MusicStore and I had the app measure its own latency for requests 0-500, 501-1000, 1000-1500, and so on. This helped me get an idea how quickly an app was able to converge to the steady state behavior. Completely up to you if a similar analysis would be useful now. One hypothesis I'd have for the worse TE numbers is that the benchmark might be short enough that it is capturing a substantial amount of pre-steady-state behavior. |
Thanks! When I will be patching the call-counting stub we might consider resetting the call-counting-cell to some smaller number (e.g. 10) |
Just realized that I can also introduce a new tier for non-r2r cases:
This solves a different problem that we have now - instrumentation is quite heavy (both in terms of TP and perf). However, as Andy noted, we need to be careful around OSR. I have a demo locally, for now I am allocating a new callcounting stub every time because it's simpler |
This comment was marked as outdated.
This comment was marked as outdated.
69f1cf6
to
ef66369
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
We can't currently leave Tier1-OSR and get back to Tier0 (at least mid-method; if the method is called every so often we could switch at a call boundary. But it wouldn't help say What we could do instead is go from Tier0 (uninstrumented) to Tier0-OSR (instrumented) and then to Tier1-OSR (instrumented). This would give some PGO data, but we might not see the early parts of the method execute with instrumentation. |
This comment was marked as outdated.
This comment was marked as outdated.
Sort of? I can post what I think of as the right flow in a bit. |
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one comment, otherwise the tiering stuff LGTM, thanks!
/azp run runtime-coreclr pgo, runtime-coreclr libraries-pgo |
Azure Pipelines successfully started running 2 pipeline(s). |
RPS during startup for the BingSNR, X - RPS, Y - time from start, seconds Observations:
The measurements are quite stable, I did a lot of runs locally (made a script) |
The problem with the queue of methods pending the call counting installation (not even promotion to tier1, just to start counting calls) visualized (Bing SNR again): X axis - time in seconds after start. It means that a lot of methods were stuck in tier0 and had no chance to get a call-counting stub to start counting - every 100ms a new compilation occurs and it delays call-counting stub installation till we have a window large enough for it. |
Anyway, I'd like to work on improvements for that queue in a separate PR (and will adjust the OSR limit), e.g. to introduce a separate "low priority" queue for methods coming from R2R since it's fast as is @AndyAyersMS can you review/approve the jit part? It seems to be passing outerloop PGO pipelines (both runtime and libraries) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. Left notes on a few things to consider.
I take it COMPlus_WritePGOData
still works as long as you also enable TieredPGO? It is occasionally quite useful.
src/coreclr/jit/fgprofile.cpp
Outdated
const bool osrMethod = opts.IsOSR(); | ||
const bool useEdgeProfiles = (JitConfig.JitEdgeProfiling() > 0) && !prejit && !tier0WithPatchpoints && !osrMethod; | ||
const bool instrOpt = opts.IsInstrumentedOptimized(); | ||
const bool useEdgeProfiles = (JitConfig.JitEdgeProfiling() > 0) && !prejit && !tier0WithPatchpoints && !instrOpt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need block profiles for full instrumented/optimized methods? Seems like edge profiles might work -- unless perhaps the think you need is the is special handling for tail calls.
if so can you add a clarifying comment here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately we still need it for now, I'll allocate some time to investigate what is needed to enable edge-profiling after this PR
src/coreclr/jit/importercalls.cpp
Outdated
// Only schedule importation if we're not currently importing. | ||
// | ||
if (mustImportEntryBlock && (compCurBB != fgEntryBB)) | ||
if ((opts.IsInstrumentedOptimized() || opts.IsOSR()) && mustImportEntryBlock && (compCurBB != entryBb)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an OSR only issue, as full method compilation will naturally start importing with fgFirstBB
.
@@ -1288,7 +1288,7 @@ var_types Compiler::impImportCall(OPCODE opcode, | |||
// have to check for anything that might introduce a recursive tail call. | |||
// * We only instrument root method blocks in OSR methods, | |||
// | |||
if (opts.IsOSR() && !compIsForInlining()) | |||
if ((opts.IsInstrumentedOptimized() || opts.IsOSR()) && !compIsForInlining()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted below, I think this is an OSR only problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it could be
if ((opts.IsInstrumentedOptimized() || opts.IsOSR()) && !compIsForInlining()) | |
if ((opts.IsInstrumentedOptimized() && opts.IsOSR()) && !compIsForInlining()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opts.IsInstrumentedOptimized() && opts.IsOSR()
is not working, for some reason it hits an assert even in non PGO OSR mode. I should be able to revert these changes once I fix "edge profiling" for optimized code
/azp run runtime-coreclr pgo, runtime-coreclr libraries-pgo |
Azure Pipelines successfully started running 2 pipeline(s). |
Co-authored-by: Andy Ayers <[email protected]>
I'm merging the PR, we should see improvements at the PGO tab at https://aka.ms/aspnet/benchmarks in a couple of days. Just in case: this PR doesn't affect the current defaults, it only kicks in in DOTNET_TieredPGO=1 PS: I've just kicked off a new SPMI collection |
Keep a close eye on BDN results from the lab -- we may need to adjust things since some benchmarks implicitly rely on the old tiering strategy. |
Out of the oven public results from https://aka.ms/aspnet/benchmarks (14th page) 🙂 DynamicPGO now have almost the same RPS as FullPGO, slightly less because of less accurate profile in optimized code (after R2R) - as expected. Improvements in the working set are unrelated - #76985. I checked that this PR doesn't regress working set (due to more compilations) or at least it's around noise. |
This PR implements @jkotas's idea in #70410 (comment) when
DOTNET_TieredPGO
is enabled (it's off by default and will be so in .NET 7.0)Design