Possible strategy changes needed to enable OSR #2214

@AndyAyersMS

We are working towards enabling OSR (On Stack Replacement) by default in .NET 7 on x64 and arm64. As part of this we will also modify the runtime so that quick jit for loops (QuickJitForLoops) is enabled.

See for instance dotnet/runtime#63642.

This has performance implications for benchmarks that don't run enough iterations to reach Tier1. These are typically benchmarks that internally loop and so are currently eagerly optimized because quick jit for loops is disabled. A private benchmark run shows several hundred benchmarks impacted by this, with regressions outnumbering improvements by about 2 to 1.

[Upon further analysis, the number of truly impacted benchmarks may be smaller, perhaps around 100. This is hard to gauge from one-off runs because many benchmarks are noisy. Looking at perf history in main, some of the "regressions" seen in the one-off OSR run are in noisy tests, and the values are within the expected noise range.]

One such example is Burgers.Test3. With the current strategy we end up running about 20 invocations total, and the main method is fully optimized from the start. When we turn on QuickJitForLoops and OSR, the main method is initially not optimized. OSR accelerates it, but OSR performance does not reach the same level as Tier1, and we don't run enough invocations to make it to Tier1.
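The effect described above can be illustrated with a toy model. This is only a sketch: the promotion threshold of 30 calls, the `Tier0+OSR` label, and the split of the roughly 20 invocations into 5 warmup and 15 measured calls are all assumptions made for illustration, and the model ignores timers and background compilation that real tiering involves.

```python
# Toy model (assumption): a method is promoted to Tier1 once it has been
# called `tier1_threshold` times. Before that it runs unoptimized code that
# OSR speeds up. Timers and background compilation are ignored here.

def versions_run(total_invocations, tier1_threshold=30):
    """Return the code version used for each invocation, in call order."""
    return ["Tier1" if call >= tier1_threshold else "Tier0+OSR"
            for call in range(total_invocations)]


def measured_at_tier1(warmup, measured, tier1_threshold=30):
    """Count measured invocations that run Tier1 code after `warmup` calls."""
    run = versions_run(warmup + measured, tier1_threshold)
    return run[warmup:].count("Tier1")


# Hypothetical split of ~20 total invocations: none are measured at Tier1.
print(measured_at_tier1(warmup=5, measured=15))
# With 30 warmup invocations, every measured invocation runs Tier1 code.
print(measured_at_tier1(warmup=30, measured=15))
```

Under these assumptions, a benchmark that stops after about 20 invocations only ever measures the OSR-accelerated code, which is why its reported time depends on how close OSR gets to Tier1.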

While in this case the OSR version is slower, sometimes the OSR version runs faster. In general, we aspire to have the OSR perf be competitive with Tier1, but swings of +/- 20% are going to be common and cannot easily be addressed.

One way we can mitigate these effects is to always run (or selectively run, for some subset of benchmarks) at least 30 warmup iterations. For example:

default

| Method | Job | Ver | Mean |
|---|---|---|---:|
| Burgers_0 | Job-SYCWNE | OSR | 192.68 ms |
| Burgers_0 | Job-ZXAOBL | DEF | 184.51 ms |
| Burgers_1 | Job-SYCWNE | OSR | 224.11 ms |
| Burgers_1 | Job-ZXAOBL | DEF | 155.64 ms |
| Burgers_2 | Job-SYCWNE | OSR | 178.63 ms |
| Burgers_2 | Job-ZXAOBL | DEF | 156.51 ms |
| Burgers_3 | Job-SYCWNE | OSR | 181.05 ms |
| Burgers_3 | Job-ZXAOBL | DEF | 85.63 ms |

default + --warmupCount 30

| Method | Job | Ver | Mean |
|---|---|---|---:|
| Burgers_0 | Job-SOPVCH | OSR | 186.75 ms |
| Burgers_0 | Job-TMAUKP | DEF | 185.61 ms |
| Burgers_1 | Job-SOPVCH | OSR | 155.64 ms |
| Burgers_1 | Job-TMAUKP | DEF | 157.39 ms |
| Burgers_2 | Job-SOPVCH | OSR | 157.37 ms |
| Burgers_2 | Job-TMAUKP | DEF | 160.09 ms |
| Burgers_3 | Job-SOPVCH | OSR | 89.70 ms |
| Burgers_3 | Job-TMAUKP | DEF | 85.96 ms |
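
To make the comparison between the two runs easier to read, the OSR/DEF ratio can be computed per benchmark. The sketch below uses only the mean times listed above; the dictionary layout and the function name `osr_to_def_ratio` are illustrative choices, not part of any benchmark tooling.

```python
# Mean times in ms, copied from the two result listings above.
default_run = {
    "Burgers_0": {"OSR": 192.68, "DEF": 184.51},
    "Burgers_1": {"OSR": 224.11, "DEF": 155.64},
    "Burgers_2": {"OSR": 178.63, "DEF": 156.51},
    "Burgers_3": {"OSR": 181.05, "DEF": 85.63},
}
warmup30_run = {
    "Burgers_0": {"OSR": 186.75, "DEF": 185.61},
    "Burgers_1": {"OSR": 155.64, "DEF": 157.39},
    "Burgers_2": {"OSR": 157.37, "DEF": 160.09},
    "Burgers_3": {"OSR": 89.70, "DEF": 85.96},
}


def osr_to_def_ratio(results):
    """Map each benchmark to OSR mean / DEF mean (above 1.0 means OSR is slower)."""
    return {name: times["OSR"] / times["DEF"] for name, times in results.items()}


for label, results in (("default", default_run),
                       ("--warmupCount 30", warmup30_run)):
    for name, ratio in osr_to_def_ratio(results).items():
        print(f"{label:>17}  {name}  {ratio:.2f}")
```

With the default settings the worst case is Burgers_3, where OSR takes about 2.11 times as long as DEF. With 30 warmup iterations every ratio is within about 5% of 1.0.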

If we can do this (or something equivalent), we expect that OSR will not impact perf measurements.
