
Issue with "run one iteration for long operations", it's flaky and considers many nano-operations as "long", leading to one op per cycle #1338

Closed
abelbraaksma opened this issue Dec 20, 2019 · 7 comments · Fixed by #1573

Comments

abelbraaksma commented Dec 20, 2019

PR #760 introduced a bug that proved notoriously hard to diagnose.

The issue seems to be that the metric used to decide whether a benchmark should run once per iteration or go through the pilot phase is flaky. In particular, it appears that JIT time is taken into account.

In short: even though one invocation takes ~7 ns (in this example) and IterationTime is set to 120 ms, the change in #760 assumes it can skip the pilot phase and sets both the invocation count and the unroll factor to 1.

I think JIT'ing is skewing the metric, because once you change sprintf "test %A" "test" to sprintf "test %s" "test", the problem goes away (and the difference between the two is essentially the JIT time). It also goes away when you give foo a body containing sprintf "test %A" "test". I don't know why.
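
For a sense of how large that one-time cost is compared to the steady state, a quick check outside of BenchmarkDotNet can be run as a small F# script (illustrative only; the timeOnceMs helper is not part of the repro, and absolute numbers will differ per machine):

open System.Diagnostics

// time a single call of f, in milliseconds
let timeOnceMs f =
    let sw = Stopwatch.StartNew()
    f () |> ignore
    sw.Stop()
    sw.Elapsed.TotalMilliseconds

// the first call pays for JIT'ing plus the reflection-based "%A" formatting setup;
// subsequent calls are orders of magnitude cheaper
printfn "first:  %.3f ms" (timeOnceMs (fun () -> sprintf "test %A" "test"))
printfn "second: %.6f ms" (timeOnceMs (fun () -> sprintf "test %A" "test"))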

Proposal

@adamsitnik, I can see a few ways forward:

  1. Add a line to the output explaining why the pilot phase is skipped. I spent two days searching for the cause; it is very puzzling when one seemingly irrelevant change in your own code leads to attributes being ignored.

  2. Improve the metric. Hopefully the two examples, here and in #1337 (Trivial methods can cause BDN to run only 1 iteration, skip pilot phase, and ignore most attributes), will help analyze the cause.

  3. Perhaps simplest of all: add an attribute such as ForcePilotPhase to override the behavior of #760 (don't execute long operations more than once per iteration), or allow UnrollFactor to be set as an attribute, which, if I understand the PR correctly, would also override it. A config-based workaround in that spirit is sketched after this list.
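
For proposal 3, something similar should already be possible through a config rather than an attribute: pinning InvocationCount (and UnrollFactor) makes the engine run many ops per iteration regardless of the heuristic, at the cost of also fixing the count up front. A rough sketch, assuming a BenchmarkDotNet version where ManualConfig.AddJob and the Job.WithInvocationCount/WithUnrollFactor extensions are available (older versions use Add instead of AddJob); the count 16 * 1024 is an arbitrary multiple of the default unroll factor 16, and BenchmarkPatterns is the type from the example below:

open BenchmarkDotNet.Configs
open BenchmarkDotNet.Jobs
open BenchmarkDotNet.Running

// pin the per-iteration invocation count so the "single invocation" heuristic cannot kick in;
// InvocationCount must be a multiple of UnrollFactor
let config =
    ManualConfig
        .Create(DefaultConfig.Instance)
        .AddJob(Job.Default.WithInvocationCount(16 * 1024).WithUnrollFactor(16))

BenchmarkRunner.Run<BenchmarkPatterns>(config) |> ignore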

Minimal working example

Here's an example:

namespace BenchMarkBug

open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Running
open System.Globalization


module Test =
    // module initialization (including this reflection-based "%A" sprintf) runs on first access,
    // making the very first benchmark invocation expensive while steady-state calls stay at ~7 ns
    let getFunctionToTestOneTimeOverhead =
        let foo = sprintf "test %A" "test"
        fun() -> true

// you may need a slightly lower threshold to see the behavior mentioned
[<IterationTime(120.)>]
[<SimpleJob()>]
type BenchmarkPatterns() =
    let foo =
        ()

    [<Benchmark>]
    member _.basicTest() = 
        Test.getFunctionToTestOneTimeOverhead ()
        |> ignore


module Main =

    [<EntryPoint>]
    let main _ =
        BenchmarkRunner.Run<BenchmarkPatterns>()
        |> ignore

        0 // return success

Output

The output shows that the pilot stage is skipped, even though IterationTime was set to 120 ms and no measurement in the output below exceeds it.

// BeforeAnythingElse
// Benchmark Process Environment Information:
// Runtime=.NET Framework 4.7.2 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.8.3928.0
// GC=Concurrent Workstation
// Job: Job-KABBQW(IterationTime=120.0000 ms)
OverheadJitting  1: 1 op, 234052.26 ns, 234.0523 us/op
WorkloadJitting  1: 1 op, 86470300.92 ns, 86.4703 ms/op
WorkloadWarmup   1: 1 op, 4311.49 ns, 4.3115 us/op
WorkloadWarmup   2: 1 op, 307.96 ns, 307.9635 ns/op
WorkloadWarmup   3: 1 op, 307.96 ns, 307.9635 ns/op
WorkloadWarmup   4: 1 op, 307.96 ns, 307.9635 ns/op
WorkloadWarmup   5: 1 op, 307.96 ns, 307.9635 ns/op
WorkloadWarmup   6: 1 op, 615.93 ns, 615.9270 ns/op
// BeforeActualRun
WorkloadActual   1: 1 op, 615.93 ns, 615.9270 ns/op
WorkloadActual   2: 1 op, 307.96 ns, 307.9635 ns/op
WorkloadActual   3: 1 op, 615.93 ns, 615.9270 ns/op
WorkloadActual   4: 1 op, 307.96 ns, 307.9635 ns/op
WorkloadActual   5: 1 op, 307.96 ns, 307.9635 ns/op

Once I change the IterationTime to 150 ms, I get the expected output:

// BeforeAnythingElse
// Benchmark Process Environment Information:
// Runtime=.NET Framework 4.7.2 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.8.3928.0
// GC=Concurrent Workstation
// Job: Job-OTJYUL(IterationTime=150.0000 ms)
OverheadJitting  1: 1 op, 226969.10 ns, 226.9691 us/op
WorkloadJitting  1: 1 op, 89779676.75 ns, 89.7797 ms/op
WorkloadPilot    1: 4 op, 7699.09 ns, 1.9248 us/op
WorkloadPilot    2: 77931 op, 542323.73 ns, 6.9590 ns/op
WorkloadPilot    3: 21554745 op, 149653325.48 ns, 6.9429 ns/op
WorkloadPilot    4: 21604677 op, 149687201.47 ns, 6.9285 ns/op
WorkloadPilot    5: 21649824 op, 149388476.87 ns, 6.9002 ns/op
WorkloadPilot    6: 21738448 op, 152644575.01 ns, 7.0219 ns/op
WorkloadPilot    7: 21361828 op, 148062078.05 ns, 6.9312 ns/op
WorkloadPilot    8: 21641424 op, 149757725.11 ns, 6.9200 ns/op
WorkloadPilot    9: 21676435 op, 151683112.94 ns, 6.9976 ns/op
WorkloadPilot   10: 21435908 op, 147993094.23 ns, 6.9040 ns/op
WorkloadPilot   11: 21726596 op, 150739820.73 ns, 6.9380 ns/op
WorkloadWarmup   1: 21726596 op, 152429924.44 ns, 7.0158 ns/op
WorkloadWarmup   2: 21726596 op, 151122003.44 ns, 6.9556 ns/op
WorkloadWarmup   3: 21726596 op, 151391163.54 ns, 6.9680 ns/op
WorkloadWarmup   4: 21726596 op, 150467580.99 ns, 6.9255 ns/op
WorkloadWarmup   5: 21726596 op, 149482405.74 ns, 6.8802 ns/op
WorkloadWarmup   6: 21726596 op, 150461729.68 ns, 6.9252 ns/op
WorkloadWarmup   7: 21726596 op, 150266172.86 ns, 6.9162 ns/op
// BeforeActualRun
WorkloadActual   1: 21726596 op, 150254162.28 ns, 6.9157 ns/op
WorkloadActual   2: 21726596 op, 152880783.02 ns, 7.0366 ns/op
WorkloadActual   3: 21726596 op, 149523672.85 ns, 6.8821 ns/op
WorkloadActual   4: 21726596 op, 149548617.89 ns, 6.8832 ns/op
WorkloadActual   5: 21726596 op, 150369032.67 ns, 6.9210 ns/op
abelbraaksma changed the title from 'Feature "run one iteration for long operations" is buggy and considers many nano-operations as "long"' to 'Issue with "run one iteration for long operations", it's flaky and considers many nano-operations as "long", leading to one op per cycle' on Dec 20, 2019
abelbraaksma commented Jul 24, 2020

@adamsitnik, this issue keeps popping up; could we take a look at it? The calculation that decides to skip the pilot iterations seems (very) off.

Another case, with C#-only code, is #1505; see my comment there.

abelbraaksma commented Aug 5, 2020

@adamsitnik, here's another case, this time encountered by @cartermp; see this comment for detailed output: dotnet/fsharp#9218 (comment)

Here the settings are the defaults and should lead to some 70 ops per run, but that doesn't happen.

cartermp commented Aug 5, 2020

Thanks for filing and noticing this, @abelbraaksma. In my benchmark I couldn't find a good IterationTime that would actually cause the benchmark to run more than one op at a time.

@Happypig375

@adamsitnik Hello? Not even a response?

@Happypig375

A weird workaround is to place a long-running benchmark (>=100ms) as the first benchmark and place microbenchmarks after.
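
Applied to the repro above, that workaround could look roughly like this (illustrative only; warmTheEngine and the 150 ms sleep are hypothetical, chosen just to stay above the ~100 ms mentioned, and whether declaration order is preserved depends on the orderer in use):

[<IterationTime(120.)>]
[<SimpleJob()>]
type BenchmarkPatternsWithWorkaround() =

    // hypothetical long-running benchmark (>= 100 ms), declared first as suggested above
    [<Benchmark>]
    member _.warmTheEngine() = System.Threading.Thread.Sleep 150

    // the original microbenchmark, placed after the long-running one
    [<Benchmark>]
    member _.basicTest() =
        Test.getFunctionToTestOneTimeOverhead ()
        |> ignore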

adamsitnik added a commit that referenced this issue Oct 24, 2020
… magic number based on observations of the reported bug)

we call it one more time to see if Jitting itself has not dominated the first invocation
if it did, it should NOT be a single invocation engine (fixes #837, fixes #1337 and fixes #1338)
@adamsitnik

Please excuse me for the delay. I've sent a fix: #1573

@abelbraaksma

@adamsitnik, great! That's awesome, I'll give it a try after the weekend 👍.

AndreyAkinshin pushed a commit that referenced this issue Oct 26, 2020
…n lasts longer than IterationTime (#1573)

* add the test

* if the Jitting took more than IterationTime but still less than 1s (a magic number based on observations of the reported bug)

we call it one more time to see if Jitting itself has not dominated the first invocation
if it did, it should NOT be a single invocation engine (fixes #837, fixes #1337 and fixes #1338)
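
Paraphrasing the commit message above, the fixed decision appears to be roughly the following (an illustrative F# sketch of the described logic, not BenchmarkDotNet's actual code; measureOnce stands in for one timed invocation of the workload and is hypothetical):

open System

// Sketch of the described heuristic: fall back to a single-invocation engine only
// when the workload is genuinely long, not merely expensive to JIT the first time.
let shouldUseSingleInvocationEngine (iterationTime: TimeSpan) (measureOnce: unit -> TimeSpan) =
    let jittingRun = measureOnce ()                 // first invocation: includes JIT cost
    if jittingRun <= iterationTime then
        false                                       // fits within IterationTime: run the pilot phase as usual
    elif jittingRun >= TimeSpan.FromSeconds 1.0 then
        true                                        // longer than the 1 s "magic number": truly long operation
    else
        // between IterationTime and 1 s: invoke once more to see whether JIT dominated the first call
        measureOnce () > iterationTime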