Post results from benchmark as a comment on PRs #481

vinistock · 2023-02-03T18:39:33Z

Motivation

Final part of #335

Post the results of running benchmarks as a comment on the PR.

Implementation

Push the STDOUT of running bin/benchmark into a GitHub Actions output variable
Use github-script to take that output and post it as a comment on the PR
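
As a rough illustration of the first step, here is a minimal Ruby sketch of writing a multiline value to the file GitHub Actions exposes through the GITHUB_OUTPUT environment variable, using the `name<<DELIMITER` syntax. The helper name `write_multiline_output` and the output name `benchmark_results` are hypothetical, and the PR itself may do this from the workflow shell rather than from Ruby.

```ruby
require "securerandom"

# Hypothetical helper: appends a multiline value to the GitHub Actions
# output file. Multiline values are written as:
#   name<<DELIMITER
#   ...value lines...
#   DELIMITER
def write_multiline_output(name, value, path: ENV.fetch("GITHUB_OUTPUT"))
  # A random delimiter avoids clashing with text inside the value itself.
  delimiter = "EOF_#{SecureRandom.hex(8)}"

  File.open(path, "a") do |file|
    file.puts("#{name}<<#{delimiter}")
    file.puts(value)
    file.puts(delimiter)
  end
end

# Example (assumed output name), run inside a workflow step:
#   write_multiline_output("benchmark_results", `bin/benchmark`)
```

A later github-script step could then read the value from that step's outputs and use it as the comment body.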

bitwise-aiden · 2023-02-06T14:54:22Z

Looking at the comment this action posted above, I'm a little worried that we'll end up caching the fastest runs and then unnecessarily fail later runs that perform normally. The 15% threshold might not be enough.

textDocument/hover faster by 58.27626493431508 %

vinistock · 2023-02-06T15:15:41Z

Yes, I have the same concern. There seems to be a lot of variation between runs. I'm not sure how to get the best results; my only idea would be to replace the 15% threshold with something like "faster or slower by more than 2 standard deviations".

What do you think? Any other ideas?

andyw8 · 2023-02-06T15:20:51Z

I see in bin/benchmark we're using Benchmark.realtime, but wouldn't Benchmark.measure be more accurate since it provides the CPU time?

vinistock · 2023-02-06T15:52:40Z

Benchmark.measure returns the time split into system and user CPU time. Would the right measure be the sum of both, or just user?

andyw8 · 2023-02-06T21:15:19Z

I think it should be both since changes could potentially affect either.
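
For reference, a small sketch of the difference between the two calls discussed above. Benchmark.realtime returns elapsed wall-clock seconds, while Benchmark.measure returns a Benchmark::Tms whose utime and stime fields hold user and system CPU time. The `timings` helper and the placeholder workload are hypothetical and not taken from bin/benchmark.

```ruby
require "benchmark"

# Hypothetical helper: runs the block once per measurement style and
# returns the numbers side by side.
def timings(&work)
  wall_clock = Benchmark.realtime(&work) # elapsed wall-clock seconds (Float)
  tms = Benchmark.measure(&work)         # Benchmark::Tms with CPU times

  {
    wall_clock: wall_clock,
    user: tms.utime,                     # CPU time spent in user mode
    system: tms.stime,                   # CPU time spent in the kernel
    user_and_system: tms.utime + tms.stime,
  }
end

# Placeholder workload; the real script would execute an LSP request here.
result = timings { 50_000.times { |i| Math.sqrt(i) } }
result.each { |name, seconds| puts "#{name}: #{seconds}" }
```

Wall-clock time includes time spent waiting on other processes on a shared CI runner, which is one plausible source of the run-to-run variation mentioned earlier.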

vinistock · 2023-02-15T16:14:28Z

I've made some changes to try to make the benchmark more stable. It now:

Considers any difference smaller than one standard deviation as unchanged
Uses only user CPU time (time spent in Ruby)
Runs more iterations
Preloads things that could impact results
Uses a fixture with repositioned and additional source:// comments, so that DocumentLink and Hover have actual results to process
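
The "smaller than a standard deviation means unchanged" rule could look roughly like the sketch below. This is an illustration under assumptions, not the code in bin/benchmark: the method names are hypothetical, the sample (n - 1) form of the standard deviation is assumed, the difference is compared against the main branch's deviation, and the percentage is taken relative to the main branch average.

```ruby
# Mean of an array of per-iteration timings (in seconds).
def average(samples)
  samples.sum / samples.size.to_f
end

# Sample standard deviation (n - 1 denominator). Assumption: the thread
# does not say whether the sample or population form is used.
def std_dev(samples)
  mean = average(samples)
  variance = samples.sum { |s| (s - mean)**2 } / (samples.size - 1).to_f
  Math.sqrt(variance)
end

# Classifies the branch against main. A difference between the averages
# smaller than main's standard deviation is treated as noise.
def compare(branch_samples, main_samples)
  main_avg = average(main_samples)
  difference = average(branch_samples) - main_avg
  return "unchanged" if difference.abs < std_dev(main_samples)

  percent = (difference.abs / main_avg * 100).round(3)
  difference.negative? ? "faster by #{percent} %" : "slower by #{percent} %"
end

main = [0.010, 0.011, 0.009, 0.010]
puts compare([0.0101, 0.0099, 0.0100, 0.0102], main)
puts compare([0.020, 0.020, 0.020, 0.020], main)
```

With a rule like this, a request with a large spread (such as textDocument/formatting in the results below) needs a correspondingly large change before it is reported as faster or slower.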

github-actions · 2023-02-15T16:56:22Z

Benchmark results in seconds (slowest at top)

          textDocument/formatting average: 0.103217 std_dev: 0.027215
          textDocument/codeAction average: 0.038137 std_dev: 0.001258
          textDocument/diagnostic average: 0.037928 std_dev: 0.001712
      textDocument/selectionRange average: 0.003677 std_dev: 0.000796
        textDocument/foldingRange average: 0.001611 std_dev: 6.8e-05
   textDocument/documentHighlight average: 0.001526 std_dev: 0.000131
 textDocument/semanticTokens/full average: 0.000817 std_dev: 0.000176
textDocument/semanticTokens/range average: 0.000623 std_dev: 3.1e-05
               textDocument/hover average: 0.000598 std_dev: 9.0e-05
        textDocument/documentLink average: 0.00053 std_dev: 3.4e-05
           textDocument/inlayHint average: 0.000412 std_dev: 3.7e-05
      textDocument/documentSymbol average: 0.000255 std_dev: 5.4e-05
    textDocument/onTypeFormatting average: 0.000111 std_dev: 8.8e-05


================================================================================
Comparison with main branch:

 textDocument/semanticTokens/full faster by 60.47 %
textDocument/semanticTokens/range faster by 45.72 %
      textDocument/documentSymbol faster by 35.548 %
        textDocument/foldingRange faster by 48.846 %
          textDocument/formatting unchanged
          textDocument/diagnostic faster by 10.681 %
        textDocument/documentLink slower by 15.052 %
           textDocument/inlayHint faster by 11.703 %
      textDocument/selectionRange faster by 40.368 %
   textDocument/documentHighlight faster by 41.485 %
               textDocument/hover faster by 52.622 %
          textDocument/codeAction faster by 11.115 %
    textDocument/onTypeFormatting unchanged


At least one benchmark is slower than the main branch.

vinistock · 2023-02-15T16:59:07Z

@dirceu gave a great suggestion to run both the main and the branch benchmark on the same CI run to avoid some variance. This also allows us to get rid of using cache.

Currently, I believe the benchmark will always fail because I added some extra things to the fixture, but once this is merged it should be considerably more stable.

vinistock requested a review from a team as a code owner February 3, 2023 18:39

vinistock self-assigned this Feb 3, 2023

vinistock force-pushed the vs/post_benchmark_results_as_comment branch 3 times, most recently from c794298 to 1c355c1 Compare February 3, 2023 18:53