Refresh perf-gate baselines from CI benchmark results by Copilot · Pull Request #5328 · gui-cs/Terminal.Gui

Copilot · 2026-05-17T13:58:23Z

Fixes

Proposed Changes/Todos

Update Tests/Benchmarks/baseline.json with rounded mean values from the BenchmarkDotNet artifact generated by GitHub Actions run 25992789200, job 76401866541
Remove stale placeholder comment fields from benchmark entries
Update the top-level _comment to document the exact CI run/job used as the refresh source
Validate baseline.json syntax
Validate the refreshed baselines against the downloaded artifact with no missing entries, regressions, or improvements under the perf-gate thresholds
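
The two validation steps above can be sketched roughly as follows. This is not the repository's comparison script. The flat layout (benchmark name mapped to an object with `meanNs`, plus a top-level `_comment`), the helper name `check_baselines`, and the improvement threshold of 1/3 are assumptions made for illustration; only the 3× regression threshold comes from the PR text.

```python
import json


def check_baselines(baseline_text, measured, regress=3.0, improve=1 / 3.0):
    """Classify each baseline entry against a measured mean (both in ns).

    `improve` is a hypothetical lower bound; the real gate may use another value.
    """
    # json.loads raises json.JSONDecodeError if the file is not valid JSON.
    baseline = json.loads(baseline_text)
    report = {"missing": [], "regressions": [], "improvements": []}
    for name, entry in baseline.items():
        if name.startswith("_"):  # skip metadata such as "_comment"
            continue
        if name not in measured:
            report["missing"].append(name)
            continue
        ratio = measured[name] / entry["meanNs"]
        if ratio > regress:
            report["regressions"].append(name)
        elif ratio < improve:
            report["improvements"].append(name)
    return report


# Illustrative data only; these are not real benchmark names or timings.
baseline_text = json.dumps({
    "_comment": "illustrative only",
    "A/Fast": {"meanNs": 1_000_000},
    "A/Slow": {"meanNs": 1_000_000},
    "A/Gone": {"meanNs": 1_000_000},
})
measured = {"A/Fast": 1_100_000, "A/Slow": 3_500_000}
report = check_baselines(baseline_text, measured)
print(report)
```

A refreshed baseline passes this kind of check when all three lists come back empty.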

Pull Request checklist:

I've named my PR in the form of "Fixes #issue. Terse description."
My code follows the style guidelines of Terminal.Gui - if you use Visual Studio, hit CTRL-K-D to automatically reformat your files before committing.
My code follows the Terminal.Gui library design guidelines
I ran dotnet test before commit
I have made corresponding changes to the API documentation (using /// style comments)
My changes generate no new warnings
I have checked my code and corrected any poor grammar or misspellings
I conducted basic QA to assure all features are working

Original prompt

Problem

The perf-benchmarks job in .github/workflows/perf-gate.yml is failing on the develop branch (job link) after a back-merge from main. The failure is not a real performance regression — it is caused by stale placeholder values in Tests/Benchmarks/baseline.json.

All scroll benchmark entries in baseline.json contain placeholder meanNs values (200,000–500,000 ns = 0.2–0.5 ms) with "comment": "Placeholder". These were never replaced with real measurements. The actual benchmark timings on CI are orders of magnitude higher (e.g., TableView/PageDown/Rows=1000 takes ~25 ms = 25,000,000 ns vs the placeholder of 300,000 ns), causing the 3× regression gate to trip for 21 benchmarks.

The ConfigurationManagerLoadBenchmark/LoadAndApply baseline is also stale — it was measured at 3,185,090 ns but the current run shows ~11,567,200 ns (3.63×), just over the threshold.
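
As a quick check of the arithmetic in the two paragraphs above, the ratios can be recomputed from the quoted figures. The helper name `ratio` is illustrative only; the numbers are the ones stated in the prompt.

```python
THRESHOLD = 3.0  # the 3x regression gate described above


def ratio(current_ns, baseline_ns):
    """Return how many times slower the current mean is than the baseline."""
    return current_ns / baseline_ns


# TableView/PageDown/Rows=1000: ~25 ms measured vs the 300,000 ns placeholder.
table_view = ratio(25_000_000, 300_000)
# ConfigurationManagerLoadBenchmark/LoadAndApply: current run vs stale baseline.
config_load = ratio(11_567_200, 3_185_090)

print(f"{table_view:.1f}x {config_load:.2f}x")
```

The placeholder entry is off by roughly 83×, while the configuration benchmark exceeds the gate by a much smaller margin (about 3.63× against 3×).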

The workflow ran fine on main because the benchmark job runs only on push events, and the PR to main did not count as a push to the gated branches at the time the baselines were committed.

Fix Required

Update Tests/Benchmarks/baseline.json with realistic meanNs values based on actual CI measurements visible in the failing job logs.

From the job logs, the actual measured means are (read from the BDN output tables):

ListViewScrollBenchmark (baseline=ScrollDown_OneStep): PageDown_OneStep/Items=1000 ~23 ms. The ScrollDown_OneStep baseline is similar.
TableViewScrollBenchmark (baseline=ScrollDown_OneStep): PageDown_OneStep/Rows=100 ~73 ms, Rows=1000 ~25 ms. ScrollDown_OneStep is the baseline.
TextViewScrollBenchmark (baseline=ScrollDown_OneStep): PageDown_OneStep/Lines=1000 ~196 ms, Lines=5000 much larger.
ConfigurationManagerLoadBenchmark/LoadAndApply: ~11,567,200 ns measured.

Specific changes needed in `Tests/Benchmarks/baseline.json`

Replace all entries that have "comment": "Placeholder" with realistic values. Use the actual measured values from the CI logs as a guide, and set the meanNs to approximately 2× the observed mean to give headroom for CI variance without masking real regressions. Remove the "comment": "Placeholder" fields.

Here are the target values to use (in nanoseconds):

BaselineScrollBenchmark (no real data in logs — use generous placeholders that won't false-fire)

ViewportScroll_Down/ContentHeight=1000: 50,000,000 (50 ms)
ViewportScroll_Down/ContentHeight=10000: 50,000,000
ViewportScroll_Up/ContentHeight=1000: 50,000,000
ViewportScroll_Up/ContentHeight=10000: 50,000,000
ViewportScroll_PageDown/ContentHeight=1000: 50,000,000
ViewportScroll_PageDown/ContentHeight=10000: 50,000,000

ListViewScrollBenchmark (observed ~23 ms; set ceiling at ~70 ms, about 3× the observed mean)

ScrollDown_OneStep/Items=1000: 70,000,000
ScrollDown_OneStep/Items=10000: 70,000,000
PageDown_OneStep/Items=1000: 70,000,000
PageDown_OneStep/Items=10000: 70,000,000

TableViewScrollBenchmark (observed ScrollDown ~45 ms, PageDown ~73 ms Rows=100, ~25 ms Rows=1000)

ScrollDown_OneStep/Rows=100: 150,000,000
ScrollDown_OneStep/Rows=1000: 150,000,000
PageDown_OneStep/Rows=100: 250,000,000
PageDown_OneStep/Rows=1000: 100,000,000

TextViewScrollBenchmark (observed PageDown ~196 ms Lines=1000, ScrollDown similar)

ScrollDown_OneStep/Lines=1000: 600,000,000
ScrollDown_OneStep/Lines=5000: 2,000,000,000
ScrollUp_OneStep/Lines=1000: 600,000,000
ScrollUp_OneStep/Lines=5000: 2,000,000,000
PageDown_OneStep/Lines=1000: 600,000,000
PageDown_OneStep/Lines=5000: 2,000,000,000

ConfigurationManagerLoadBenchmark (observed ~11.6 ms, set ceiling at ~35 ms)

LoadAndApply/: 35,000,000

Also update the _comment field to note that baselines were updated from actual CI measurements in the back-merge PR #5326 run.

Keep all other entries (ThemeSwitchBenchmark, SchemeAttributeBenchmark, SchemeSerializationBenchmark) unchanged — those had real measured values and are passing.
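
For reference, the headroom implied by the target values above can be worked out from the observed means quoted in the prompt. The sketch below covers only the entries for which an observed figure is stated; the short labels are illustrative, not real baseline keys.

```python
targets = [
    # (label, observed mean in ns, proposed baseline in ns)
    ("ListView PageDown Items=1000", 23_000_000, 70_000_000),
    ("TableView PageDown Rows=100", 73_000_000, 250_000_000),
    ("TableView PageDown Rows=1000", 25_000_000, 100_000_000),
    ("TextView PageDown Lines=1000", 196_000_000, 600_000_000),
    ("ConfigurationManager LoadAndApply", 11_567_200, 35_000_000),
]

# Headroom factor = proposed baseline / observed mean.
factors = {label: proposed / observed for label, observed, proposed in targets}
for label, factor in factors.items():
    print(f"{label}: {factor:.2f}x")
```

On these figures the proposed values sit at roughly 3× to 4× the observed means, which is more generous than the "approximately 2×" stated in the instructions.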

File to edit

Tests/Benchmarks/baseline.json

No other files need to change.

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: Please find a solution for this failing job. Use the logs, job definition, and any referenced files where the failure occurred. Keep your response focused on the solution and include code suggestions when appropriate.

Assistant: I'll grab the logs for that failing job right away!

Assistant: The job is failing due to 21 performance regressions detected by the Performance Gate workflow. The Python comparison script checked benchmark results against baseline.json and found multiple benchmarks exceeding the 3.0× threshold.

Key Regressions

Benchmark	Baseline	Current	Ratio
`TableViewScrollBenchmark/PageDown_OneStep/Rows=1000`	300.0 µs	25,0...

This pull request was created from Copilot chat.

tig · 2026-05-17T14:35:26Z

@copilot I remain confused. Why does baseline.json have any placeholder values in it at this point? The benchmarks all ran in this workflow: https://github.com/gui-cs/Terminal.Gui/actions/runs/25992789200/job/76401866541

Thus there are real results to be used as the baseline.

Am I missing something?

Copilot · 2026-05-17T14:37:41Z

@copilot I remain confused. Why does baseline.json have any placeholder values in it at this point? The benchmarks all ran in this workflow: https://github.com/gui-cs/Terminal.Gui/actions/runs/259927...

You’re right — I replaced the headroom-adjusted values with the actual rounded means from the uploaded BenchmarkDotNet artifact for run 25992789200/job 76401866541. Addressed in f4bf4f9.
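
A minimal sketch of deriving baseline entries from artifact means might look like the following. The field names (`Benchmarks`, `Type`, `Method`, `Parameters`, `Statistics.Mean`) are assumed to match BenchmarkDotNet's JSON export, the `Type/Method/Parameters` key format is inferred from the entry names quoted in this PR, and rounding to whole nanoseconds is an assumption; the actual commit may have rounded differently.

```python
def baseline_from_report(report):
    """Build {key: {"meanNs": int}} entries from a BenchmarkDotNet-style report."""
    entries = {}
    for bench in report["Benchmarks"]:
        # Key format assumed from names such as
        # "TableViewScrollBenchmark/PageDown_OneStep/Rows=1000".
        key = f'{bench["Type"]}/{bench["Method"]}/{bench["Parameters"]}'
        # Mean is assumed to be reported in nanoseconds.
        entries[key] = {"meanNs": round(bench["Statistics"]["Mean"])}
    return entries


# Illustrative record only; not taken from the real artifact.
sample = {
    "Benchmarks": [
        {
            "Type": "ExampleBenchmark",
            "Method": "Run",
            "Parameters": "N=10",
            "Statistics": {"Mean": 1234.56},
        }
    ]
}
entries = baseline_from_report(sample)
print(entries)
```

Using the measured means directly, rather than inflated ceilings, leaves the gate's own threshold as the only source of tolerance.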

Initial plan

86406ff

Copilot AI assigned Copilot and tig May 17, 2026

Copilot started work on behalf of tig May 17, 2026 13:58 View session

Update perf baseline placeholders with CI-backed values

4a5508c

Agent-Logs-Url: https://github.com/gui-cs/Terminal.Gui/sessions/c394411c-09a0-4f49-9083-3bdb5fb32571
Co-authored-by: tig <585482+tig@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Update stale placeholder values in baseline.json~~ Refresh perf-gate baselines after stale placeholder regressions May 17, 2026

Copilot finished work on behalf of tig May 17, 2026 14:05

Copilot AI requested a review from tig May 17, 2026 14:05

Copilot started work on behalf of tig May 17, 2026 14:18 View session

Copilot stopped work on behalf of tig due to an error May 17, 2026 14:18
The session was cancelled by the user.

Copilot started work on behalf of tig May 17, 2026 14:18 View session

Copilot finished work on behalf of tig May 17, 2026 14:20

Copilot started work on behalf of tig May 17, 2026 14:35 View session

Use artifact benchmark means as baselines

f4bf4f9

Agent-Logs-Url: https://github.com/gui-cs/Terminal.Gui/sessions/7e30b537-f067-4ab4-b7de-c30beccbb9a7
Co-authored-by: tig <585482+tig@users.noreply.github.com>

Copilot AI changed the title ~~Refresh perf-gate baselines after stale placeholder regressions~~ Refresh perf-gate baselines from CI benchmark results May 17, 2026

Copilot finished work on behalf of tig May 17, 2026 14:38

tig marked this pull request as ready for review May 17, 2026 18:55

tig merged commit 62a607b into develop May 17, 2026
13 checks passed

tig deleted the copilot/update-placeholder-values-baseline-json branch May 17, 2026 18:55

dependabot Bot mentioned this pull request Jun 2, 2026

Bump Terminal.Gui from 1.19.0 to 2.4.4 jas88/RDMP#224

Open

tig merged 3 commits into develop from copilot/update-placeholder-values-baseline-json
