chore(ci): require clear hyperfine regressions #9874
Note: Gemini is unable to generate a review for this pull request due to the file types involved not being currently supported.
Greptile Summary
This PR tightens the hyperfine CI gate so that a benchmark run is treated as inconclusive rather than red when its measured regression, after the relative uncertainty is subtracted, does not exceed the 10% threshold. Statistically clear regressions still fail. It also fixes a latent bug where the improvement-detection grep was hard-coded to the release version string and would have silently misfired for custom
Confidence Score: 5/5
The change is purely additive CI logic that loosens a noisy benchmark gate; it cannot break builds or affect the mise binary itself.
All arithmetic paths are guarded by the empty-string checks at lines 119–124 before scale_decimal is ever called, so invalid input cannot reach the base-10 expansion. The NF-relative awk field extraction is correct regardless of how many whitespace-delimited tokens the command name occupies. The improvement-detection grep uses -F (fixed string), so special characters in version strings are not misinterpreted. The ordering of the noisy branch before the new uncertainty branch is intentional and correct. No production code changes are included.
No files require special attention.
Important Files Changed
Reviews (1): Last reviewed commit: "chore(ci): require clear hyperfine regre..."
Summary
MISE_ALT comparisons use the same improvement detection path
Investigation
The hyperfine workflow became much noisier after the Namespace runner migration in #9561 on 2026-05-03 15:44 UTC.
Completed, non-cancelled/action-required runs from 2026-04-25 through 2026-05-14:
Failed-log classification over the same window:
After #9847, there were still 3 perf-gate failures in 27 completed runs. The current representative failure is not a deterministic benchmark regression: run 25871537929 failed on mise ls with 1.16 ± 0.17, so the measured range overlaps the 10% threshold. Run 25871443106 similarly failed on a 13% mise ls result without a hyperfine outlier warning. This PR keeps real regressions failing, but treats these statistically unclear cases as inconclusive instead of red CI.
Validation
sed -n '72,148p' .github/workflows/hyperfine.yml | sed 's/^ //' | sed 's/${{ steps\.versions\.outputs\.release }}/2026.5.7/g' | bash -n
mise x actionlint -- actionlint .github/workflows/hyperfine.yml
git diff --check --cached
This PR was generated by an AI coding assistant.
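Illustrative sketch of the gate decision
To make the distinction in the Investigation section concrete, here is a minimal sketch of how a relative result such as 1.16 ± 0.17 could be classified against the 10% threshold. It is not the code in hyperfine.yml: the function name classify_ratio and its arguments are hypothetical, it uses awk floating-point arithmetic instead of the workflow's scale_decimal helper, and it leaves out the separate noisy-run branch.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration only; not taken from hyperfine.yml.
# Usage: classify_ratio RATIO UNCERTAINTY [THRESHOLD]
#   RATIO        relative slowdown reported by hyperfine (e.g. 1.16)
#   UNCERTAINTY  the "±" value that accompanies it (e.g. 0.17)
#   THRESHOLD    failure threshold as a ratio (assumed default: 1.10, i.e. 10%)
classify_ratio() {
  local ratio="$1" uncertainty="$2" threshold="${3:-1.10}"
  awk -v r="$ratio" -v u="$uncertainty" -v t="$threshold" 'BEGIN {
    if (r <= t)
      print "ok"            # measured slowdown is within the threshold
    else if (r - u > t)
      print "regression"    # even the lower bound exceeds the threshold
    else
      print "inconclusive"  # measured range overlaps the threshold
  }'
}

classify_ratio 1.16 0.17   # lower bound 0.99, prints "inconclusive"
classify_ratio 1.30 0.05   # lower bound 1.25, prints "regression"
classify_ratio 1.05 0.02   # within threshold, prints "ok"
```

Under these assumptions, the 1.16 ± 0.17 result from run 25871537929 is inconclusive, while a result whose lower bound stays above 1.10 would still fail the gate.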