Report coverage stability in the dashboard #5

Open
Zac-HD opened this issue Oct 26, 2022 · 2 comments

Zac-HD (Owner) commented Oct 26, 2022

We should also report the coverage stability fraction, including a rating: stable (100%), unstable (85%–100%), or serious problem (<85%); and explain the difference between stability (= coverage) and flakiness (= outcome). Stability is mostly an efficiency thing; flakiness means your test is broken.

This mostly requires measuring both of these on the backend and then plumbing the data around; it's not hugely involved.
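
For illustration, a minimal sketch of the proposed rating thresholds, assuming the stability fraction has already been measured on the backend (the `Rating` enum and function name are hypothetical, not an existing API):

```python
from enum import Enum


class Rating(Enum):
    STABLE = "stable"
    UNSTABLE = "unstable"
    SERIOUS_PROBLEM = "serious problem"


def stability_rating(stable_fraction: float) -> Rating:
    """Map the fraction of inputs with identical coverage on replay to a rating."""
    assert 0.0 <= stable_fraction <= 1.0
    if stable_fraction == 1.0:
        return Rating.STABLE
    if stable_fraction >= 0.85:
        return Rating.UNSTABLE
    return Rating.SERIOUS_PROBLEM
```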

tybug (Collaborator) commented Jan 7, 2025

A silly question: what is coverage stability? I'd guess some Jaccard-similarity-derived metric of coverage between all runs of the same input, such that "100%" is deterministic and anything else is not?

Zac-HD (Owner, Author) commented Jan 8, 2025

The fraction of (repeated) inputs for which we observe identical coverage each time; it's pretty easy to measure. Concretely, I'd replay from our seed pool at a diminishing rate (say, 1% of executions, halving the rate each time through the pool, over-scheduling new pool entries to catch up, and occasionally replaying entries twice as a cache-hit heuristic), so we get decent confidence fairly promptly but with low overhead over time.
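
To make that concrete, here's a rough sketch of such a schedule. Everything below is an illustrative assumption rather than actual HypoFuzz code: the class name and bookkeeping are invented, probability is halved per replay as an approximation of halving per pass through the pool, and the occasional double replay is omitted for brevity.

```python
import random


class ReplayScheduler:
    """Hypothetical sketch: replay pool entries at a diminishing rate.

    Each pool entry starts at the base replay rate (~1% of executions) and has
    its replay probability halved after every replay, so new entries are
    over-scheduled relative to long-standing ones and confidence accumulates
    quickly at first with low overhead later.
    """

    def __init__(self, base_rate: float = 0.01) -> None:
        self.base_rate = base_rate
        self.replay_prob: dict[bytes, float] = {}
        self.baseline_coverage: dict[bytes, frozenset] = {}
        self.identical_so_far: dict[bytes, bool] = {}

    def add_seed(self, seed: bytes, coverage: frozenset) -> None:
        """Register a new pool entry and the coverage first observed for it."""
        self.replay_prob[seed] = self.base_rate
        self.baseline_coverage[seed] = coverage

    def pick_replay(self) -> bytes | None:
        """Pick a seed to replay this execution, or None to fuzz as usual."""
        for seed, prob in self.replay_prob.items():
            if random.random() < prob:
                self.replay_prob[seed] = prob / 2  # diminishing replay rate
                return seed
        return None

    def record_replay(self, seed: bytes, coverage: frozenset) -> None:
        """Mark a seed unstable if any replay's coverage differs from baseline."""
        same = coverage == self.baseline_coverage[seed]
        self.identical_so_far[seed] = self.identical_so_far.get(seed, True) and same

    def stability_fraction(self) -> float:
        """Fraction of replayed seeds whose coverage was identical every time."""
        if not self.identical_so_far:
            return 1.0
        return sum(self.identical_so_far.values()) / len(self.identical_so_far)
```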

We're really just reporting an observation here rather than estimating some latent property, partly because it's unclear what that latent property would be: conceptually we might have a different flake rate per input, may or may not care about that, etc. A particularly common case is observing different coverage only the first time we replay some input; it's ambiguous whether we actually want such a seed in our pool, since mutants of it are less likely to help.
