Add --exclude-flaky flag to exclude flaky diagnostics from diagnostic summary tables#22
Add --exclude-flaky flag to exclude flaky diagnostics from diagnostic summary tables#22AlexWaygood wants to merge 2 commits intomainfrom
--exclude-flaky flag to exclude flaky diagnostics from diagnostic summary tables#22Conversation
…table Adds `--exclude-flaky/--no-exclude-flaky` to `generate-diff-statistics`. When enabled, flaky diagnostic diffs (added/removed/changed flaky locations) are excluded from the per-lint and per-project summary tables. Stable diagnostic changes from projects that also have flaky data are still counted. Closes #11 (partially — rerun skipping and short-circuiting to follow). https://claude.ai/code/session_01M5AdQL4a1fFa4cngcDsYrM
The notice now also triggers when --exclude-flaky drops flaky diffs from the summary table (not only when the raw diff omits flaky projects), and the wording says "This PR summary" instead of "Raw diff output". https://claude.ai/code/session_01M5AdQL4a1fFa4cngcDsYrM
|
Did you see that we already exclude flaky diagnostics from the inline diff? Does this feature here do the same for the HTML report? Should the inline diff feature also be gated on this flag? Or should we just make this the default, if we agree that it's the desired behavior? |
Yes. But I find it odd that in astral-sh/ruff#24109 (comment), for example, the summary table still says that 40
Making it the default for the PR comment probably makes sense, yeah. I suppose we probably want these flaky diagnostics included in the summary tables in the HTML report even with this option applied... so I guess it makes sense to remove the flag, and make sure the summary table in the HTML report is given different handling to the summary table in the PR comment? |
That's why I added this hint 😄 "Changes in flaky projects detected. Raw diff output excludes flaky projects; see the HTML report for details."
Yeah, not sure. I also opened a PR this morning that remove a bunch of diagnostics from prefect, and those were actual changes, even though the project is flaky. Since we only have flakiness information on the level of projects (?), I guess it's still valuable to include diagnostics from flaky projects somewhere in the PR comment summary. But I don't care too much either way. |
Right, but that only creates more questions for me! If you know they're all flaky (and therefore, you know I won't care about them), why are you showing them to me at all 😆
What I wanted to implement was that non-flaky diagnostics from flaky project would still be featured in the PR comment summary. And only flaky diagnostics from flaky projects would be excluded from the PR comment summary. I believe that's what the |
Summary
This PR adds an
--exclude-flakyflag that excludes flaky diagnostics from the summary table posted as comments on PRs. I don't much care if a PR added 40invalid-awaitdiagnostics if the diagnostics are on a known-flaky project and all 40invalid-awaitdiagnostics were in fact detected as being flaky by ecosystem-analyzer; it's annoying to have to click through to the HTML report to be able to see that those 40 diagnostics were in fact all flaky diagnostics on prefect.We want to know if any diagnostics are newly flaky or newly not flaky -- but Doug's experiments showed that running a flaky project 10 times (what we do in CI for ty PRs) isn't generally enough to be able to establish that. And anyway, we only run flaky detection in PR CI for known-flaky projects.
I propose that we run ecosystem-analyzer in CI on PRs with this new
--exclude-flakyflag. The weekly run of ecosystem-analyzer, where we run flaky-diagnostic detection across the whole ecosystem and run with--flaky-runs=20, can continue to be run without this flag, so that we can still have a way of empirically establishing whether any flaky diagnostics were newly added or removed.This is one of the ideas mentioned in #11